正则表达式(3)

主题

正则表达式(3)

贪婪量词/惰性量词/支配量词

贪婪量词

先看整个字符串是不是一个匹配。如果没有发现匹配,它去掉最后字符串中的最后一个字符,并再次尝试。如果还是没有发现匹配,那么再次去掉最后一个字符串,这个过程会一直重复直到发现一个匹配或者字符串不剩任何字符。简单量词都是贪婪量词。

惰性量词

先看字符串中的第一个字母是不是一个匹配,如果单独着一个字符还不够,就读入下一个字符,组成两个字符的字符串。如果还没有发现匹配,惰性量词继续从字符串中添加字符直到发现一个匹配或者整个字符串都检查过也没有匹配。惰性量词和贪婪量词的工作方式恰好相反。

支配量词

只尝试匹配整个字符串。如果整个字符串不能产生匹配,不做进一步尝试。

测试代码

@State(Scope.Benchmark)
@BenchmarkMode({Mode.SingleShotTime, Mode.SampleTime})
@Fork(value = 3)
@Warmup(iterations = 1, time = 1)
@Measurement(iterations = 3, time = 1)
public class RegexQuantifiersBenchmark {

private static final String REGEX1 = "^(a+)+$";
private static final String REGEX2 = "^(a+?)+?$";
private static final String REGEX3 = "^(a++)++$";
private static final Pattern PATTERN1 = Pattern.compile(REGEX1);
private static final Pattern PATTERN2 = Pattern.compile(REGEX2);
private static final Pattern PATTERN3 = Pattern.compile(REGEX3);

@Setup
public void setup() {
}

@Benchmark
public void match1(Blackhole blackhole, Sample param) {
blackhole.consume(PATTERN1.matcher(param.getValue()));
}

@Benchmark
public void match2(Blackhole blackhole, Sample param) {
blackhole.consume(PATTERN2.matcher(param.getValue()));
}

@Benchmark
public void match3(Blackhole blackhole, Sample param) {
blackhole.consume(PATTERN3.matcher(param.getValue()));
}

@State(Scope.Benchmark)
public static class Sample {
@Param({"aa", "aaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"aX", "aaaaaaaaaaaaaaaaX", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaN"})
public String value;

public String getValue() {
return value;
}
}
}

结果

Benchmark                                 (value)   Mode  Cnt         Score         Error  Units
match1 aa thrpt 24 26749975.073 ± 942304.636 ops/s
match1 aaaaaaaaaaaaaaaaa thrpt 24 27417593.120 ± 777442.759 ops/s
match1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa thrpt 24 27189627.225 ± 1223664.876 ops/s
match1 aX thrpt 24 27207228.425 ± 1025730.715 ops/s
match1 aaaaaaaaaaaaaaaaX thrpt 24 25853408.437 ± 1288133.167 ops/s
match1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaN thrpt 24 26640237.606 ± 1010726.633 ops/s
match2 aa thrpt 24 21587967.638 ± 3335574.562 ops/s
match2 aaaaaaaaaaaaaaaaa thrpt 24 27639638.597 ± 1003855.880 ops/s
match2 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa thrpt 24 27096814.756 ± 947027.055 ops/s
match2 aX thrpt 24 26445982.632 ± 955279.512 ops/s
match2 aaaaaaaaaaaaaaaaX thrpt 24 27414496.066 ± 1019957.012 ops/s
match2 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaN thrpt 24 26966894.966 ± 932043.104 ops/s
match3 aa thrpt 24 27286586.758 ± 1658506.207 ops/s
match3 aaaaaaaaaaaaaaaaa thrpt 24 22938414.685 ± 3345509.487 ops/s
match3 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa thrpt 24 28401273.954 ± 975501.353 ops/s
match3 aX thrpt 24 27451149.289 ± 832562.926 ops/s
match3 aaaaaaaaaaaaaaaaX thrpt 24 28423829.416 ± 1434026.126 ops/s
match3 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaN thrpt 24 28124904.082 ± 1404057.195 ops/s

Process finished with exit code 0