Groovy Regular Expressions - The Benchmark (Part 2)
In the second part of the "Groovy Regular Expression" blog post, I want to show you some benchmarks. And let me make one thing clear - the following results you are going to see are not scientific proof. I present those results only to give you a hint about the overall performance of some cool features you have seen before.
Introduction
You will see that, in general, all presented approaches, and their equivalents perform quite similar, so there is no need to refactor your code after reading this section. If you never saw JMH benchmark tests, you are going to learn something new in a few minutes. Let’s go!
Here you can find the Gradle project containing all benchmark tests. Below are steps to execute those tests on your machine.
|
Here are the specs of the laptop I run benchmark tests on Lenovo ThinkPad T440p laptop with Intel® Core™ i7-4900MQ CPU @ 2.80GHz and 16 GBs RAM. |
I run those benchmarks with four different Groovy and JDK variants:
Groovy 2.5.11-indy and OpenJDK 1.8.0_242 (64-Bit Server VM, 25.242-b08)
Groovy 2.5.11-indy and OpenJDK 11.0.6 (64-Bit Server VM, 11.0.6+10)
Groovy 3.0.3-indy and OpenJDK 1.8.0_242 (64-Bit Server VM, 25.242-b08)
Groovy 3.0.3-indy and OpenJDK 11.0.6 (64-Bit Server VM, 11.0.6+10)
Every benchmark test was executed using both dynamic and static compilation.
All measurements are expressed in μs/op. Remember, that 1 μs is equal to 0.001 ms (millisecond) and 0.000001 s (second). |
Benchmark 1: multiple assignments from the regular expression
In this benchmark, I wanted to check if using multiple variable assignments comes with a significant cost compared to a traditional group extraction. Here is what the test class looks like.
package bench
import groovy.transform.CompileStatic
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import java.util.regex.Matcher
import java.util.regex.Pattern
@State(Scope.Benchmark)
class A1_Multiple_Assignment_Bench {
private static final Random random = new Random()
private static final List<String> data = [
'Some item name 1: $99.99 (-15%)',
'Some item name 2: $49.99 (-5%)',
'Some item name 3: $19.99',
'Some item name 4: $29.99 (-13%)',
'Some item name 5: $9.99',
'Some item name 6: $51.21 (-2%)',
'Some item name 7: $4.32 (-1%)',
'Some item name 8: $3.14 (-23%)',
'Some item name 9: $9.00'
]
private static final Pattern pattern = ~/\$(\d{1,4}\.\d{2})\s?\(?(-\d+%)?\)?/
private static final String getInput() {
return data.get(random.nextInt(data.size()))
}
@Benchmark
String multiple_assignment_dynamic() {
def (_,price,discount) = (input =~ pattern)[0]
return "price: $price, discount: $discount"
}
@Benchmark
@CompileStatic
String multiple_assignment_static() {
final List list = (input =~ pattern)[0] as ArrayList
def (_, price, discount) = [list[0], list[1], list[2]]
return "price: $price, discount: $discount"
}
@Benchmark
String standard_matcher_dynamic() {
final Matcher matcher = pattern.matcher(input)
if (!matcher.find()) {
throw new IllegalStateException("Pattern didn't match!")
}
def price = matcher.group(1)
def discount = matcher.group(2)
return "price: $price, discount: $discount"
}
@Benchmark
@CompileStatic
String standard_matcher_static() {
final Matcher matcher = pattern.matcher(input)
if (!matcher.find()) {
throw new IllegalStateException("Pattern didn't match!")
}
def price = matcher.group(1)
def discount = matcher.group(2)
return "price: $price, discount: $discount"
}
}
Here are the results for specific Groovy and JDK versions.
Groovy 2.5.11-indy and OpenJDK 1.8.0_242
Benchmark Mode Cnt Score Error Units
A1_Multiple_Assignment_Bench.multiple_assignment_dynamic avgt 60 0,808 ± 0,013 us/op
A1_Multiple_Assignment_Bench.multiple_assignment_static avgt 60 0,688 ± 0,009 us/op
A1_Multiple_Assignment_Bench.standard_matcher_dynamic avgt 60 0,409 ± 0,007 us/op
A1_Multiple_Assignment_Bench.standard_matcher_static avgt 60 0,387 ± 0,006 us/op
Groovy 2.5.11-indy and OpenJDK 11.0.6
Benchmark Mode Cnt Score Error Units
A1_Multiple_Assignment_Bench.multiple_assignment_dynamic avgt 60 1,087 ± 0,022 us/op
A1_Multiple_Assignment_Bench.multiple_assignment_static avgt 60 0,967 ± 0,023 us/op
A1_Multiple_Assignment_Bench.standard_matcher_dynamic avgt 60 0,545 ± 0,012 us/op
A1_Multiple_Assignment_Bench.standard_matcher_static avgt 60 0,538 ± 0,013 us/op
Groovy 3.0.3-indy and OpenJDK 1.8.0_242
Benchmark Mode Cnt Score Error Units
A1_Multiple_Assignment_Bench.multiple_assignment_dynamic avgt 60 0,720 ± 0,012 us/op
A1_Multiple_Assignment_Bench.multiple_assignment_static avgt 60 0,750 ± 0,018 us/op
A1_Multiple_Assignment_Bench.standard_matcher_dynamic avgt 60 0,427 ± 0,010 us/op
A1_Multiple_Assignment_Bench.standard_matcher_static avgt 60 0,371 ± 0,006 us/op
Groovy 3.0.3-indy and OpenJDK 11.0.6
Benchmark Mode Cnt Score Error Units
A1_Multiple_Assignment_Bench.multiple_assignment_dynamic avgt 60 0,958 ± 0,018 us/op
A1_Multiple_Assignment_Bench.multiple_assignment_static avgt 60 0,989 ± 0,018 us/op
A1_Multiple_Assignment_Bench.standard_matcher_dynamic avgt 60 0,522 ± 0,013 us/op
A1_Multiple_Assignment_Bench.standard_matcher_static avgt 60 0,514 ± 0,008 us/op
And here what those results look like on a single chart.
Thoughts?
We can see that OpenJDK 11.0.6 is just a bit "slower" compared to OpenJDK 1.8.0_242.
In most cases, Groovy 3.0.3 performed a bit better compared to other variants.
It still does not prove anything. If we compare the fastest multiple assignments in a dynamic compilation variant (
0.720 μs
) with the slowest one (1.087 μs
), we will find that it was around0.367 μs
slower. Or0.000367 ms
to show it to you on a scale that is much easier to imagine. Can it be a bottleneck in your application? Absolutely not.
Benchmark 2: exact match operator
In this benchmark, we compare the performance of the exact match operator with the traditional matcher.matches()
. Here is the test class.
package bench
import groovy.transform.CompileStatic
import groovy.transform.TypeChecked
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import java.util.regex.Matcher
import java.util.regex.Pattern
@State(Scope.Benchmark)
class A2_Exact_Match_Operator_Bench {
private static final Random random = new Random()
private static final List<String> data = [
"1605-FACD-0000-EXIT",
"1606-FACD-0000-EXIT",
"1607-FACD-0000-EXIT",
"1608-FACD-0000-EXIT",
"1609-FACD-0000-EXIT",
"1610-FACD-0000-EXIT",
"1611-FACD-0000-EXIT",
"1611-FACD-0001-EXIT",
"1611-FACD-0002-EXIT",
"1611-FACD-0003-EXIT",
"1612-FACD-0000-EXIT"
]
private static final Pattern pattern = ~/^\d{4}-[A-Z]{4}-0000-EXIT$/
private static final String getInput() {
return data.get(random.nextInt(data.size()))
}
@Benchmark
boolean match_operator_dynamic() {
return input ==~ pattern
}
@Benchmark
@CompileStatic
boolean match_operator_static() {
return input ==~ pattern
}
@Benchmark
boolean matcher_matches_dynamic() {
final Matcher matcher = pattern.matcher(input)
return matcher.matches()
}
@Benchmark
@CompileStatic
boolean matcher_matches_static() {
final Matcher matcher = pattern.matcher(input)
return matcher.matches()
}
}
And here are the results.
Groovy 2.5.11-indy and OpenJDK 1.8.0_242
Benchmark Mode Cnt Score Error Units
A2_Exact_Match_Operator_Bench.match_operator_dynamic avgt 60 0,243 ± 0,003 us/op
A2_Exact_Match_Operator_Bench.match_operator_static avgt 60 0,220 ± 0,005 us/op
A2_Exact_Match_Operator_Bench.matcher_matches_dynamic avgt 60 0,162 ± 0,003 us/op
A2_Exact_Match_Operator_Bench.matcher_matches_static avgt 60 0,147 ± 0,003 us/op
Groovy 2.5.11-indy and OpenJDK 11.0.6
Benchmark Mode Cnt Score Error Units
A2_Exact_Match_Operator_Bench.match_operator_dynamic avgt 60 0,262 ± 0,004 us/op
A2_Exact_Match_Operator_Bench.match_operator_static avgt 60 0,285 ± 0,004 us/op
A2_Exact_Match_Operator_Bench.matcher_matches_dynamic avgt 60 0,243 ± 0,004 us/op
A2_Exact_Match_Operator_Bench.matcher_matches_static avgt 60 0,162 ± 0,004 us/op
Groovy 3.0.3-indy and OpenJDK 1.8.0_242
Benchmark Mode Cnt Score Error Units
A2_Exact_Match_Operator_Bench.match_operator_dynamic avgt 60 0,237 ± 0,004 us/op
A2_Exact_Match_Operator_Bench.match_operator_static avgt 60 0,227 ± 0,003 us/op
A2_Exact_Match_Operator_Bench.matcher_matches_dynamic avgt 60 0,176 ± 0,004 us/op
A2_Exact_Match_Operator_Bench.matcher_matches_static avgt 60 0,145 ± 0,003 us/op
Groovy 3.0.3-indy and OpenJDK 11.0.6
Benchmark Mode Cnt Score Error Units
A2_Exact_Match_Operator_Bench.match_operator_dynamic avgt 60 0,310 ± 0,008 us/op
A2_Exact_Match_Operator_Bench.match_operator_static avgt 60 0,280 ± 0,001 us/op
A2_Exact_Match_Operator_Bench.matcher_matches_dynamic avgt 60 0,229 ± 0,005 us/op
A2_Exact_Match_Operator_Bench.matcher_matches_static avgt 60 0,212 ± 0,004 us/op
And here is the chart.
Thoughts?
No significant differences. It looks like OpenJDK 11.0.6 generally did a bit slower, but the difference is not significant.
Benchmark 3: replaceFirst
with regular expression
And here is the final benchmark. This time we will check how String.replaceFirst()
with pattern and closure performs compared to a conventional approach. We can expect to see some differences between both alternatives. The replaceFirst()
variant has to generate a closure in memory, so it comes with some tiny overhead. Let’s see if it is significant or not.
package bench
import groovy.transform.CompileStatic
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import java.util.regex.Matcher
import java.util.regex.Pattern
@State(Scope.Benchmark)
class A3_Regexp_Replace_Bench {
private static final Random random = new Random()
private static final List<String> data = [
"v3.4.23",
"v3.4.24",
"v3.4.25",
"v3.4.26",
"v3.5.0",
"v3.5.1",
"v3.5.2",
"v3.5.3",
"v3.5.4",
"v3.5.5",
"v4.0.0",
"v4.0.1",
"v4.0.2",
"v4.0.3",
"v4.0.4",
"v4.1.0",
"v4.1.1",
"v4.1.2",
"v4.1.3"
]
private static final Pattern pattern = ~/^v(\d{1,3})\.(\d{1,3})\.\d{1,4}$/
private static String getVersion() {
return data.get(random.nextInt(data.size()))
}
@Benchmark
String string_replace_first_dynamic() {
return version.replaceFirst(pattern) { _,major,minor -> "v${major}.${(minor as int) + 1}.0"}
}
@Benchmark
@CompileStatic
String string_replace_first_static() {
return version.replaceFirst(pattern) { _,major,minor -> "v${major}.${(minor as int) + 1}.0"}
}
@Benchmark
String matcher_matches_use_case_dynamic() {
final Matcher matcher = pattern.matcher(version)
if (!matcher.matches()) {
throw new IllegalStateException("Pattern didn't match!")
}
def major = matcher.group(1)
def minor = matcher.group(2)
return "v${major}.${(minor as int) + 1}.0".toString()
}
@Benchmark
@CompileStatic
String matcher_matches_use_case_static() {
final Matcher matcher = pattern.matcher(version)
if (!matcher.matches()) {
throw new IllegalStateException("Pattern didn't match!")
}
def major = matcher.group(1)
def minor = matcher.group(2)
return "v${major}.${(minor as int) + 1}.0".toString()
}
}
Here are the results.
Groovy 2.5.11-indy and OpenJDK 1.8.0_242
Benchmark Mode Cnt Score Error Units
A3_Regexp_Replace_Bench.matcher_matches_use_case_dynamic avgt 60 0,570 ± 0,009 us/op
A3_Regexp_Replace_Bench.matcher_matches_use_case_static avgt 60 0,488 ± 0,008 us/op
A3_Regexp_Replace_Bench.string_replace_first_dynamic avgt 60 0,873 ± 0,014 us/op
A3_Regexp_Replace_Bench.string_replace_first_static avgt 60 0,832 ± 0,013 us/op
Groovy 2.5.11-indy and OpenJDK 11.0.6
Benchmark Mode Cnt Score Error Units
A3_Regexp_Replace_Bench.matcher_matches_use_case_dynamic avgt 60 0,579 ± 0,015 us/op
A3_Regexp_Replace_Bench.matcher_matches_use_case_static avgt 60 0,543 ± 0,002 us/op
A3_Regexp_Replace_Bench.string_replace_first_dynamic avgt 60 1,034 ± 0,034 us/op
A3_Regexp_Replace_Bench.string_replace_first_static avgt 60 0,970 ± 0,023 us/op
Groovy 3.0.3-indy and OpenJDK 1.8.0_242
Benchmark Mode Cnt Score Error Units
A3_Regexp_Replace_Bench.matcher_matches_use_case_dynamic avgt 60 0,574 ± 0,011 us/op
A3_Regexp_Replace_Bench.matcher_matches_use_case_static avgt 60 0,516 ± 0,008 us/op
A3_Regexp_Replace_Bench.string_replace_first_dynamic avgt 60 0,925 ± 0,016 us/op
A3_Regexp_Replace_Bench.string_replace_first_static avgt 60 0,878 ± 0,015 us/op
Groovy 3.0.3-indy and OpenJDK 11.0.6
Benchmark Mode Cnt Score Error Units
A3_Regexp_Replace_Bench.matcher_matches_use_case_dynamic avgt 60 0,609 ± 0,013 us/op
A3_Regexp_Replace_Bench.matcher_matches_use_case_static avgt 60 0,552 ± 0,013 us/op
A3_Regexp_Replace_Bench.string_replace_first_dynamic avgt 60 1,014 ± 0,028 us/op
A3_Regexp_Replace_Bench.string_replace_first_static avgt 60 0,945 ± 0,026 us/op
And here is the last chart.
Thoughts?
The variant with
String.replaceFirst()
performs "slower" as expected.Does it make a huge difference? I wouldn’t say so. In the worst-case scenario, using
String.replaceFirst()
with a closure like the one shown in the test will need0.001 ms
instead of0.0005 ms
to finish execution.
Conclusion
I hope those benchmarks have shown you that there are some small differences between the different variants. Still, they are not something you should be worried about. Measuring application performance is not trivial, and in most cases, real bottlenecks exist in entirely different areas. I can tell you from my experience that one of the first places worth checking when tweaking application’s performance is its I/O layer. I’ve seen countless times inefficient queries to the database, or threads blocked for nothing, and those are the real issues.
0 Comments