In the second part of the "Groovy Regular Expression" blog post, I want to show you some benchmarks. And let me make one thing clear - the following results you are going to see are not scientific proof. I present those results only to give you a hint about the overall performance of some cool features you have seen before.

Introduction

You will see that, in general, all presented approaches, and their equivalents perform quite similar, so there is no need to refactor your code after reading this section. If you never saw JMH benchmark tests, you are going to learn something new in a few minutes. Let’s go!

Here you can find the Gradle project containing all benchmark tests. Below are steps to execute those tests on your machine.

$ git clone https://github.com/wololock/groovy-regexp-examples
Cloning into 'groovy-regexp-examples'...
remote: Enumerating objects: 29, done.
remote: Total 29 (delta 0), reused 0 (delta 0), pack-reused 29
Unpacking objects: 100% (29/29), 10.64 KiB | 1.33 MiB/s, done.

$ cd groovy-regexp-examples

$ ./gradlew --no-daemon jmh
Here are the specs of the laptop I run benchmark tests on Lenovo ThinkPad T440p laptop with Intel® Core™ i7-4900MQ CPU @ 2.80GHz and 16 GBs RAM.

I run those benchmarks with four different Groovy and JDK variants:

  • Groovy 2.5.11-indy and OpenJDK 1.8.0_242 (64-Bit Server VM, 25.242-b08)

  • Groovy 2.5.11-indy and OpenJDK 11.0.6 (64-Bit Server VM, 11.0.6+10)

  • Groovy 3.0.3-indy and OpenJDK 1.8.0_242 (64-Bit Server VM, 25.242-b08)

  • Groovy 3.0.3-indy and OpenJDK 11.0.6 (64-Bit Server VM, 11.0.6+10)

Every benchmark test was executed using both dynamic and static compilation.

All measurements are expressed in μs/op. Remember, that 1 μs is equal to 0.001 ms (millisecond) and 0.000001 s (second).

Benchmark 1: multiple assignments from the regular expression

In this benchmark, I wanted to check if using multiple variable assignments comes with a significant cost compared to a traditional group extraction. Here is what the test class looks like.

package bench

import groovy.transform.CompileStatic
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

import java.util.regex.Matcher
import java.util.regex.Pattern

@State(Scope.Benchmark)
class A1_Multiple_Assignment_Bench {

    private static final Random random = new Random()

    private static final List<String> data = [
            'Some item name 1: $99.99 (-15%)',
            'Some item name 2: $49.99 (-5%)',
            'Some item name 3: $19.99',
            'Some item name 4: $29.99 (-13%)',
            'Some item name 5: $9.99',
            'Some item name 6: $51.21 (-2%)',
            'Some item name 7: $4.32 (-1%)',
            'Some item name 8: $3.14 (-23%)',
            'Some item name 9: $9.00'
    ]

    private static final Pattern pattern = ~/\$(\d{1,4}\.\d{2})\s?\(?(-\d+%)?\)?/

    private static final String getInput() {
        return data.get(random.nextInt(data.size()))
    }

    @Benchmark
    String multiple_assignment_dynamic() {
        def (_,price,discount) = (input =~ pattern)[0]

        return "price: $price, discount: $discount"
    }

    @Benchmark
    @CompileStatic
    String multiple_assignment_static() {
        final List list = (input =~ pattern)[0] as ArrayList

        def (_, price, discount) = [list[0], list[1], list[2]]

        return "price: $price, discount: $discount"
    }

    @Benchmark
    String standard_matcher_dynamic() {
        final Matcher matcher = pattern.matcher(input)
        if (!matcher.find()) {
            throw new IllegalStateException("Pattern didn't match!")
        }

        def price = matcher.group(1)
        def discount = matcher.group(2)

        return "price: $price, discount: $discount"
    }

    @Benchmark
    @CompileStatic
    String standard_matcher_static() {
        final Matcher matcher = pattern.matcher(input)
        if (!matcher.find()) {
            throw new IllegalStateException("Pattern didn't match!")
        }

        def price = matcher.group(1)
        def discount = matcher.group(2)

        return "price: $price, discount: $discount"
    }
}

Here are the results for specific Groovy and JDK versions.

Groovy 2.5.11-indy and OpenJDK 1.8.0_242

Benchmark                                                 Mode  Cnt  Score   Error  Units
A1_Multiple_Assignment_Bench.multiple_assignment_dynamic  avgt   60  0,808 ± 0,013  us/op
A1_Multiple_Assignment_Bench.multiple_assignment_static   avgt   60  0,688 ± 0,009  us/op
A1_Multiple_Assignment_Bench.standard_matcher_dynamic     avgt   60  0,409 ± 0,007  us/op
A1_Multiple_Assignment_Bench.standard_matcher_static      avgt   60  0,387 ± 0,006  us/op

Groovy 2.5.11-indy and OpenJDK 11.0.6

Benchmark                                                 Mode  Cnt  Score   Error  Units
A1_Multiple_Assignment_Bench.multiple_assignment_dynamic  avgt   60  1,087 ± 0,022  us/op
A1_Multiple_Assignment_Bench.multiple_assignment_static   avgt   60  0,967 ± 0,023  us/op
A1_Multiple_Assignment_Bench.standard_matcher_dynamic     avgt   60  0,545 ± 0,012  us/op
A1_Multiple_Assignment_Bench.standard_matcher_static      avgt   60  0,538 ± 0,013  us/op

Groovy 3.0.3-indy and OpenJDK 1.8.0_242

Benchmark                                                 Mode  Cnt  Score   Error  Units
A1_Multiple_Assignment_Bench.multiple_assignment_dynamic  avgt   60  0,720 ± 0,012  us/op
A1_Multiple_Assignment_Bench.multiple_assignment_static   avgt   60  0,750 ± 0,018  us/op
A1_Multiple_Assignment_Bench.standard_matcher_dynamic     avgt   60  0,427 ± 0,010  us/op
A1_Multiple_Assignment_Bench.standard_matcher_static      avgt   60  0,371 ± 0,006  us/op

Groovy 3.0.3-indy and OpenJDK 11.0.6

Benchmark                                                 Mode  Cnt  Score   Error  Units
A1_Multiple_Assignment_Bench.multiple_assignment_dynamic  avgt   60  0,958 ± 0,018  us/op
A1_Multiple_Assignment_Bench.multiple_assignment_static   avgt   60  0,989 ± 0,018  us/op
A1_Multiple_Assignment_Bench.standard_matcher_dynamic     avgt   60  0,522 ± 0,013  us/op
A1_Multiple_Assignment_Bench.standard_matcher_static      avgt   60  0,514 ± 0,008  us/op

And here what those results look like on a single chart.

Thoughts?

  • We can see that OpenJDK 11.0.6 is just a bit "slower" compared to OpenJDK 1.8.0_242.

  • In most cases, Groovy 3.0.3 performed a bit better compared to other variants.

  • It still does not prove anything. If we compare the fastest multiple assignments in a dynamic compilation variant (0.720 μs) with the slowest one (1.087 μs), we will find that it was around 0.367 μs slower. Or 0.000367 ms to show it to you on a scale that is much easier to imagine. Can it be a bottleneck in your application? Absolutely not.

Benchmark 2: exact match operator

In this benchmark, we compare the performance of the exact match operator with the traditional matcher.matches(). Here is the test class.

package bench

import groovy.transform.CompileStatic
import groovy.transform.TypeChecked
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

import java.util.regex.Matcher
import java.util.regex.Pattern

@State(Scope.Benchmark)
class A2_Exact_Match_Operator_Bench {

    private static final Random random = new Random()

    private static final List<String> data = [
            "1605-FACD-0000-EXIT",
            "1606-FACD-0000-EXIT",
            "1607-FACD-0000-EXIT",
            "1608-FACD-0000-EXIT",
            "1609-FACD-0000-EXIT",
            "1610-FACD-0000-EXIT",
            "1611-FACD-0000-EXIT",
            "1611-FACD-0001-EXIT",
            "1611-FACD-0002-EXIT",
            "1611-FACD-0003-EXIT",
            "1612-FACD-0000-EXIT"
    ]

    private static final Pattern pattern = ~/^\d{4}-[A-Z]{4}-0000-EXIT$/

    private static final String getInput() {
        return data.get(random.nextInt(data.size()))
    }

    @Benchmark
    boolean match_operator_dynamic() {
        return input ==~ pattern
    }

    @Benchmark
    @CompileStatic
    boolean match_operator_static() {
        return input ==~ pattern
    }

    @Benchmark
    boolean matcher_matches_dynamic() {
        final Matcher matcher = pattern.matcher(input)

        return matcher.matches()
    }

    @Benchmark
    @CompileStatic
    boolean matcher_matches_static() {
        final Matcher matcher = pattern.matcher(input)

        return matcher.matches()
    }
}

And here are the results.

Groovy 2.5.11-indy and OpenJDK 1.8.0_242

Benchmark                                                 Mode  Cnt  Score   Error  Units
A2_Exact_Match_Operator_Bench.match_operator_dynamic      avgt   60  0,243 ± 0,003  us/op
A2_Exact_Match_Operator_Bench.match_operator_static       avgt   60  0,220 ± 0,005  us/op
A2_Exact_Match_Operator_Bench.matcher_matches_dynamic     avgt   60  0,162 ± 0,003  us/op
A2_Exact_Match_Operator_Bench.matcher_matches_static      avgt   60  0,147 ± 0,003  us/op

Groovy 2.5.11-indy and OpenJDK 11.0.6

Benchmark                                                 Mode  Cnt  Score   Error  Units
A2_Exact_Match_Operator_Bench.match_operator_dynamic      avgt   60  0,262 ± 0,004  us/op
A2_Exact_Match_Operator_Bench.match_operator_static       avgt   60  0,285 ± 0,004  us/op
A2_Exact_Match_Operator_Bench.matcher_matches_dynamic     avgt   60  0,243 ± 0,004  us/op
A2_Exact_Match_Operator_Bench.matcher_matches_static      avgt   60  0,162 ± 0,004  us/op

Groovy 3.0.3-indy and OpenJDK 1.8.0_242

Benchmark                                                 Mode  Cnt  Score   Error  Units
A2_Exact_Match_Operator_Bench.match_operator_dynamic      avgt   60  0,237 ± 0,004  us/op
A2_Exact_Match_Operator_Bench.match_operator_static       avgt   60  0,227 ± 0,003  us/op
A2_Exact_Match_Operator_Bench.matcher_matches_dynamic     avgt   60  0,176 ± 0,004  us/op
A2_Exact_Match_Operator_Bench.matcher_matches_static      avgt   60  0,145 ± 0,003  us/op

Groovy 3.0.3-indy and OpenJDK 11.0.6

Benchmark                                                 Mode  Cnt  Score   Error  Units
A2_Exact_Match_Operator_Bench.match_operator_dynamic      avgt   60  0,310 ± 0,008  us/op
A2_Exact_Match_Operator_Bench.match_operator_static       avgt   60  0,280 ± 0,001  us/op
A2_Exact_Match_Operator_Bench.matcher_matches_dynamic     avgt   60  0,229 ± 0,005  us/op
A2_Exact_Match_Operator_Bench.matcher_matches_static      avgt   60  0,212 ± 0,004  us/op

And here is the chart.

Thoughts?

  • No significant differences. It looks like OpenJDK 11.0.6 generally did a bit slower, but the difference is not significant.

Benchmark 3: replaceFirst with regular expression

And here is the final benchmark. This time we will check how String.replaceFirst() with pattern and closure performs compared to a conventional approach. We can expect to see some differences between both alternatives. The replaceFirst() variant has to generate a closure in memory, so it comes with some tiny overhead. Let’s see if it is significant or not.

package bench

import groovy.transform.CompileStatic
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

import java.util.regex.Matcher
import java.util.regex.Pattern

@State(Scope.Benchmark)
class A3_Regexp_Replace_Bench {

    private static final Random random = new Random()

    private static final List<String> data = [
            "v3.4.23",
            "v3.4.24",
            "v3.4.25",
            "v3.4.26",
            "v3.5.0",
            "v3.5.1",
            "v3.5.2",
            "v3.5.3",
            "v3.5.4",
            "v3.5.5",
            "v4.0.0",
            "v4.0.1",
            "v4.0.2",
            "v4.0.3",
            "v4.0.4",
            "v4.1.0",
            "v4.1.1",
            "v4.1.2",
            "v4.1.3"
    ]

    private static final Pattern pattern = ~/^v(\d{1,3})\.(\d{1,3})\.\d{1,4}$/

    private static String getVersion() {
        return data.get(random.nextInt(data.size()))
    }

    @Benchmark
    String string_replace_first_dynamic() {
        return version.replaceFirst(pattern) { _,major,minor -> "v${major}.${(minor as int) + 1}.0"}
    }

    @Benchmark
    @CompileStatic
    String string_replace_first_static() {
        return version.replaceFirst(pattern) { _,major,minor -> "v${major}.${(minor as int) + 1}.0"}
    }

    @Benchmark
    String matcher_matches_use_case_dynamic() {
        final Matcher matcher = pattern.matcher(version)
        if (!matcher.matches()) {
            throw new IllegalStateException("Pattern didn't match!")
        }

        def major = matcher.group(1)
        def minor = matcher.group(2)

        return "v${major}.${(minor as int) + 1}.0".toString()
    }

    @Benchmark
    @CompileStatic
    String matcher_matches_use_case_static() {
        final Matcher matcher = pattern.matcher(version)
        if (!matcher.matches()) {
            throw new IllegalStateException("Pattern didn't match!")
        }

        def major = matcher.group(1)
        def minor = matcher.group(2)

        return "v${major}.${(minor as int) + 1}.0".toString()
    }
}

Here are the results.

Groovy 2.5.11-indy and OpenJDK 1.8.0_242

Benchmark                                                 Mode  Cnt  Score   Error  Units
A3_Regexp_Replace_Bench.matcher_matches_use_case_dynamic  avgt   60  0,570 ± 0,009  us/op
A3_Regexp_Replace_Bench.matcher_matches_use_case_static   avgt   60  0,488 ± 0,008  us/op
A3_Regexp_Replace_Bench.string_replace_first_dynamic      avgt   60  0,873 ± 0,014  us/op
A3_Regexp_Replace_Bench.string_replace_first_static       avgt   60  0,832 ± 0,013  us/op

Groovy 2.5.11-indy and OpenJDK 11.0.6

Benchmark                                                 Mode  Cnt  Score   Error  Units
A3_Regexp_Replace_Bench.matcher_matches_use_case_dynamic  avgt   60  0,579 ± 0,015  us/op
A3_Regexp_Replace_Bench.matcher_matches_use_case_static   avgt   60  0,543 ± 0,002  us/op
A3_Regexp_Replace_Bench.string_replace_first_dynamic      avgt   60  1,034 ± 0,034  us/op
A3_Regexp_Replace_Bench.string_replace_first_static       avgt   60  0,970 ± 0,023  us/op

Groovy 3.0.3-indy and OpenJDK 1.8.0_242

Benchmark                                                 Mode  Cnt  Score   Error  Units
A3_Regexp_Replace_Bench.matcher_matches_use_case_dynamic  avgt   60  0,574 ± 0,011  us/op
A3_Regexp_Replace_Bench.matcher_matches_use_case_static   avgt   60  0,516 ± 0,008  us/op
A3_Regexp_Replace_Bench.string_replace_first_dynamic      avgt   60  0,925 ± 0,016  us/op
A3_Regexp_Replace_Bench.string_replace_first_static       avgt   60  0,878 ± 0,015  us/op

Groovy 3.0.3-indy and OpenJDK 11.0.6

Benchmark                                                 Mode  Cnt  Score   Error  Units
A3_Regexp_Replace_Bench.matcher_matches_use_case_dynamic  avgt   60  0,609 ± 0,013  us/op
A3_Regexp_Replace_Bench.matcher_matches_use_case_static   avgt   60  0,552 ± 0,013  us/op
A3_Regexp_Replace_Bench.string_replace_first_dynamic      avgt   60  1,014 ± 0,028  us/op
A3_Regexp_Replace_Bench.string_replace_first_static       avgt   60  0,945 ± 0,026  us/op

And here is the last chart.

Thoughts?

  • The variant with String.replaceFirst() performs "slower" as expected.

  • Does it make a huge difference? I wouldn’t say so. In the worst-case scenario, using String.replaceFirst() with a closure like the one shown in the test will need 0.001 ms instead of 0.0005 ms to finish execution.

Conclusion

I hope those benchmarks have shown you that there are some small differences between the different variants. Still, they are not something you should be worried about. Measuring application performance is not trivial, and in most cases, real bottlenecks exist in entirely different areas. I can tell you from my experience that one of the first places worth checking when tweaking application’s performance is its I/O layer. I’ve seen countless times inefficient queries to the database, or threads blocked for nothing, and those are the real issues.

Share this blog post on Twitter or LinkedIn to help me spread the word. Thanks!