Welcome to "Groovy Regular Expressions - The Definitive Guide"! In the next 15 minutes, you are going to learn everything you need to start working productively with regular expressions in the Groovy programming language. Let’s get started!

Introduction

I have always found working with regular expressions in Java kind of error-prone. Many times I didn’t escape the backslash character enough times, or I forgot to call matcher.matches() or matcher.find() explicitly. Luckily, the Groovy programming language makes working with regex much simpler. Let’s start by learning a few new operators that drastically improve the experience.

Operators

~string (pattern operator)

Groovy makes creating a java.util.regex.Pattern instance simple thanks to the pattern operator. All you have to do is put ~ right in front of a string literal (e.g. ~"([Gg])roovy"), and it creates a java.util.regex.Pattern object instead of a java.lang.String.

Listing 1. Creating a pattern object example
import java.util.regex.Pattern

def pattern = ~"([Gg])roovy"

assert pattern.class == Pattern

The above example is equivalent to the following, more explicit code:

Listing 2. More explicit pattern object creation example
import java.util.regex.Pattern

def pattern = Pattern.compile("([Gg])roovy")

assert pattern.class == Pattern
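The pattern operator is not limited to string literals. As a small sketch (the variable names are only illustrative), applying ~ to a String variable compiles it as well, and the resulting Pattern can then be reused to create matchers for many inputs:

```groovy
import java.util.regex.Pattern

// The regex source is kept in an ordinary String variable
def source = "([Gg])roovy"

// Applying ~ to the variable compiles it, just like ~"([Gg])roovy" would
def pattern = ~source

assert pattern instanceof Pattern
assert pattern.pattern() == source

// One compiled Pattern can create a matcher for each input
def results = ["Groovy", "groovy", "Java"].collect { pattern.matcher(it).matches() }
assert results == [true, true, false]
```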

The difference between ~"pattern" and ~/pattern/

Groovy offers one significant improvement when it comes to working with regular expressions: so-called slashy strings. This syntax produces either a regular java.lang.String (if it has no variables to interpolate) or a groovy.lang.GString (if it contains variables to interpolate).

// Running on Groovy 3.0.3

def number = 2

def str1 = /The number is 2/
def str2 = /The number is $number/

assert str1 instanceof String
assert str2 instanceof GString

The most useful feature of slashy strings is that they eliminate the need to escape backslashes in regular expressions.

// Running on Groovy 3.0.3

assert (/Version \d+\.\d+\.\d+/) == 'Version \\d+\\.\\d+\\.\\d+'

Of course, you still have to escape $ when you want to match a literal dollar sign.

// Running on Groovy 3.0.3

assert 'The price is $99' ==~ /The price is \$\d+/
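Because the slash delimits a slashy string, a literal forward slash is the one character you still have to escape inside it, written as \/ (Groovy removes the backslash). A short sketch, where the path pattern and sample paths are just illustrative data:

```groovy
// Backslashes are kept verbatim, so \w reaches the regex engine unchanged,
// while \/ is the slashy-string escape for a literal forward slash
def pathPattern = /(\/\w+)+/

assert pathPattern == '(/\\w+)+'

// Matches an absolute Unix-style path made of word characters
assert '/usr/local/bin' ==~ pathPattern
assert !('usr/local/bin' ==~ pathPattern)
```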

=~ (find operator)

To create a java.util.regex.Matcher object, you can use Groovy’s find operator. On the left side, you put the string you want to match against. On the right side, you put a pattern, which can be either a java.util.regex.Pattern or a java.lang.String. Consider the following example.

Listing 3. Creating a matcher using the find operator example.
// Running on Groovy 3.0.3

def pattern = ~/\S+er\b/
def matcher = "My code is groovier and better when I use Groovy there" =~ pattern

assert pattern instanceof java.util.regex.Pattern
assert matcher instanceof java.util.regex.Matcher

assert matcher.find()
assert matcher.size() == 2
assert matcher[0..-1] == ["groovier", "better"]

Creating the java.util.regex.Pattern object in the above example is optional. Instead, we can define the pattern with a slashy string directly on the matcher line.

// Running on Groovy 3.0.3

def matcher = "My code is groovier and better when I use Groovy there" =~ /\S+er\b/

assert matcher instanceof java.util.regex.Matcher

assert matcher.find()
assert matcher.size() == 2
assert matcher[0..-1] == ["groovier", "better"]

Once you have the java.util.regex.Matcher object, you can use any of its standard methods, or you can keep reading to learn the more idiomatic Groovy way of working with it.
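For example, find(), group(), and start() behave exactly as they do in Java. The sketch below reuses the sentence from Listing 3 to record each match with its position, and then shows Groovy’s each() shortcut for iterating over the matches of a fresh matcher:

```groovy
def text = "My code is groovier and better when I use Groovy there"

// Java style: find() advances to the next match, group() returns the matched
// text, and start() returns the index where that match begins
def matcher = text =~ /\S+er\b/
def positions = [:]
while (matcher.find()) {
    positions[matcher.group()] = matcher.start()
}
assert positions == [groovier: 11, better: 24]

// Groovy style: each() iterates over the matches of a new matcher
def words = []
(text =~ /\S+er\b/).each { words << it }
assert words == ["groovier", "better"]
```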

Using the =~ operator in a boolean context

You can also use a java.util.regex.Matcher object in a boolean expression (e.g., inside an if statement). In this case, Groovy implicitly invokes the matcher.find() method, which means that the expression evaluates to true if any part of the string matches the pattern.

Listing 4. Using a matcher in a boolean expression example.
// Running on Groovy 3.0.3

if ("My code is groovier and better when I use Groovy there" =~ /\S+er\b/) {
    println "At least one element matches the pattern..."
}

if ("Lorem ipsum dolor sit amet" =~ /\d+/) {
    println "This line is not executed..."
}
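Because the boolean conversion calls find(), a matcher kept in a variable is already positioned on its first match inside the if block. A minimal sketch of reading a captured group there (the log line and the errorCount variable are illustrative):

```groovy
def line = "Build failed with 3 errors"
def errorCount = 0

// Keep a reference to the matcher so the match can be read after the test
def matcher = line =~ /(\d+) errors?/

// Groovy truth calls matcher.find(), which moves to the first match...
if (matcher) {
    // ...so group(1) returns the digits captured by that match
    errorCount = matcher.group(1) as int
}

assert errorCount == 3
```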

==~ (exact match operator)

Groovy also adds the very useful ==~ exact match operator. It is used in a similar way to the find operator, but it behaves differently: instead of creating a java.util.regex.Matcher object, it returns a boolean value. You can think of it as the equivalent of a matcher.matches() call; it tests whether the entire string matches the given pattern.

Listing 5. Using the exact match operator examples.
// Running on Groovy 3.0.3

assert "v3.12.4" ==~ /v\d{1,3}\.\d{1,3}\.\d{1,3}/

assert !("GROOVY-123: some change" ==~ /[A-Z]{3,6}-\d{1,4}/)

assert "GROOVY-123: some change" ==~ /[A-Z]{3,6}-\d{1,4}.{1,100}/
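To make the difference between the two operators concrete, here is a sketch that runs the same pattern through both. It also shows that adding ^ and $ anchors makes a find-based check behave like an exact match:

```groovy
def commit = "GROOVY-123: some change"
def issue = /[A-Z]{3,6}-\d{1,4}/

// ==~ returns a plain Boolean, not a Matcher
def exact = commit ==~ issue
assert exact instanceof Boolean

// ==~ gives the same answer as Matcher.matches()
assert exact == (commit =~ issue).matches()
assert !exact

// =~ in a boolean context uses find(), so matching a prefix is enough
assert commit =~ issue

// Anchoring the pattern forces find() to cover the whole string
assert !(commit =~ /^[A-Z]{3,6}-\d{1,4}$/)
```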

Usage examples

Checking whether a string matches a given pattern is not the only thing you can do with regular expressions. In many cases, you want to extract the data that matches a specific pattern, or even replace all occurrences with a new value. The following examples show how to do such things in Groovy.

Extracting all matching elements

Let’s begin with extracting all matching elements. Groovy lets you call findAll() on a java.util.regex.Matcher, and it returns all matching elements. The example below uses this technique to extract all numbers from the given text.

Listing 6. Extracting all matching elements example.
// Running on Groovy 3.0.3

// Groovy's multiline (triple-quoted) string
def text = """
This text contains some numbers like 1024
or 256. Some of them are odd (like 3) or
even (like 2).
"""

def result = (text =~ /\d+/).findAll()

// Extracted values are of java.lang.String type; map them to integers if you need numbers
assert result == ["1024", "256", "3", "2"]
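Since the extracted values are Strings, you may want actual numbers. A sketch of converting them with collect(); it also uses findAll(regex), which Groovy adds directly to String, so you don’t have to create the Matcher yourself (the sample text is illustrative):

```groovy
def text = "This text contains some numbers like 1024 or 256."

// findAll() on the matcher returns Strings; collect() converts each one
def numbers = (text =~ /\d+/).findAll().collect { it as int }
assert numbers == [1024, 256]
assert numbers.sum() == 1280

// String.findAll(regex) returns the matching substrings directly
assert text.findAll(/\d+/) == ["1024", "256"]
assert text.findAll(/\d+/)*.toInteger() == [1024, 256]
```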

Extracting words that begin and end with the same letter

Let’s take a look at some more practical examples. Sometimes you need to extract words that start and end with the same (case-insensitive) letter. We can use the pattern /(?i)\b([a-z])[a-z]*\1\b/, where:

  • (?i) makes matching case-insensitive,

  • \b([a-z]) defines a group that matches the first letter of the word,

  • [a-z]* matches any letters in between,

  • \1 refers to the first group (the first letter of the word), and the final \b matches the end of the word.

Because the pattern contains a capturing group, each match holds both the matching word and its first letter. In Groovy, we can use the spread operator to call the first() method on each element and keep only the matching words.

Listing 7. Extracting words that begin and end with the same letter.
// Running on Groovy 3.0.3

def result = ("This is test. Test is good, lol." =~ /(?i)\b([a-z])[a-z]*\1\b/).findAll()*.first()

assert result == ["test", "Test", "lol"]
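It helps to look at what findAll() returns before the spread call. With one capturing group, every match is a two-element list: the whole word, then the captured letter. This sketch also drops (?i) to show what the flag contributes:

```groovy
def text = "This is test. Test is good, lol."

def matches = (text =~ /(?i)\b([a-z])[a-z]*\1\b/).findAll()

// Each match is [whole match, group 1]
assert matches == [["test", "t"], ["Test", "T"], ["lol", "l"]]

// *.first() keeps only the whole-match element of each list
assert matches*.first() == ["test", "Test", "lol"]

// Without (?i), [a-z] no longer matches the capital T in "Test"
def caseSensitive = (text =~ /\b([a-z])[a-z]*\1\b/).findAll()*.first()
assert caseSensitive == ["test", "lol"]
```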

Extracting matching element(s) using named group

Java (and thus Groovy) supports named groups in regular expressions. When you group part of a pattern with parentheses, add ?<name> right after the opening parenthesis to name the group. Named groups let you extract values from a match by name instead of by numeric index. You can also refer to a named group’s value when you call the replaceAll() method on a matcher object.

In the example below, we use a pattern that defines a named group called jira.

Listing 8. Extracting matching element(s) using named group example.
// Running on Groovy 3.0.3

def matcher = "JIRA-231 lorem ipsum dolor sit amet" =~ /^(?<jira>[A-Z]{2,4}-\d{1,3}).*$/

// A match attempt (here matches()) must succeed before you can extract a group by name
assert matcher.matches()
// group(name) returns the text captured by the named group
assert matcher.group("jira") == "JIRA-231"
// replaceAll() creates a new string; ${jira} refers to the named group.
// Use a single-quoted string, otherwise Groovy tries to interpolate ${jira} itself and fails.
assert matcher.replaceAll('Found ${jira} ID') == 'Found JIRA-231 ID'
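Named groups also work when a string contains several matches. In that case, call find() in a loop and read the groups by name after each successful call. A sketch with a made-up commit message and illustrative group names key and num:

```groovy
def message = "Fixed JIRA-231 and GROOVY-42, reverted ABC-7"
def matcher = message =~ /(?<key>[A-Z]{2,6})-(?<num>\d{1,4})/

def ids = []
while (matcher.find()) {
    // group(name) reads the match found by the most recent find() call
    ids << [matcher.group("key"), matcher.group("num") as int]
}

assert ids == [["JIRA", 231], ["GROOVY", 42], ["ABC", 7]]
```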

Using multiple assignment to extract matching elements

Another useful feature is multiple variable assignment. We can use it to extract matching values and assign them directly to specific variables. Let’s say you are parsing data containing items with their prices and an (optional) discount. Here is how you can extract the price and the discount and assign them to variables in one line.

Listing 9. Using multiple assignments with a matcher object example.
// Running on Groovy 3.0.3

def (_,price,discount) = ('Some item name: $99.99 (-15%)' =~ /\$(\d{1,4}\.\d{2})\s?\(?(-\d+%)?\)?/)[0]

assert _ == '$99.99 (-15%)'
assert price == "99.99"
assert discount == "-15%"

I used _ as the name of the first variable, which stores the whole matching region that we don’t need here. Now, what happens if the row we process does not contain any discount information? In that case, the discount variable is set to null.

Listing 10. No discount information example.
// Running on Groovy 3.0.3

def (_,price,discount) = ('Some item name: $49.99' =~ /\$(\d{1,4}\.\d{2})\s?\(?(-\d+%)?\)?/)[0]

assert _ == '$49.99'
assert price == "49.99"
assert discount == null
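Multiple assignment from matcher[0] assumes there is at least one match. When some rows may not contain a price at all, it is safer to check the matcher first. A sketch with a hypothetical parsePrice helper (not part of Groovy) that returns null for such rows and converts the price to BigDecimal:

```groovy
// Hypothetical helper: returns [price, discount], or null when no price is found
List parsePrice(String row) {
    def matcher = row =~ /\$(\d{1,4}\.\d{2})\s?\(?(-\d+%)?\)?/
    // Guard before indexing: only rows that contain a price are destructured
    if (!matcher.find()) {
        return null
    }
    def (_, price, discount) = matcher[0]
    [price as BigDecimal, discount]
}

assert parsePrice('Some item name: $99.99 (-15%)') == [99.99, "-15%"]
assert parsePrice('Some item name: $49.99') == [49.99, null]
assert parsePrice('Some item name: sold out') == null
```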

Another popular example is extracting the major, minor, and patch parts of a semantic version string. We can use multiple assignment to extract all three parts in a single line of code.

Listing 11. Using multiple assignments to extract major, minor, and patch from the semantic version.
// Running on Groovy 3.0.3

def (_,major,minor,patch) = ("v3.21.0" =~ /^v(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/)[0]

assert _ == "v3.21.0"
assert major == "3"
assert minor == "21"
assert patch == "0"
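The extracted parts are Strings, so they compare alphabetically: "21" sorts before "9". If you need to compare versions, convert the parts to integers first. A sketch with a hypothetical parseVersion helper that rejects anything not shaped like vX.Y.Z:

```groovy
// Hypothetical helper: turns "vX.Y.Z" into a list of three integers
List<Integer> parseVersion(String version) {
    def matcher = version =~ /^v(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
    if (!matcher.matches()) {
        throw new IllegalArgumentException("Not a semantic version: " + version)
    }
    // matcher[0] is [whole match, major, minor, patch]; tail() drops the whole match
    matcher[0].tail()*.toInteger()
}

def (major, minor, patch) = parseVersion("v3.21.0")
assert [major, minor, patch] == [3, 21, 0]

// As Strings, "21" sorts before "9"; as integers, the comparison is numeric
assert "21" < "9"
assert parseVersion("v3.21.0")[1] > parseVersion("v3.9.12")[1]
```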


Replacing matching elements using replaceFirst()

Extracting the parts of a semantic version into specific variables looks good, but what if we want to generate a new version by incrementing the patch part? There is a simple solution to that problem as well. Groovy complements Java’s String.replaceFirst(String regex, String replacement) method with a replaceFirst(Pattern pattern, Closure closure) variant, and this variant is very powerful: the closure receives the matching parts, and we can modify them as we wish. Take a look at the following example to see how you can increment the patch part of a semantic version.

Listing 12. Using replaceFirst() to increment patch part of the semantic version.
// Running on Groovy 3.0.3

def version = "v3.21.0"
def expected = "v3.21.1"
def pattern = ~/^v(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/

def newVersion = version.replaceFirst(pattern) { _, major, minor, patch ->
    "v${major}.${minor}.${(patch as int) + 1}"
}

assert newVersion == expected
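The same kind of closure also works with replaceAll(Pattern, Closure), which Groovy calls once for every match instead of only the first one. A sketch on a made-up sentence containing two versions; the pattern drops the ^ and $ anchors so it can match anywhere in the text:

```groovy
def text = "Upgrade v3.21.0 and v2.5.14 today"
// No ^ and $ anchors here, so the pattern can match anywhere in the text
def pattern = ~/v(\d{1,3})\.(\d{1,3})\.(\d{1,3})/

// The closure receives the whole match followed by each captured group
def bump = { _, major, minor, patch -> "v${major}.${minor}.${(patch as int) + 1}" }

// replaceAll() rewrites every match, replaceFirst() only the first one
assert text.replaceAll(pattern, bump) == "Upgrade v3.21.1 and v2.5.15 today"
assert text.replaceFirst(pattern, bump) == "Upgrade v3.21.1 and v2.5.14 today"
```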

Using pattern matcher in the switch case

Groovy extends the set of types supported in the switch statement and allows you to use patterns as case values. In this case, Groovy tests the input with matcher.matches(), so a case is selected only when the entire input string matches the pattern. Consider the following example.

Listing 13. Pattern in the switch case example.
// Running on Groovy 3.0.3

def input = "test"

switch (input) {
    case ~/\d{3}/:
        println "The number has 3 digits."
        break

    case ~/\w{4}/:
        println "The word has 4 letters."
        break

    default:
        println "Unrecognized..."
}

Running the above example produces the following output.

The word has 4 letters.
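It is worth knowing that Groovy decides a Pattern case with a whole-string match (the same check as matcher.matches()), so inputs with extra characters fall through to the next case. A sketch with a hypothetical classify helper built around the switch from Listing 13; grep() relies on the same isCase() check, so it follows the same rule:

```groovy
// Hypothetical helper built around the switch statement from Listing 13
String classify(String input) {
    switch (input) {
        case ~/\d{3}/:
            return "three digits"
        case ~/\w{4}/:
            return "four word characters"
        default:
            return "unrecognized"
    }
}

assert classify("123") == "three digits"
assert classify("test") == "four word characters"
// "1234" is not exactly three digits, but \w also matches digits
assert classify("1234") == "four word characters"
// Five characters match neither pattern as a whole
assert classify("tests") == "unrecognized"

// grep() uses the same isCase() check, so only whole-string matches are kept
assert ["123", "test", "tests"].grep(~/\w{4}/) == ["test"]
```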

The benchmark

If you want to learn about the performance of Groovy regular expressions, continue reading part 2 of this blog post.
