
Julia Regular Expressions

Regular expressions (regexes) describe a pattern for matching strings, which can be used to check if a string contains a certain substring, replace matching substrings, or extract substrings that meet certain criteria.

Julia supports Perl-compatible regular expressions (regexes).

Unlike Perl, Julia has no =~ or !~ match operators. Regular expressions are instead passed to ordinary functions: occursin tests whether a pattern matches, match (and eachmatch) return the details of a match, and replace substitutes matched text.

In Julia, a regular expression literal is written as a string prefixed with r:

Example

julia> re = r"^\s*(?:#|$)"
r"^\s*(?:#|$)"

julia> typeof(re)
Regex

To check if a regular expression matches a string, use occursin:

Example

julia> occursin(r"^\s*(?:#|$)", "not a comment")
false

julia> occursin(r"^\s*(?:#|$)", "# a comment")
true
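
Because occursin returns a Bool, it can be used directly as a predicate. As a minimal sketch, the following filters a small made-up list of lines (the variable names here are illustrative and not part of the tutorial) down to those that are neither blank nor comments:

```julia
# Hypothetical input lines; any vector of strings would do.
lines = ["# header", "x = 1", "   ", "  # indented comment", "y = 2"]

# Same pattern as above: optional leading whitespace, then '#' or end of string.
comment_or_blank = r"^\s*(?:#|$)"

# Keep only the lines that the pattern does NOT match.
code_lines = filter(l -> !occursin(comment_or_blank, l), lines)

println(code_lines)
```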

occursin only returns true or false, indicating whether the given regular expression is found in the string. However, often we want to know not just if there's a match, but how it matches. To capture match information, use the match function:

Example

julia> match(r"^\s*(?:#|$)", "not a comment")

julia> match(r"^\s*(?:#|$)", "# a comment")
RegexMatch("#")

If the regular expression does not match the given string, match returns nothing—a special value that prints nothing in the interactive prompt. Despite not printing, it is a fully functional value and can be tested programmatically:

Example

line = "# a comment"  # sample input; any string works here
m = match(r"^\s*(?:#|$)", line)
if m === nothing
    println("not a comment")
else
    println("blank or comment")
end
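
The same test can be wrapped in a small function. This is only a sketch: classify is a hypothetical helper name, and isnothing(m) is used as an equivalent spelling of m === nothing.

```julia
# Hypothetical helper: label a line using the comment/blank pattern from above.
function classify(line::AbstractString)
    m = match(r"^\s*(?:#|$)", line)
    # isnothing(m) is equivalent to m === nothing
    return isnothing(m) ? "not a comment" : "blank or comment"
end

println(classify("x = 1"))
println(classify("# note"))
```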

If the regular expression matches, the return value of match is a RegexMatch object. These objects record how the expression matched, including the substring that matched the pattern and any captured substrings. The above example only captures the matching part, but perhaps we want to capture any non-empty text following the comment character. We can do this:

Example

julia> m = match(r"^\s*(?:#\s*(.*?)\s*$|$)", "# a comment ")
RegexMatch("# a comment ", 1="a comment")

When calling match, you can optionally specify the index to start the search. For example:

Example

julia> m = match(r"[0-9]","aaaa1aaaa2aaaa3",1)
RegexMatch("1")

julia> m = match(r"[0-9]","aaaa1aaaa2aaaa3",6)
RegexMatch("2")

julia> m = match(r"[0-9]","aaaa1aaaa2aaaa3",11)
RegexMatch("3")
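
The start index makes it possible to walk through a string one match at a time. The sketch below does this by hand with a hypothetical helper, all_matches, and assumes the pattern never matches an empty string. Base also provides eachmatch, which performs the same iteration for you.

```julia
# Hypothetical helper: collect every match by restarting the search
# just past the end of the previous match.
# Assumes the pattern never matches an empty string.
function all_matches(re::Regex, s::String)
    found = String[]
    idx = firstindex(s)
    while idx <= lastindex(s)
        m = match(re, s, idx)
        m === nothing && break
        push!(found, m.match)
        # m.offset is where the match starts; skip over its code units.
        idx = m.offset + ncodeunits(m.match)
    end
    return found
end

s = "aaaa1aaaa2aaaa3"
manual = all_matches(r"[0-9]", s)

# The built-in iterator yields one RegexMatch per match.
builtin = [m.match for m in eachmatch(r"[0-9]", s)]

println(manual)
println(builtin)
```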

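You can extract the following information from a RegexMatch object: the entire matched substring (m.match), the captured substrings as an array (m.captures), the offset at which the whole match begins (m.offset), and the offsets of the captured substrings as a vector (m.offsets).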

When captures do not match, m.captures does not contain a substring but rather nothing; additionally, m.offsets has an offset of 0 (recall that Julia's indexing starts at 1, so a zero offset is invalid). Here are two contrived examples:

Example

julia> m = match(r"(a|b)(c)?(d)", "acd")
RegexMatch("acd", 1="a", 2="c", 3="d")

julia> m.match
"acd"

julia> m.captures
3-element Vector{Union{Nothing, SubString{String}}}:
 "a"
 "c"
 "d"

julia> m.offset
1

julia> m.offsets
3-element Vector{Int64}:
 1
 2
 3

julia> m = match(r"(a|b)(c)?(d)", "ad")
RegexMatch("ad", 1="a", 2=nothing, 3="d")

julia> m.match
"ad"

julia> m.captures
3-element Vector{Union{Nothing, SubString{String}}}:
 "a"
 nothing
 "d"

julia> m.offset
1

julia> m.offsets
3-element Vector{Int64}:
 1
 0
 2
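
Since an unmatched group yields nothing, code that uses captures usually needs a fallback. One possible approach, sketched here with an arbitrary placeholder string, is Base's something function, which returns its first argument that is not nothing:

```julia
m = match(r"(a|b)(c)?(d)", "ad")

# m[2] is `nothing` because the optional group (c)? did not take part in the match.
# "<none>" is an arbitrary placeholder chosen for this illustration.
opt = something(m[2], "<none>")

println(opt)
```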

Returning captures as an array is convenient, allowing them to be bound to local variables using destructuring syntax. For convenience, the RegexMatch object implements an iterator method that passes through to the captures field, so you can directly destructure the match object:

Example

julia> first, second, third = m; first
"a"

Captures can also be accessed by indexing the RegexMatch object with the number or name of the capture group:

Example

julia> m = match(r"(?<hour>\d+):(?<minute>\d+)", "12:45")
RegexMatch("12:45", hour="12", minute="45")

julia> m[:minute]
"45"

julia> m[2]
"45"

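Captures are substrings, so they normally have to be converted before being used as numbers. A minimal sketch, assuming input of the form HH:MM (the helper name minutes_since_midnight is made up for this illustration):

```julia
# Hypothetical helper: convert "HH:MM" into minutes since midnight.
function minutes_since_midnight(s::AbstractString)
    m = match(r"^(?<hour>\d+):(?<minute>\d+)$", s)
    m === nothing && return nothing   # input did not look like HH:MM
    # Captures are substrings, so parse them to integers before doing arithmetic.
    return parse(Int, m[:hour]) * 60 + parse(Int, m[:minute])
end

println(minutes_since_midnight("12:45"))
```
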
Captures can be referenced in the replacement string passed to replace by prefixing that string with s and using \n to refer to the nth capture group. Capture group 0 refers to the entire match. Named capture groups can be referenced with \g<groupname>.

Example

julia> replace("first second", r"(\w+) (?<agroup>\w+)" => s"\g<agroup> \1")
"second first"

Numbered capture groups can also be written as \g<n>. This avoids ambiguity when the group reference is immediately followed by a digit:

Example

julia> replace("a", r"." => s"\g<0>1")
"a1"

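As a further illustration, the sketch below reorders a date string in two ways, once with numbered groups and once with named groups. The date format and the group names (day, month, year) are assumptions chosen for the example.

```julia
# Reorder DD/MM/YYYY into YYYY-MM-DD using numbered groups.
iso_numbered = replace("25/12/2023", r"(\d{2})/(\d{2})/(\d{4})" => s"\3-\2-\1")

# The same transformation with named groups, referenced through \g<name>.
iso_named = replace("25/12/2023",
    r"(?<day>\d{2})/(?<month>\d{2})/(?<year>\d{4})" => s"\g<year>-\g<month>-\g<day>")

println(iso_numbered)
println(iso_named)
```
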
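You can modify the behavior of a regular expression by appending flags such as i, m, s, and x after the closing double quote. As in Perl, i makes matching case-insensitive, m lets ^ and $ match at the start and end of each line, s lets . match a newline, and x ignores most whitespace in the pattern.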
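
The effect of each flag can be seen by comparing a pattern with and without it. This is a small sketch with made-up inputs, not an exhaustive description of the flags:

```julia
# i: case-insensitive matching
plain_i = occursin(r"julia", "JULIA")
flag_i  = occursin(r"julia"i, "JULIA")

# m: ^ and $ match at line boundaries, not only at the ends of the string
plain_m = occursin(r"^b$", "a\nb\nc")
flag_m  = occursin(r"^b$"m, "a\nb\nc")

# s: . also matches a newline character
plain_s = occursin(r"a.b", "a\nb")
flag_s  = occursin(r"a.b"s, "a\nb")

# x: unescaped whitespace inside the pattern is ignored
plain_x = occursin(r"\d+ : \d+", "12:45")
flag_x  = occursin(r"\d+ : \d+"x, "12:45")

println((plain_i, flag_i))
println((plain_m, flag_m))
println((plain_s, flag_s))
println((plain_x, flag_x))
```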

For more information on regular expressions, refer to: Regular Expressions - Tutorial
