Ruby Regular Expressions
A regular expression is a sequence of characters that uses a specialized syntax to match or search for strings or sets of strings.
A regular expression combines literal characters with predefined special characters to form a "rule string" that expresses the filtering logic applied to strings.
Syntax
Regular expressions are patterns written between slashes or between arbitrary delimiters following %r, as shown below:
/pattern/
/pattern/im # Options can be specified
%r!/usr/local! # Regular expression using a delimiter
Example
#!/usr/bin/ruby
line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"
# =~ returns the index of the first match, or nil when nothing matches
if line1 =~ /Cats(.*)/
  puts "Line1 contains Cats"
end
if line2 =~ /Cats(.*)/
  puts "Line2 contains Cats"
end
The output of the above example is:
Line1 contains Cats
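The pattern above contains a capture group, (.*), that the example never reads. The following sketch shows one way to get at both the match position and the captured text. The helper name rest_after_cats is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper: returns the text captured by the first group,
# or nil when the line does not match.
def rest_after_cats(line)
  match = /Cats(.*)/.match(line)   # MatchData on success, nil on failure
  match && match[1]
end

line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

p(line1 =~ /Cats(.*)/)     # 0: index where the match starts
p(line2 =~ /Cats(.*)/)     # nil: no match
p rest_after_cats(line1)   # " are smarter than dogs"
p rest_after_cats(line2)   # nil
```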
Regular Expression Modifiers
Regular expressions may include optional modifiers to control various aspects of matching. These modifiers are specified after the second slash, as shown in the example. Below is a list of possible modifiers:
Modifier | Description |
---|---|
i | Ignore case when matching text. |
o | Perform #{ } interpolation only once, evaluating the regex at the first instance. |
x | Ignore whitespace and allow comments within the expression. |
m | Multiline mode: the dot (.) also matches newline characters. |
u,e,s,n | Interpret the regex as Unicode (UTF-8), EUC, SJIS, or ASCII-8BIT (no encoding). If none is specified, the regex is assumed to use the source encoding. |
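As a small illustration of the i, x, and m modifiers from the table, the sketch below defines three patterns. The constant names and sample strings are made up for this example.

```ruby
# i: case-insensitive matching
CASE_INSENSITIVE = /ruby/i

# x: whitespace and comments inside the pattern are ignored
ISO_DATE = /
  (\d{4}) -   # year
  (\d{2}) -   # month
  (\d{2})     # day
/x

# m: the dot also matches a newline
ACROSS_LINES = /start.*end/m

p CASE_INSENSITIVE.match?("RUBY")          # true
p ISO_DATE.match("2024-01-31").captures    # ["2024", "01", "31"]
p(/start.*end/.match?("start\nend"))       # false: without m, . stops at the newline
p ACROSS_LINES.match?("start\nend")        # true
```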
Ruby allows you to start a regular expression with %r followed by any delimiter, similar to how strings are delimited by %Q. This is useful when describing patterns that contain many slashes that you do not want to escape.
# Matches a single slash character without escaping
%r|/|
# Option flags may follow the closing delimiter
%r[</(.*)>]i
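To make the benefit concrete, here is a minimal sketch that writes the same path pattern twice, once with slashes and once with %r. The constant names and the sample path are assumptions chosen for illustration.

```ruby
# The same pattern written with two different delimiters.
WITH_SLASHES   = /\/usr\/local\/(\w+)/    # every slash must be escaped
WITH_PERCENT_R = %r{/usr/local/(\w+)}     # no escaping needed

path = "/usr/local/bin/ruby"
p path[WITH_SLASHES, 1]      # "bin"
p path[WITH_PERCENT_R, 1]    # "bin"
```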
Regular Expression Patterns
Except for the metacharacters (+ ? . * ^ $ ( ) [ ] { } | \), every character matches itself. You can match a metacharacter literally by preceding it with a backslash.
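The sketch below shows the difference between an unescaped and an escaped metacharacter. It also uses Regexp.escape, which escapes all metacharacters in a string. The helper literal_pattern is a hypothetical name introduced only for this example.

```ruby
# Hypothetical helper: builds a pattern that matches `text` literally.
def literal_pattern(text)
  Regexp.new(Regexp.escape(text))
end

p(/1+1/.match?("1+1"))                   # false: + is a quantifier here
p(/1\+1/.match?("1+1"))                  # true: \+ matches a literal plus sign
p literal_pattern("1+1").match?("1+1")   # true
p literal_pattern("a.b").match?("axb")   # false: the dot is matched literally
```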
The following table lists the regular expression syntax available in Ruby.
Pattern | Description |
---|---|
^ | Matches the beginning of a line. |
$ | Matches the end of a line. |
. | Matches any single character except a newline. With the m option, it also matches a newline. |
[...] | Matches any single character listed in the brackets. |
[^...] | Matches any single character not listed in the brackets. |
re* | Matches zero or more occurrences of the preceding subexpression. |
re+ | Matches one or more occurrences of the preceding subexpression. |
re? | Matches zero or one occurrence of the preceding subexpression. |
re{n} | Matches exactly n occurrences of the preceding subexpression. |
re{n,} | Matches n or more occurrences of the preceding subexpression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding subexpression. |
a\|b | Matches either a or b. |
(re) | Groups regular expressions and remembers the matched text. |
(?imx) | Turns on the i, m, or x options for the rest of the regex. Inside parentheses, it affects only that group. |
(?-imx) | Turns off the i, m, or x options for the rest of the regex. Inside parentheses, it affects only that group. |
(?:re) | Groups regular expressions without remembering the matched text. |
(?imx:re) | Turns on the i, m, or x options within the parentheses. |
(?-imx:re) | Turns off the i, m, or x options within the parentheses. |
(?#...) | Comment. |
(?=re) | Positive lookahead: asserts that re matches at this position without consuming any characters. |
(?!re) | Negative lookahead: asserts that re does not match at this position without consuming any characters. |
(?>re) | Atomic group: matches re independently, without backtracking into it. |
\w | Matches word characters. |
\W | Matches non-word characters. |
\s | Matches whitespace. Equivalent to [ \t\n\r\f\v]. |
\S | Matches non-whitespace. |
\d | Matches digits. Equivalent to [0-9]. |
\D | Matches non-digits. |
\A | Matches the start of the string. |
\Z | Matches the end of the string. If the string ends with a newline, it matches just before that newline. |
\z | Matches the end of the string. |
\G | Matches the point where the last match finished. |
\b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
\B | Matches non-word boundaries. |
\n, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\1...\9 | Matches the text captured by the nth group. |
\10 | Matches the text captured by the 10th group if that group exists. Otherwise, it is read as the octal representation of a character code. |
Regular Expression Examples
Characters
Example | Description |
---|---|
/ruby/ | Matches "ruby" |
/¥/ | Matches the Yen sign. Both Ruby 1.8 and Ruby 1.9 support multibyte characters. |
Character Classes
Example | Description |
---|---|
/[Rr]uby/ | Matches "Ruby" or "ruby" |
/rub[ye]/ | Matches "ruby" or "rube" |
/[aeiou]/ | Matches any one lowercase vowel |
/[0-9]/ | Matches any digit; same as /[0123456789]/ |
/[a-z]/ | Matches any lowercase ASCII letter |
/[A-Z]/ | Matches any uppercase ASCII letter |
/[a-zA-Z0-9]/ | Matches any of the above |
/[^aeiou]/ | Matches anything other than a lowercase vowel |
/[^0-9]/ | Matches anything other than a digit |
Special Character Classes
Example | Description |
---|---|
/./ | Matches any character except a newline |
/./m | In multiline mode, also matches a newline |
/\d/ | Matches a digit; equivalent to /[0-9]/ |
/\D/ | Matches a nondigit; equivalent to /[^0-9]/ |
/\s/ | Matches a whitespace character; equivalent to /[ \t\r\n\f\v]/ |
/\S/ | Matches a non-whitespace character; equivalent to /[^ \t\r\n\f\v]/ |
/\w/ | Matches a word character; equivalent to /[A-Za-z0-9_]/ |
/\W/ | Matches a non-word character; equivalent to /[^A-Za-z0-9_]/ |
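A quick way to see what the bracketed classes and the shorthand classes select is to run them through String#scan, which returns every match. The sample strings and constant names below are made up for this sketch.

```ruby
SAMPLE = "Ruby 3 and ruby 2"
ENTRY  = "id_42\tok\n"

p SAMPLE.scan(/[Rr]uby/)    # ["Ruby", "ruby"]
p SAMPLE.scan(/[0-9]/)      # ["3", "2"]
p SAMPLE.scan(/[^a-z ]/)    # ["R", "3", "2"]: neither lowercase letter nor space

p ENTRY.scan(/\d/)          # ["4", "2"]
p ENTRY.scan(/\s/)          # ["\t", "\n"]
p ENTRY.scan(/\w+/)         # ["id_42", "ok"]
p ENTRY.scan(/\W/)          # ["\t", "\n"]
```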
Repetition
Example | Description |
---|---|
/ruby?/ | Matches "rub" or "ruby". The y is optional. |
/ruby*/ | Matches "rub" plus 0 or more y's. |
/ruby+/ | Matches "rub" plus 1 or more y's. |
/\d{3}/ | Matches exactly 3 digits. |
/\d{3,}/ | Matches 3 or more digits. |
/\d{3,5}/ | Matches 3, 4, or 5 digits. |
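The quantifiers in the table can be compared side by side on the same input. This is only a sketch: the sample strings are invented, and \b is added around the digit patterns so that each number is tested as a whole word.

```ruby
WORDS   = "rub ruby rubyyy"
NUMBERS = "7 42 123 98765"

p WORDS.scan(/ruby?/)            # ["rub", "ruby", "ruby"]
p WORDS.scan(/ruby*/)            # ["rub", "ruby", "rubyyy"]
p WORDS.scan(/ruby+/)            # ["ruby", "rubyyy"]
p NUMBERS.scan(/\b\d{3}\b/)      # ["123"]
p NUMBERS.scan(/\b\d{3,}\b/)     # ["123", "98765"]
p NUMBERS.scan(/\b\d{3,5}\b/)    # ["123", "98765"]
```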
Non-greedy Repetition
A ? placed after a quantifier makes it non-greedy, so it matches the smallest possible number of repetitions.
Example | Description |
---|---|
/<.*>/ | Greedy repetition: matches "<ruby>perl>" |
/<.*?>/ | Non-greedy repetition: matches "<ruby>" in "<ruby>perl>" |
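The two rows above can be checked directly with String#[], which returns the matched substring. The constant name TAGGED is an assumption for this sketch.

```ruby
TAGGED = "<ruby>perl>"

p TAGGED[/<.*>/]     # "<ruby>perl>": .* runs to the last >
p TAGGED[/<.*?>/]    # "<ruby>": .*? stops at the first >
```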
Grouping with Parentheses
Example | Description |
---|---|
/\D\d+/ | No grouping: + repeats \d |
/(\D\d)+/ | Grouping: + repeats \D\d pair |
/([Rr]uby(, )?)+/ | Matches "Ruby", "Ruby, ruby, ruby", etc. |
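The following sketch applies the three patterns to sample input to show how parentheses change what + repeats. The sample strings and constant names are made up. Note that a repeated group only keeps the text of its last repetition.

```ruby
PAIRS = "a1b2c3"
LIST  = "Ruby, ruby, ruby"

p PAIRS[/\D\d+/]                # "a1": + repeats only \d
p PAIRS[/(\D\d)+/]              # "a1b2c3": + repeats the whole \D\d pair
p PAIRS[/(\D\d)+/, 1]           # "c3": the group holds its last repetition
p LIST[/([Rr]uby(, )?)+/]       # "Ruby, ruby, ruby"
```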
Backreferences
A backreference matches the same text that an earlier group matched.
Example | Description |
---|---|
/([Rr])uby&\1ails/ | Matches ruby&rails or Ruby&Rails |
/(['"])(?:(?!\1).)*\1/ | Single or double-quoted string. \1 matches whatever the 1st group matched, \2 matches whatever the 2nd group matched, etc. |
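Both patterns from the table can be tried on sample input. This is a sketch with invented test strings; the constant names SAME_INITIAL and QUOTED are not part of Ruby.

```ruby
SAME_INITIAL = /([Rr])uby&\1ails/
QUOTED       = /(['"])(?:(?!\1).)*\1/

p SAME_INITIAL.match?("Ruby&Rails")      # true
p SAME_INITIAL.match?("Ruby&rails")      # false: \1 must repeat the captured "R"
p(%q{say "it's fine" now}[QUOTED])       # "\"it's fine\""
p(%q{say 'hello" there}[QUOTED])         # nil: no quote is closed by the same quote
```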
Alternatives
Example | Description |
---|---|
/ruby\|rube/ | Matches "ruby" or "rube" |
/rub(y\|le)/ | Matches "ruby" or "ruble" |
/ruby(!+\|\?)/ | Matches "ruby" followed by one or more ! or one ? |
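In Ruby source code the alternation operator is written as a bare | inside the pattern. The sketch below runs the three patterns on made-up input; the constant names are assumptions.

```ruby
EITHER_WORD = /ruby|rube/
EITHER_END  = /rub(y|le)/
PUNCTUATED  = /ruby(!+|\?)/

p "rube"[EITHER_WORD]      # "rube"
p "ruble"[EITHER_END]      # "ruble"
p "ruby!!!"[PUNCTUATED]    # "ruby!!!": one or more !
p "ruby??"[PUNCTUATED]     # "ruby?": only one ? is allowed
```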
Anchors
Anchors specify the position at which a match must occur.
Example | Description |
---|---|
/^Ruby/ | Matches "Ruby" at the start of a string or internal line |
/Ruby$/ | Matches "Ruby" at the end of a string or line |
/\ARuby/ | Matches "Ruby" at the start of a string |
/Ruby\Z/ | Matches "Ruby" at the end of a string |
/\bRuby\b/ | Matches "Ruby" at a word boundary |
/\brub\B/ | \B is non-word boundary: matches "rub" in "rube" and "ruby" but not alone |
/Ruby(?=!)/ | Matches "Ruby" if followed by an exclamation mark |
/Ruby(?!!)/ | Matches "Ruby" if not followed by an exclamation mark |
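The difference between line anchors and string anchors only shows up on input that contains newlines. The sketch below uses a two-line string, named TWO_LINES here purely for illustration, and also exercises the two lookahead forms.

```ruby
TWO_LINES = "Perl\nRuby\n"

p TWO_LINES.match?(/^Ruby/)      # true: ^ also matches after a newline
p TWO_LINES.match?(/\ARuby/)     # false: \A matches only at the start of the string
p TWO_LINES.match?(/Ruby\Z/)     # true: \Z allows one trailing newline
p TWO_LINES.match?(/Ruby\z/)     # false: \z requires the very end of the string
p "Ruby!"[/Ruby(?=!)/]           # "Ruby": the ! is checked but not consumed
p "Ruby!"[/Ruby(?!!)/]           # nil: "Ruby" is followed by !
```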
Special Syntax with Parentheses
Example | Description |
---|---|
/R(?#comment)/ | Matches "R". Everything inside (?#...) is a comment. |
/R(?i)uby/ | Case-insensitive when matching "uby". |
/R(?i:uby)/ | Same as above. |
/rub(?:y\|le)/ | Groups without creating a backreference |
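As a rough illustration of these forms, the sketch below checks where case-insensitivity applies and what a non-capturing group leaves in the captures. All constant names and sample strings are assumptions made for this example.

```ruby
COMMENTED     = /R(?#the rest of the word)uby/
INLINE        = /R(?i)uby/
SCOPED        = /R(?i:uby)/
NON_CAPTURING = /rub(?:y|le)/

p COMMENTED.match?("Ruby")                   # true: the comment is ignored
p INLINE.match?("RUBY")                      # true
p SCOPED.match?("RUBY")                      # true
p SCOPED.match?("rUBY")                      # false: R is outside the (?i:...) part
p NON_CAPTURING.match("ruble").captures      # []: (?:...) captures nothing
p(/rub(y|le)/.match("ruble").captures)       # ["le"]
```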
Search and Replace
sub and gsub along with their destructive versions sub! and gsub! are important string methods that use regular expression patterns for search and replace operations.
sub and sub! replace the first occurrence of the pattern, while gsub and gsub! replace all occurrences.
sub and gsub return a new string, leaving the original unchanged, whereas sub! and gsub! modify the string they are called on and return nil when no substitution is made.
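A minimal sketch of these four methods follows. The helper last_name_first and the sample strings are hypothetical. The helper also shows that \1 and \2 in a single-quoted replacement string refer to captured groups.

```ruby
# Hypothetical helper: puts the last name first by using
# backreferences in the replacement string.
def last_name_first(full_name)
  full_name.sub(/(\w+) (\w+)/, '\2, \1')
end

DASHED = "2024-01-31"

p DASHED.sub(/-/, "/")       # "2024/01-31": first occurrence only
p DASHED.gsub(/-/, "/")      # "2024/01/31": every occurrence
p DASHED                     # "2024-01-31": the original is unchanged

copy = DASHED.dup
p copy.gsub!(/-/, "/")       # "2024/01/31": copy is modified in place
p copy.gsub!(/-/, "/")       # nil: nothing was left to replace
p last_name_first("John Smith")   # "Smith, John"
```

Because the destructive methods can return nil, assigning their result back to the variable is risky. Either call them for their side effect or use the non-destructive versions.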
Example
#!/usr/bin/ruby
# -*- coding: UTF-8 -*-
phone = "138-3453-1111 #This is a phone number"
# Delete the Ruby-style comment. sub is used instead of sub! because
# sub! returns nil when nothing is replaced.
phone = phone.sub(/#.*$/, "")
puts "Phone Num : #{phone}"
# Remove everything other than digits
phone = phone.gsub(/\D/, "")
puts "Phone Num : #{phone}"
The output of the above example is:
Phone Num : 138-3453-1111
Phone Num : 13834531111
Example
#!/usr/bin/ruby
# -*- coding: UTF-8 -*-
text = "rails are rails, Ruby on Rails is a fantastic Ruby framework"
# Replace every occurrence of "rails" with "Rails"
text.gsub!("rails", "Rails")
# Capitalize only the whole word "rails"; here gsub! returns nil because none is left
text.gsub!(/\brails\b/, "Rails")
puts text
The output of the above example is:
Rails are Rails, Ruby on Rails is a fantastic Ruby framework