
Ruby Regular Expressions

A regular expression is a special sequence of characters that uses a specialized syntax to match or search for strings or sets of strings.

A regular expression combines ordinary characters with predefined special characters to form a "rule string" that expresses the logic for filtering strings.

Syntax

A regular expression literal is a pattern enclosed between slashes, or between any pair of delimiters following %r, as shown below:

/pattern/
/pattern/im    # Modifiers follow the closing slash
%r!/usr/local! # %r with ! as the delimiter, so / needs no escaping

Example

#!/usr/bin/ruby

line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

# =~ returns the match position (truthy) or nil (falsy)
if line1 =~ /Cats(.*)/
  puts "Line1 contains Cats"
end
if line2 =~ /Cats(.*)/
  puts "Line2 contains Cats"
end

The output of the above example is:

Line1 contains Cats
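The =~ operator returns the index where the match starts, or nil if there is no match, which is why it can be used directly in an if. The sketch below illustrates this and also shows String#match, which returns a MatchData object for reading capture groups. The variable names are illustrative only.

```ruby
line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

# =~ returns the start index of the first match, or nil if there is none
pos  = line1 =~ /Cats(.*)/
miss = line2 =~ /Cats(.*)/
p pos    # => 0
p miss   # => nil

# String#match returns a MatchData object (or nil) that exposes the groups
m = line1.match(/Cats(.*)/)
p m[0]   # => "Cats are smarter than dogs"  (the whole match)
p m[1]   # => " are smarter than dogs"      (the first group)
```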

Regular Expression Modifiers

Regular expressions may include optional modifiers that control various aspects of matching. Modifiers are written after the closing slash, as in /pattern/im above. The available modifiers are listed below:

Modifier Description
i Ignore case when matching text.
o Perform #{} interpolation only once, the first time the regex literal is evaluated.
x Extended mode: ignore whitespace and allow # comments within the expression.
m Multiline mode: . also matches newline characters.
u,e,s,n Interpret the regex as UTF-8, EUC-JP, Windows-31J (SJIS), or ASCII-8BIT, respectively. If none is specified, the regex uses the source encoding.
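Here is a minimal sketch of the i, m, x, and o modifiers in action. The sample strings and variable names are only illustrations.

```ruby
# i: ignore case
ci = "RUBY" =~ /ruby/i
p ci            # => 0

# m: . also matches a newline
without_m = "a\nb" =~ /a.b/
with_m    = "a\nb" =~ /a.b/m
p without_m     # => nil
p with_m        # => 0

# x: whitespace is ignored and # starts a comment
date = /
  (\d{4})   # year
  -
  (\d{2})   # month
/x
parts = "2024-05".match(date).captures
p parts         # => ["2024", "05"]

# o: #{} is interpolated only the first time this literal is evaluated,
# so the second iteration reuses the regex built from "a"
sources = %w[a b].map { |s| /#{s}/o.source }
p sources       # => ["a", "a"]
```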

Ruby also lets you start a regular expression with %r followed by any delimiter, much as %Q lets you choose the delimiter of a string. This is useful for patterns that contain many slashes you would otherwise have to escape.

# Matches a single slash character without escaping
%r|/|

# Modifiers such as i can follow the closing delimiter
%r[</(.*)>]i
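A short comparison of the two literal styles follows. It assumes Ruby 2.4 or later, where Regexp#match? is available. The path and tag strings are illustrative.

```ruby
path = "/usr/local/bin"

escaped = /\/usr\/local/   # every / must be escaped inside /.../
with_r  = %r{/usr/local}   # no escaping needed inside %r{...}
p escaped.match?(path)     # => true
p with_r.match?(path)      # => true

# The i modifier after the closing } makes "body" match "BODY"
closing = %r{</(body)>}i
name = "<p></BODY>".match(closing)[1]
p name                     # => "BODY"
```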

Regular Expression Patterns

Except for the metacharacters + ? . * ^ $ ( ) [ ] { } | and \, every character matches itself. To match a metacharacter literally, precede it with a backslash (for example, \. matches a period).
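The following sketch shows why escaping matters, using an illustrative price string. It also shows Regexp.escape, a standard method that inserts the backslashes automatically, which is handy when a pattern is built from user input. Regexp#match? assumes Ruby 2.4 or later.

```ruby
# An unescaped . matches any character, so this also matches "5x00"
loose  = "5x00" =~ /5.00/
# A backslash makes . match only a literal dot
strict = "5x00" =~ /5\.00/
p loose     # => 0
p strict    # => nil

# Regexp.escape adds the backslashes for you
escaped = Regexp.escape("$5.00")
p escaped   # => "\\$5\\.00"
found = "costs $5.00".match?(Regexp.new(escaped))
p found     # => true
```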

The following table lists the regular expression syntax available in Ruby.

Pattern Description
^ Matches the beginning of a line.
$ Matches the end of a line.
. Matches any single character except a newline. With the m option, it can match a newline.
[...] Matches any single character in brackets.
[^...] Matches any single character not in brackets.
re* Matches zero or more occurrences of the preceding subexpression.
re+ Matches one or more occurrences of the preceding subexpression.
re? Matches zero or one occurrence of the preceding subexpression.
re{n} Matches exactly n occurrences of the preceding subexpression.
re{n,} Matches n or more occurrences of the preceding subexpression.
re{n,m} Matches at least n and at most m occurrences of the preceding subexpression.
a|b Matches either a or b.
(re) Groups regular expressions and remembers the matched text.
(?imx) Turns on the i, m, or x option for the rest of the regex. Inside a group, it applies only to the rest of that group.
(?-imx) Turns off the i, m, or x option, with the same scope as above.
(?:re) Groups re without capturing the matched text.
(?imx:re) Turns on the i, m, or x option only within the parentheses.
(?-imx:re) Turns off the i, m, or x option only within the parentheses.
(?#...) Comment; ignored by the matcher.
(?=re) Positive lookahead: succeeds if re matches at this position, without consuming characters.
(?!re) Negative lookahead: succeeds if re does not match at this position, without consuming characters.
(?>re) Atomic group: matches re independently and never backtracks into it.
\w Matches word characters.
\W Matches non-word characters.
\s Matches a whitespace character. Equivalent to [ \t\r\n\f\v].
\S Matches non-whitespace.
\d Matches digits. Equivalent to [0-9].
\D Matches non-digits.
\A Matches the start of the string.
\Z Matches the end of the string. If the string ends with a newline, it matches just before that newline.
\z Matches the end of the string.
\G Matches the point where the last match finished.
\b Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
\B Matches non-word boundaries.
\n, \t, etc. Match a newline, a tab, and other escaped control characters (\r for a carriage return, and so on).
\1...\9 Matches the text captured by the nth group.
\10 Matches the text captured by the 10th group if the pattern has at least 10 groups. Otherwise, it is treated as an octal character code.
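Two of the less intuitive entries above are the atomic group (?>re) and \G. The sketch below illustrates both with made-up strings. The \G case uses gsub, where each iteration starts at the end of the previous match.

```ruby
# A normal group can give characters back (backtrack), so a+a matches "aaa"
normal = "aaa" =~ /a+a/
# An atomic group keeps everything it matched, leaving no "a" for the final a
atomic = "aaa" =~ /(?>a+)a/
p normal          # => 0
p atomic          # => nil

# \G ties each gsub iteration to where the previous match ended,
# so only the unbroken run of leading spaces is replaced
all_spaces     = "    a b c".gsub(/ /, "_")
leading_spaces = "    a b c".gsub(/\G /, "_")
p all_spaces      # => "____a_b_c"
p leading_spaces  # => "____a b c"
```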

Regular Expression Examples

Characters

Example Description
/ruby/ Matches "ruby"
/¥/ Matches the yen sign (¥). Both Ruby 1.8 and Ruby 1.9 support multibyte characters in patterns.

Character Classes

Example Description
/[Rr]uby/ Matches "Ruby" or "ruby"
/rub[ye]/ Matches "ruby" or "rube"
/[aeiou]/ Matches any one lowercase vowel
/[0-9]/ Matches any digit; same as /[0123456789]/
/[a-z]/ Matches any lowercase ASCII letter
/[A-Z]/ Matches any uppercase ASCII letter
/[a-zA-Z0-9]/ Matches any of the above
/[^aeiou]/ Matches anything other than a lowercase vowel
/[^0-9]/ Matches anything other than a digit
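String#scan returns every non-overlapping match, which makes it a convenient way to try out character classes. The sample text below is illustrative.

```ruby
text = "Ruby 3 and ruby 2"
names  = text.scan(/[Rr]uby/)     # either case of the first letter
digits = text.scan(/[0-9]/)       # any single digit
vowels = "ruby".scan(/[aeiou]/)   # any lowercase vowel
others = "ruby".scan(/[^aeiou]/)  # anything except a lowercase vowel
p names    # => ["Ruby", "ruby"]
p digits   # => ["3", "2"]
p vowels   # => ["u"]
p others   # => ["r", "b", "y"]
```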

Special Character Classes

Example Description
/./ Matches any character except a newline
/./m In multiline mode, also matches a newline
/\d/ Matches a digit; equivalent to /[0-9]/
/\D/ Matches a nondigit; equivalent to /[^0-9]/
/\s/ Matches a whitespace character; equivalent to /[ \t\r\n\f\v]/
/\S/ Matches a non-whitespace character; equivalent to /[^ \t\r\n\f\v]/
/\w/ Matches a word character; equivalent to /[A-Za-z0-9_]/
/\W/ Matches a non-word character; equivalent to /[^A-Za-z0-9_]/
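Here is a small illustration of the shorthand classes on an ASCII-only sample line, assumed for the example. It uses scan to extract matches and split to break on runs of whitespace.

```ruby
line = "id_42 = 7\tok"
numbers = line.scan(/\d+/)    # runs of digits
words   = line.scan(/\w+/)    # runs of letters, digits, and underscores
fields  = line.split(/\s+/)   # split on spaces and tabs
p numbers  # => ["42", "7"]
p words    # => ["id_42", "7", "ok"]
p fields   # => ["id_42", "=", "7", "ok"]
```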

Repetition

Example Description
/ruby?/ Matches "rub" or "ruby". The y is optional.
/ruby*/ Matches "rub" plus 0 or more y's.
/ruby+/ Matches "rub" plus 1 or more y's.
/\d{3}/ Matches exactly 3 digits.
/\d{3,}/ Matches 3 or more digits.
/\d{3,5}/ Matches 3, 4, or 5 digits.
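The quantifiers in the table above can be compared side by side on the same input. The strings below are arbitrary samples. Note that quantifiers are greedy, so /\d{3,5}/ takes five digits whenever it can.

```ruby
words = "rub ruby rubyyy"
opt  = words.scan(/ruby?/)   # y is optional, at most one
star = words.scan(/ruby*/)   # zero or more y's
plus = words.scan(/ruby+/)   # one or more y's
p opt    # => ["rub", "ruby", "ruby"]
p star   # => ["rub", "ruby", "rubyyy"]
p plus   # => ["ruby", "rubyyy"]

nums = "7 1234 123456"
exact = nums.scan(/\d{3}/)
range = nums.scan(/\d{3,5}/)
p exact  # => ["123", "123", "456"]
p range  # => ["1234", "12345"]
```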

Non-greedy Repetition

Adding ? after a quantifier makes it non-greedy (lazy), so it matches the smallest possible number of repetitions.

Example Description
/<.*>/ Greedy repetition: matches "<ruby>perl>"
/<.*?>/ Non-greedy repetition: matches "<ruby>" in "<ruby>perl>"
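The two rows above can be checked directly with String#[], which returns the first match of a regex (or nil).

```ruby
html = "<ruby>perl>"
greedy = html[/<.*>/]    # .* runs to the last > it can find
lazy   = html[/<.*?>/]   # .*? stops at the first >
p greedy  # => "<ruby>perl>"
p lazy    # => "<ruby>"
```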

Grouping with Parentheses

Example Description
/\D\d+/ No grouping: + repeats \d
/(\D\d)+/ Grouping: + repeats \D\d pair
/([Rr]uby(, )?)+/ Matches "Ruby", "Ruby, ruby, ruby", etc.
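The sketch below shows how grouping changes what + repeats, using an illustrative string. It also shows what a repeated capture group holds afterward.

```ruby
s = "a1b2c3"
no_group = s[/\D\d+/]        # + applies only to \d
m = s.match(/(\D\d)+/)       # + applies to the whole \D\d pair
p no_group  # => "a1"
p m[0]      # => "a1b2c3"
p m[1]      # => "c3"  (a repeated group keeps only its last iteration)

list = "Ruby, ruby, ruby!"[/([Rr]uby(, )?)+/]
p list      # => "Ruby, ruby, ruby"
```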

Backreferences

A backreference matches the same text that an earlier group captured.

Example Description
/([Rr])uby&\1ails/ Matches ruby&rails or Ruby&Rails
/(['"])(?:(?!\1).)*\1/ Single or double-quoted string. \1 matches whatever the 1st group matched, \2 matches whatever the 2nd group matched, etc.
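Here is a minimal check of both table rows. The quoted-string example uses an illustrative sentence with an apostrophe inside double quotes. Because \1 must repeat the opening quote character, the apostrophe does not end the match. Regexp#match? assumes Ruby 2.4 or later.

```ruby
same_case = /([Rr])uby&\1ails/
upper = same_case.match?("Ruby&Rails")
lower = same_case.match?("ruby&rails")
mixed = same_case.match?("Ruby&rails")   # \1 must repeat the captured "R"
p upper   # => true
p lower   # => true
p mixed   # => false

quoted = /(['"])(?:(?!\1).)*\1/
sentence = %q{He said "it's fine" to me}
found = sentence[quoted]
p found   # => "\"it's fine\""
```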

Alternation

Example Description
/ruby|rube/ Matches "ruby" or "rube"
/rub(y|le)/ Matches "ruby" or "ruble"
/ruby(!+|\?)/ "ruby" followed by one or more ! or one ?
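A sketch of alternation with String#scan follows. It also illustrates a common surprise: when the pattern contains capturing groups, scan returns the groups rather than the whole matches. A non-capturing group (?:...) avoids this. The sample strings are illustrative.

```ruby
words = "ruby rube ruble"
plain     = words.scan(/ruby|rube/)
capturing = words.scan(/rub(y|le)/)     # scan returns only the group text
grouped   = words.scan(/rub(?:y|le)/)   # non-capturing: whole matches again
p plain      # => ["ruby", "rube"]
p capturing  # => [["y"], ["le"]]
p grouped    # => ["ruby", "ruble"]

shouts = "ruby!!! ruby? ruby".scan(/ruby(?:!+|\?)/)
p shouts     # => ["ruby!!!", "ruby?"]
```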

Anchors

Anchors match a position in the string rather than a character.

Example Description
/^Ruby/ Matches "Ruby" at the start of a string or internal line
/Ruby$/ Matches "Ruby" at the end of a string or line
/\ARuby/ Matches "Ruby" at the start of a string
/Ruby\Z/ Matches "Ruby" at the end of a string (or just before a final newline)
/\bRuby\b/ Matches "Ruby" at a word boundary
/\brub\B/ \B is non-word boundary: matches "rub" in "rube" and "ruby" but not alone
/Ruby(?=!)/ Matches "Ruby" if followed by an exclamation mark
/Ruby(?!!)/ Matches "Ruby" if not followed by an exclamation mark
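The sketch below shows why ^ and $ are not the same as \A and \z. Because ^ and $ match at every line boundary in Ruby, a check such as /^[a-z]+$/ succeeds on a multi-line string as long as any single line qualifies. The two-line input is illustrative. Regexp#match? assumes Ruby 2.4 or later.

```ruby
input = "abc\n123"
by_line   = input.match?(/^[a-z]+$/)     # true: the first line matches
by_string = input.match?(/\A[a-z]+\z/)   # false: the whole string must match
p by_line     # => true
p by_string   # => false

big_z   = "Ruby\n".match?(/Ruby\Z/)   # \Z allows one trailing newline
small_z = "Ruby\n".match?(/Ruby\z/)   # \z does not
p big_z       # => true
p small_z     # => false

# Lookahead checks the ! without including it in the match
bang = "Ruby! Ruby".scan(/Ruby(?=!)/)
p bang        # => ["Ruby"]
```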

Special Syntax with Parentheses

Example Description
/R(?#comment)/ Matches "R". All the rest is a comment.
/R(?i)uby/ Case-insensitive when matching "uby".
/R(?i:uby)/ Same as above.
/rub(?:y|le)/ Groups without capturing, so no backreference is created
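The following sketch exercises each row of the table above. The comment text and sample words are arbitrary. Regexp#match? assumes Ruby 2.4 or later.

```ruby
commented = /R(?#this part is ignored)uby/.match?("Ruby")
inline_on = /R(?i)uby/.match?("RUBY")
first_r   = /R(?i)uby/.match?("rUBY")   # (?i) affects only what follows it
scoped    = /R(?i:uby)/.match?("RuBy")
caps      = "ruble".match(/rub(?:y|le)/).captures
p commented  # => true
p inline_on  # => true
p first_r    # => false
p scoped     # => true
p caps       # => []  (a non-capturing group records nothing)
```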

Search and Replace

sub and gsub, along with their in-place versions sub! and gsub!, are the main String methods for search-and-replace with regular expression patterns.

sub and sub! replace the first occurrence of the pattern, while gsub and gsub! replace all occurrences.

sub and gsub return a new string and leave the original unchanged, whereas sub! and gsub! modify the string they are called on. Note that sub! and gsub! return nil when no replacement is made, so their return value should not be assigned back to the variable.
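Here is a minimal side-by-side of the four methods on throwaway strings. It includes the nil return of a bang method when nothing matches.

```ruby
s = "a-b-c"
first = s.sub("-", "+")    # first occurrence only
all   = s.gsub("-", "+")   # every occurrence
p first  # => "a+b-c"
p all    # => "a+b+c"
p s      # => "a-b-c"  (sub and gsub leave the receiver alone)

t = "abc"
none = t.sub!(/x/, "y")    # nothing matches
p none   # => nil
t.gsub!(/b/, "B")          # modifies t in place
p t      # => "aBc"
```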

Example

#!/usr/bin/ruby
# -*- coding: UTF-8 -*-

phone = "138-3453-1111 #This is a phone number"

# Delete Ruby-style comments
# (sub returns a new string; sub! would return nil if nothing matched)
phone = phone.sub(/#.*$/, "")
puts "Phone Num : #{phone}"

# Remove anything other than digits
phone = phone.gsub(/\D/, "")
puts "Phone Num : #{phone}"

The output of the above example is:

Phone Num : 138-3453-1111 
Phone Num : 13834531111

Example

#!/usr/bin/ruby
# -*- coding: UTF-8 -*-

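text = "rails are rails, Ruby on Rails is a fantastic Ruby framework"

# Replace every occurrence of the literal string "rails" with "Rails"
text.gsub!("rails", "Rails")

# Replace "rails" only where it is a whole word (\b is a word boundary).
# Nothing is left to replace here, so this call returns nil and text is unchanged.
text.gsub!(/\brails\b/, "Rails")

puts text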

The output of the above example is:

Rails are Rails, Ruby on Rails is a fantastic Ruby framework
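Besides a plain string, the replacement can refer to capture groups, or it can be computed by a block or looked up in a hash. The sketch below uses illustrative data. The single quotes around '\2/\3/\1' keep each backslash intact so that sub can read \1 to \3 as group references.

```ruby
date = "2024-05-17"
us = date.sub(/(\d+)-(\d+)-(\d+)/, '\2/\3/\1')
p us       # => "05/17/2024"

# A block receives each matched string and returns its replacement
caps = "rails are rails".gsub(/\brails\b/) { |word| word.capitalize }
p caps     # => "Rails are Rails"

# A hash maps each matched string to its replacement
animals = "cat dog".gsub(/cat|dog/, { "cat" => "feline", "dog" => "canine" })
p animals  # => "feline canine"
```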
