Perl Regular Expressions
Regular expressions (regex) describe a pattern for matching characters in strings. They can be used to check if a string contains a certain substring, replace matching substrings, or extract substrings that meet certain criteria.
Perl's regular expression support is extensive and tightly integrated into the language. Many other languages and libraries model their regex syntax on Perl's; PCRE (Perl Compatible Regular Expressions) is a well-known example.
Perl's regular expressions come in three forms: matching, substitution, and translation:
- Matching: m//
- Substitution: s///
- Translation: tr///
These forms are typically used with =~ or !~, where =~ denotes a match and !~ denotes no match.
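As a minimal sketch of the two binding operators (the variable names $text, $has_perl, and $lacks_ruby are illustrative and not part of the original tutorial):

```perl
use strict;
use warnings;

my $text = "Perl regular expressions";

# =~ is true when the pattern matches the string on its left
my $has_perl = ($text =~ /Perl/) ? 1 : 0;

# !~ is true when the pattern does NOT match
my $lacks_ruby = ($text !~ /Ruby/) ? 1 : 0;

print "has Perl: $has_perl\n";       # prints "has Perl: 1"
print "lacks Ruby: $lacks_ruby\n";   # prints "lacks Ruby: 1"
```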
Matching Operator
The matching operator m// is used to match a string or a regular expression. When / is the delimiter, the leading m is optional, so /run/ and m/run/ are equivalent. For example, to match "run" in the scalar $bar, the code would look like this:
Example
#!/usr/bin/perl
$bar = "I am runoob site. welcome to runoob site.";
if ($bar =~ /run/) {
    print "First match\n";
} else {
    print "First no match\n";
}
$bar = "run";
if ($bar =~ /run/) {
    print "Second match\n";
} else {
    print "Second no match\n";
}
Executing the above program will output:
First match
Second match
Pattern Matching Modifiers
Some common pattern matching modifiers are:
Modifier | Description |
---|---|
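i | Ignore case in the pattern |
m | Multi-line mode: ^ and $ also match at the start and end of each line |
o | Compile the pattern only once, even if it interpolates variables |
s | Single-line mode, where "." also matches "\n" (by default it does not) |
x | Ignore whitespace in the pattern, allowing spacing and comments |
g | Global matching: find all matches, not just the first |
cg | Keep the current search position when a global match fails, so further matching can continue from there |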
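The following sketch illustrates several of these modifiers on one sample string. The text and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "Perl is fun.\nperl is powerful.";

# /i ignores case, so "PERL" matches "Perl"
my $ci = ($text =~ /PERL/i) ? 1 : 0;

# /g in list context returns every match; /i lets it find both spellings
my @all = ($text =~ /perl/gi);

# /m lets ^ match at the start of each line, not only the start of the string
my @line_starts = ($text =~ /^(\w+)/mg);

# Without /s, "." does not match "\n"; with /s it does
my $dot_default = ($text =~ /fun\..perl/)  ? 1 : 0;
my $dot_single  = ($text =~ /fun\..perl/s) ? 1 : 0;

# /x allows whitespace inside the pattern for readability
my $spaced = ($text =~ / is \s+ fun /x) ? 1 : 0;

print "case-insensitive: $ci\n";                              # 1
print "matches found: ", scalar(@all), "\n";                  # 2
print "line starts: @line_starts\n";                          # Perl perl
print "dot without /s: $dot_default, with /s: $dot_single\n"; # 0, 1
print "with /x: $spaced\n";                                   # 1
```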
Regular Expression Variables
After a successful match, Perl stores the results in three special variables:
- $`: The part of the string before the matched portion
- $&: The matched string
- $': The part of the string after the matched portion
Concatenating these three variables in order gives you the original string.
Example
#!/usr/bin/perl
$string = "welcome to runoob site.";
$string =~ m/run/;
print "String before match: $`\n";
print "Matched string: $&\n";
print "String after match: $'\n";
Executing the above program will output:
String before match: welcome to
Matched string: run
String after match: oob site.
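To illustrate that the three parts add up to the original string, here is a small sketch. The sample text and the names $before, $matched, $after, and $rebuilt are chosen for this illustration only. It also checks that the match succeeded before reading the special variables, since they are only meaningful after a successful match.

```perl
use strict;
use warnings;

my $string = "welcome to perl regex";   # sample text chosen for this sketch

my ($before, $matched, $after) = ('', '', '');
if ($string =~ /perl/) {
    # Copy the special variables right away; a later match overwrites them
    ($before, $matched, $after) = ($`, $&, $');
}

my $rebuilt = $before . $matched . $after;
print "before:  [$before]\n";    # [welcome to ]
print "matched: [$matched]\n";   # [perl]
print "after:   [$after]\n";     # [ regex]
print "rebuilt equals original\n" if $rebuilt eq $string;
```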
Substitution Operator
The substitution operator s/// extends the matching operator, replacing the specified string with a new one. The basic format is:
s/PATTERN/REPLACEMENT/;
Where PATTERN is the matching pattern and REPLACEMENT is the string to replace with.
For example, to replace "google" with "tutorialpro" in the following string:
Example
#!/usr/bin/perl
$string = "welcome to google site.";
$string =~ s/google/tutorialpro/;
print "$string\n";
Executing the above program will output:
welcome to tutorialpro site.
Substitution Modifiers
Substitution modifiers include:
Modifier | Description |
---|---|
i | Case-insensitive matching |
m | Multi-line mode |
o | Evaluate the expression only once |
s | Single-line mode, where "." matches "\n" |
x | Ignore whitespace in the pattern |
g | Replace all occurrences |
e | Evaluate the replacement string as an expression |
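A short sketch of how the g, i, and e modifiers change the result of a substitution. The sample strings and variable names are illustrative assumptions, not taken from the tutorial.

```perl
use strict;
use warnings;

my $text = "Cat cat CAT";

# Without /g only the first match is replaced
(my $first_only = $text) =~ s/cat/dog/i;

# /g replaces every match; /i ignores case
(my $all = $text) =~ s/cat/dog/gi;

# /e evaluates the replacement as Perl code: here each number is doubled
(my $doubled = "2 apples and 10 pears") =~ s/(\d+)/$1 * 2/ge;

# s/// returns the number of substitutions made
my $copy  = $text;
my $count = ($copy =~ s/cat/dog/gi);

print "$first_only\n";   # dog cat CAT
print "$all\n";          # dog dog dog
print "$doubled\n";      # 4 apples and 20 pears
print "$count\n";        # 3
```

The `(my $copy = $original) =~ s/.../.../` idiom copies the string first, so the original stays unchanged.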
Translation Operator
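The translation operator tr/SEARCHLIST/REPLACEMENTLIST/ replaces each character in SEARCHLIST with the corresponding character in REPLACEMENTLIST. It works on individual characters, not on regular expressions. Modifiers related to the translation operator include:
Modifier | Description |
---|---|
c | Complement the search list: translate all characters not specified |
d | Delete specified characters that have no corresponding replacement |
s | Squeeze runs of the same translated character into a single character |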
The following example converts all lowercase letters in the variable $string to uppercase:
#!/usr/bin/perl
$string = 'welcome to tutorialpro site.';
$string =~ tr/a-z/A-Z/;
print "$string\n";
Executing the above program will output:
WELCOME TO TUTORIALPRO SITE.
The following example uses the /s modifier to squeeze runs of repeated characters in the variable $string into a single character:
Example
#!/usr/bin/perl
$string = 'runoob';
$string =~ tr/a-z/a-z/s;
print "$string\n";
Executing the above program will output:
runob
More examples:
$string =~ tr/0-9/ /c; # Replace every non-digit character with a space (tr does not understand \d)
$string =~ tr/\t //d; # Remove tabs and spaces
$string =~ tr/0-9/ /cs; # Replace each run of non-digit characters with a single space
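The following sketch combines the c, d, and s modifiers on a sample phone string and also shows that tr returns the number of characters it matched. The string and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $phone = "Tel: (555) 123-4567";

# /cd: complement the search list and delete, leaving only the digits
(my $digits = $phone) =~ tr/0-9//cd;

# /cs: turn each run of non-digits into a single space
(my $spaced = $phone) =~ tr/0-9/ /cs;

# With an empty replacement list and no /d, tr only counts; the string is unchanged
my $digit_count = ($phone =~ tr/0-9//);

print "$digits\n";        # 5551234567
print "[$spaced]\n";      # [ 555 123 4567]
print "$digit_count\n";   # 10
```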
More Regular Expression Rules
Expression | Description |
---|---|
. | Matches any character except a newline |
x? | Matches 0 or 1 occurrence of x |
x* | Matches 0 or more occurrences of x, as many as possible (use x*? for as few as possible) |
x+ | Matches 1 or more occurrences of x, as many as possible (use x+? for as few as possible) |
.* | Matches 0 or more of any character |
.+ | Matches 1 or more of any character |
{m} | Matches exactly m occurrences of the preceding item |
{m,n} | Matches between m and n occurrences of the preceding item |
{m,} | Matches at least m occurrences of the preceding item |
[] | Matches any character within the brackets |
[^] | Matches any character not within the brackets |
[0-9] | Matches any digit |
[a-z] | Matches any lowercase letter |
[^0-9] | Matches any non-digit |
[^a-z] | Matches any non-lowercase letter |
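^ | Matches the start of a string (or the start of each line with the m modifier) |
$ | Matches the end of a string or just before a trailing newline (or the end of each line with the m modifier) |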
\d | Matches a digit, same as [0-9] for ASCII text |
\d+ | Matches one or more digits, same as [0-9]+ for ASCII text |
\D | Matches a non-digit, opposite of \d |
\D+ | Matches one or more non-digits, opposite of \d+ |
\w | Matches any word character (letter, digit, or underscore), same as [a-zA-Z0-9_] for ASCII text |
\w+ | Matches one or more word characters, same as [a-zA-Z0-9_]+ for ASCII text |
\W | Matches any non-word character, opposite of \w |
\W+ | Matches one or more non-word characters, opposite of \w+ |
\s | Matches any whitespace character, including at least [ \t\n\r\f] |
\s+ | Matches one or more whitespace characters, including at least [ \t\n\r\f]+ |
\S | Matches any non-whitespace character, opposite of \s |
\S+ | Matches one or more non-whitespace characters, opposite of \s+ |
\b | Matches a word boundary |
\B | Matches a non-word boundary |
a|b|c | Matches either a, b, or c |
abc | Matches the string "abc" |
(pattern) | Groups the pattern and remembers the matched text |
/pattern/i | Case-insensitive matching |
\ | Escapes a special character so that it matches literally |
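A few of the rules above can be sketched together as follows. The sample strings and variable names are assumptions made for this illustration.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .* takes as much as it can, so it runs to the last ">"
my ($greedy) = $html =~ /<(.*)>/;

# Non-greedy: .*? stops at the first ">"
my ($lazy) = $html =~ /<(.*?)>/;

# Parentheses capture text; {m} repeats the preceding item exactly m times
my ($year, $month, $day) = "released 2024-01-15" =~ /(\d{4})-(\d{2})-(\d{2})/;

# \b anchors at a word boundary, so "cat" inside "concatenate" is not matched
my $whole_word = ("concatenate" =~ /\bcat\b/) ? 1 : 0;

# A backslash makes a metacharacter literal: \. matches only a real dot
my $literal_dot = ("3x14" =~ /3\.14/) ? 1 : 0;

print "greedy: $greedy\n";             # b>bold</b> and <i>italic</i
print "lazy: $lazy\n";                 # b
print "date: $year/$month/$day\n";     # 2024/01/15
print "whole word: $whole_word\n";     # 0
print "literal dot: $literal_dot\n";   # 0
```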
More Reference Content
Regular Expressions: https://www.tutorialpro.org/regexp/regexp-tutorial.html
Perl Regular Expressions: https://perldoc.perl.org/perlre#Regular-Expressions