Regular Expressions - Examples

Simple Expressions

The simplest form of a regular expression is a single ordinary character that matches itself in the search string. For example, a single-character pattern like A will always match the letter A wherever it appears in the search string. Here are some examples of single-character regular expression patterns:

/a/
/7/
/M/

Multiple single-character expressions can be combined to form larger expressions. For example, the following regular expression combines the single-character expressions a, 7, and M:

/a7M/

Note that there is no concatenation operator. Simply type one character followed by another.
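As a minimal sketch of concatenation, the snippet below tests /a7M/ against a few strings. The sample strings and the variable names are made up for illustration.

```javascript
// Each ordinary character matches itself, so /a7M/ looks for
// "a", then "7", then "M", directly next to each other.
const pattern = /a7M/;

// Hypothetical sample inputs.
const inputs = ["xa7My", "a7m", "a 7 M"];
const hits = inputs.map((s) => pattern.test(s));

console.log(hits); // [ true, false, false ]
```

The second string fails because matching is case-sensitive by default, and the third fails because the spaces break up the sequence.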

Character Matching

The dot . matches any single character, printable or non-printable, except for the line-break characters \n and \r. The following regular expression matches aac, abc, acc, adc, etc., as well as a1c, a2c, a-c, and a#c:

/a.c/

To match a string containing a filename where the period . is part of the input string, precede the period in the regular expression with a backslash \ character. For example, the following regular expression matches filename.ext:

/filename\.ext/
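The difference between a bare dot and an escaped dot can be checked with a short sketch like the following. The variable names and test strings are illustrative only.

```javascript
const anyChar = /a.c/;              // dot matches any character except a line break
const literalDot = /filename\.ext/; // escaped dot matches only a period
const unescapedDot = /filename.ext/; // unescaped dot is more permissive

console.log(anyChar.test("a#c"));                  // true
console.log(anyChar.test("a\nc"));                 // false, the dot does not match \n
console.log(literalDot.test("filename.ext"));      // true
console.log(literalDot.test("filenameXext"));      // false
console.log(unescapedDot.test("filenameXext"));    // true
```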

These expressions only allow you to match "any" single character. You might need to match specific groups of characters from a list. For example, you might need to find chapter titles represented by numbers (Chapter 1, Chapter 2, etc.).

A Reasonable Username Regular Expression

A username can contain the following types of characters:

1. The 26 English letters in upper and lower case, written as a-zA-Z.
2. Digits, written as 0-9.
3. Underscores, written as _.
4. Hyphens, written as -.

A username consists of one or more letters, digits, underscores, and hyphens, so the + symbol is used to indicate one or more occurrences.

Based on the above conditions, the expression for a username can be:

[a-zA-Z0-9_-]+

Example

var str = "abc123-_def";
var patt = /[a-zA-Z0-9_-]+/;
document.write(str.match(patt));

The matched text is: abc123-_def

If hyphens are not needed:

[a-zA-Z0-9_]+

Example

var str = "abc123def";
var str2 = "abc123_def";
var patt = /[a-zA-Z0-9_]+/;
document.write(str.match(patt));
document.write(str2.match(patt));

The matched text is abc123def for the first string and abc123_def for the second.
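The patterns above find a run of allowed characters anywhere in the string. To validate that an entire string is a username, one possible approach is to anchor the pattern with ^ and $. The helper name isValidUsername below is hypothetical and not part of the original examples.

```javascript
// Hypothetical helper: true only if every character is allowed.
function isValidUsername(name) {
  return /^[a-zA-Z0-9_-]+$/.test(name);
}

console.log(isValidUsername("abc123-_def")); // true
console.log(isValidUsername("abc 123"));     // false, contains a space
console.log(isValidUsername(""));            // false, + requires at least one character

// Without anchors, only the first allowed run is matched.
const partial = "abc 123".match(/[a-zA-Z0-9_-]+/)[0];
console.log(partial); // "abc"
```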

Matching HTML Tags and Content

The following regular expression is used to match an iframe tag:

/<iframe(([\s\S])*?)<\/iframe>/

To match other tags, replace iframe with the name of the tag.

To match a div tag with id="mydiv":

/<div id="mydiv"(([\s\S])*?)<\/div>/

To match all img tags:

Example

/<img.*?src="(.*?)".*?\/?>/gi
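As a rough sketch, the img pattern can be combined with String.prototype.matchAll to collect every src value. The HTML snippet is invented for the example, and regular expressions like this are only suitable for simple, well-formed markup.

```javascript
// Hypothetical HTML fragment with two img tags in different styles.
const html = '<p><img src="a.png" alt="A"><IMG class="x" src="b.jpg" /></p>';

// g finds every tag, i ignores case; group 1 captures the src value.
const imgPattern = /<img.*?src="(.*?)".*?\/?>/gi;

const sources = Array.from(html.matchAll(imgPattern), (m) => m[1]);
console.log(sources); // [ 'a.png', 'b.jpg' ]
```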

Bracket Expressions

To create a list of characters to match, place one or more single characters inside square brackets [ ]. When characters are enclosed in brackets, the list is called a "bracket expression." As with any other position, ordinary characters inside brackets represent themselves, matching themselves once in the input text. Most special characters lose their significance when they appear inside bracket expressions. However, there are some exceptions, such as:

If the ] character is not the first item, it ends the list. To match a ] character in the list, place it first, immediately after the opening [.
The \ character continues to act as an escape character. To match a \ character, use \\.

Characters enclosed in a bracket expression only match a single character in that position in the regular expression. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:

/Chapter [12345]/

Note that the position of the word Chapter and the space following it is fixed relative to the characters inside the brackets. The bracket expression specifies only the set of characters that match at the position immediately following the word Chapter and the space. This is the ninth character position.

To represent matching character groups using ranges instead of individual characters, use a hyphen - to separate the starting and ending characters of the range. The character value of the single character determines the relative order within the range. The following regular expression includes a range expression that is equivalent to the list shown above in the brackets.

/Chapter [1-5]/

When specifying ranges in this way, both the start and end values are included in the range. Note that the start value must come before the end value in Unicode code point order.
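The equivalence of the list form and the range form, and the ordering requirement, can be checked with a small sketch. The sample strings are illustrative.

```javascript
const listForm = /Chapter [12345]/;
const rangeForm = /Chapter [1-5]/;

// Both forms should agree on every input.
const samples = ["Chapter 1", "Chapter 5", "Chapter 6", "Chapter  3"];
const results = samples.map((s) => [listForm.test(s), rangeForm.test(s)]);
console.log(results);
// [ [ true, true ], [ true, true ], [ false, false ], [ false, false ] ]

// A range whose start comes after its end is rejected by the engine.
let reversedRangeError = null;
try {
  new RegExp("[5-1]");
} catch (err) {
  reversedRangeError = err;
}
console.log(reversedRangeError instanceof SyntaxError); // true
```

The last sample fails because the bracket expression must match the single character right after "Chapter ", and there it finds a second space.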

To include a hyphen in a bracket expression, use one of the following methods:

Escape it with a backslash:
```
[\-]
```
Place the hyphen at the beginning or end of the bracketed list. The following expressions match all lowercase letters and hyphens:
```
[-a-z]
[a-z-]
```

Create a range where the starting character value is less than the hyphen, and the ending character value is equal to or greater than the hyphen. The following two regular expressions meet this requirement:

[!--]
[!-~]
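A quick way to confirm that each of these forms accepts a hyphen is to test them side by side, as in this sketch. The array name hyphenForms is made up for the example.

```javascript
// The bracket forms discussed above, plus the escaped form with a range.
const hyphenForms = [/[\-a-z]/, /[-a-z]/, /[a-z-]/, /[!--]/, /[!-~]/];

const matchesHyphen = hyphenForms.map((re) => re.test("-"));
console.log(matchesHyphen); // [ true, true, true, true, true ]

// [!--] is the range from "!" (U+0021) to "-" (U+002D), so "a" is outside it.
console.log(/[!--]/.test("a")); // false
// [!-~] runs from "!" to "~" (U+007E) and covers letters as well.
console.log(/[!-~]/.test("a")); // true
```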

To find all characters that are not in the list or range, place the caret ^ at the beginning of the list. If the caret appears at any other position in the list, it matches itself. The following regular expression matches any character other than 1, 2, 3, 4, or 5 in the position after "Chapter ":

/Chapter [^12345]/

In the above example, the expression matches any character other than 1, 2, 3, 4, or 5 at the ninth position. Thus, for example, "Chapter 7" is a match, and "Chapter 9" is also a match.

The above expression can also be written with a range:

/Chapter [^1-5]/
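The behavior of a negated bracket expression can be sketched as follows. Note that a negated list still has to match one character, and that a caret anywhere but the first position is a literal. The test strings are illustrative.

```javascript
const notOneToFive = /Chapter [^1-5]/;

console.log(notOneToFive.test("Chapter 7")); // true
console.log(notOneToFive.test("Chapter 3")); // false
console.log(notOneToFive.test("Chapter X")); // true, any character outside 1-5 qualifies
console.log(notOneToFive.test("Chapter "));  // false, there is no ninth character to match

// A caret that is not first in the list matches a literal "^".
const literalCaret = /[1^5]/;
console.log(literalCaret.test("^")); // true
console.log(literalCaret.test("3")); // false
```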

The typical use of bracket expressions is to specify matches for any uppercase or lowercase letter or any digit. The following expression specifies such a match:

/[A-Za-z0-9]/

Alternation and Grouping

Alternation uses the | character to allow a choice between two or more alternatives. For example, you can extend the chapter title regular expression to return a broader range of matches. However, this is not as simple as you might think. Alternation applies to the largest possible expression on either side of the | character.

You might think that the following expression matches either "Chapter" or "Section" at the beginning of a line, followed by one or two digits at the end of the line:

/^Chapter|Section [1-9][0-9]{0,1}$/

Unfortunately, the above regular expression matches either the word "Chapter" at the beginning of a line, or the word "Section" followed by one or two digits at the end of a line. If the input string is "Chapter 22", the above expression only matches the word "Chapter". If the input string is "Section 22", it matches "Section 22".

To make the regular expression more controllable, you can use parentheses to limit the scope of the alternation, ensuring it only applies to the words "Chapter" and "Section". However, parentheses are also used to create subexpressions and may capture them for later use, which is discussed in the section on backreferences. By adding parentheses at the appropriate positions in the above regular expression, you can make it match "Chapter 1" or "Section 3".

The following regular expression uses parentheses to combine "Chapter" and "Section" so that the expression works correctly:

/^(Chapter|Section) [1-9][0-9]{0,1}$/
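The effect of the parentheses can be seen by running the ungrouped and grouped expressions against the same inputs. The variable names loose and grouped are made up for this sketch.

```javascript
const loose = /^Chapter|Section [1-9][0-9]{0,1}$/;
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

// Without grouping, | splits the whole pattern into two alternatives.
const looseChapter = "Chapter 22".match(loose)[0];
const looseSection = "Section 22".match(loose)[0];
console.log(looseChapter); // "Chapter"
console.log(looseSection); // "Section 22"

// With grouping, the number is required after either word.
const groupedChapter = "Chapter 22".match(grouped)[0];
console.log(groupedChapter); // "Chapter 22"

// The loose form even accepts lines with no number at all.
console.log(loose.test("Chapter of lies"));   // true
console.log(grouped.test("Chapter of lies")); // false
```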

Although this expression works correctly, the parentheses around "Chapter|Section" also capture whichever of the two words matched for later use. Since there is only one set of parentheses in the above expression, there is only one captured "submatch".

In the above example, you only need to use parentheses to combine the choice between the words "Chapter" and "Section". To prevent the match from being saved for later use, place ?: before the regular expression pattern inside the parentheses. The following modification provides the same capability without saving the submatch:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
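The difference between the capturing and non-capturing forms shows up in the array returned by match(). A minimal sketch, with illustrative variable names:

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

const withGroup = "Section 3".match(capturing);
const withoutGroup = "Section 3".match(nonCapturing);

// Index 0 is the whole match; index 1 onward holds captured submatches.
console.log(withGroup[0], "|", withGroup[1]); // Section 3 | Section
console.log(withGroup.length);                // 2
console.log(withoutGroup.length);             // 1, nothing was captured
```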

In addition to ?:, two other non-capturing constructs create what are known as "lookahead" matches. A positive lookahead, written ?=, matches at a position in the search string only if the pattern inside the parentheses matches immediately after that position. A negative lookahead, written ?!, matches at a position only if that pattern does not match there. In both cases, the text examined by the lookahead is not part of the match.

For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Further assume that you need to update the document to change all references to Windows 95, Windows 98, and Windows NT to Windows 2000. The following regular expression (a positive lookahead example) matches the word Windows and the space after it only where 95, 98, or NT and another space follow; the version itself is not part of the match:

/Windows (?=95 |98 |NT )/

After finding a match, the search continues immediately after the matched text (excluding the lookahead characters) for the next match. For example, if the above expression matches "Windows 98", the search continues after "Windows" and not after "98".
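The sketch below illustrates where the search resumes after a lookahead match. It uses the pattern with a space after Windows, so that it applies to strings such as "Windows 98 ", and adds a negative lookahead for comparison. The sample sentence is invented.

```javascript
const text = "Windows 3.1, Windows 98 and Windows NT are mentioned here.";

// Positive lookahead: the version is checked but not consumed.
const positive = /Windows (?=95 |98 |NT )/g;
const first = positive.exec(text);
const resumeAt = positive.lastIndex; // where the next search would start

console.log(JSON.stringify(first[0]));           // "Windows "
console.log(first.index, resumeAt);              // 13 21
console.log(text.slice(resumeAt, resumeAt + 2)); // 98

// Negative lookahead: matches only where none of the versions follow.
const negative = /Windows (?!95 |98 |NT )/g;
const negativeHits = Array.from(text.matchAll(negative), (m) => m.index);
console.log(negativeHits); // [ 0 ]
```

Because lastIndex points at "98" rather than past it, the version text is still available to whatever pattern runs next.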

Other Examples

Below are some regular expression examples:

Regular Expression	Description
/\b([a-z]+) \1\b/gi	Finds places where a word is immediately repeated.
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/	Matches a URL and parses it into protocol, domain, port, and relative path.
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/	Matches a chapter or section heading numbered 1 to 99.
/[-a-z]/	Any of the 26 letters from a to z, or a hyphen.
/ter\b/	Can match "chapter" but not "terminal".
/\Bapt/	Can match "chapter" but not "aptitude".
/Windows(?=95 |98 |NT )/	Matches "Windows" when it is directly followed by "95 ", "98 ", or "NT "; the next search starts right after "Windows".
/^\s*$/	Matches empty lines.
/\d{2}-\d{5}/	Validates an ID number consisting of two digits, a hyphen, and then five digits.
<[a-zA-Z]+.*?>([\s\S]*?)</[a-zA-Z]*?>	Matches HTML tags.
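Two of the entries above can be tried out with a short sketch. The URL pattern is written here with the * quantifiers in (:\d*)? and ([^# ]*), which is an assumption about its intended form, and the sample sentence and URL are invented.

```javascript
// Repeated words: \1 must repeat whatever group 1 captured.
const repeated = /\b([a-z]+) \1\b/gi;
const sentence = "Is is the cost of of gasoline going up up?";
const doubles = sentence.match(repeated);
console.log(doubles); // [ 'Is is', 'of of', 'up up' ]

// URL parts: groups 1-4 are protocol, domain, port, and path.
const urlPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
const parts = "http://www.example.com:80/docs/index.html".match(urlPattern);
console.log(parts.slice(1));
// [ 'http', 'www.example.com', ':80', '/docs/index.html' ]
```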

Regular Expression	Description
hello	Matches {hello}
gray|grey	Matches {gray, grey}
gr(a|e)y	Matches {gray, grey}
gr[ae]y	Matches {gray, grey}
b[aeiou]bble	Matches {babble, bebble, bibble, bobble, bubble}
[b-chm-pP]at|ot	Matches {bat, cat, hat, mat, nat, oat, pat, Pat, ot}
colou?r	Matches {color, colour}
rege(x(es)?|xps?)	Matches {regex, regexes, regexp, regexps}
go*gle	Matches {ggle, gogle, google, gooogle, goooogle, ...}
go+gle	Matches {gogle, google, gooogle, goooogle, ...}
g(oog)+le	Matches {google, googoogle, googoogoogle, googoogoogoogle, ...}
z{3}	Matches {zzz}
z{3,6}	Matches {zzz, zzzz, zzzzz, zzzzzz}
z{3,}	Matches {zzz, zzzz, zzzzz, ...}
[Bb]rainf\*\*k	Matches {Brainf**k, brainf**k}
\d	Matches {0,1,2,3,4,5,6,7,8,9}
1\d{10}	Matches 11 digits starting with 1
[2-9]|[12]\d|3[0-6]	Matches integers from 2 to 36
Hello\nworld	Matches "Hello" followed by a newline, followed by "world"
\d+(\.\d\d)?	Matches a non-negative integer, optionally followed by a decimal point and exactly two decimal places.
[^*@#]	Matches any single character except *, @, and #
//[^\r\n]*[\r\n]	Matches comments starting with //
^dog	Matches if "dog" is at the beginning
dog$	Matches if "dog" is at the end
^dog$	Exactly matches "dog"
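Several rows of the table can be checked mechanically. The sketch below wraps each pattern in ^(?:...)$ so that it must match the whole input; the helper matchesWhole and the chosen cases are hypothetical and only cover a handful of rows, with the alternation and escaped-dot patterns written in their intended form.

```javascript
// Hypothetical helper: true if the pattern matches the entire input.
function matchesWhole(source, input) {
  return new RegExp(`^(?:${source})$`).test(input);
}

// [pattern source, input, expected result]
const cases = [
  [String.raw`gr(a|e)y`, "grey", true],
  [String.raw`colou?r`, "color", true],
  [String.raw`go*gle`, "ggle", true],
  [String.raw`go+gle`, "ggle", false],
  [String.raw`z{3,6}`, "zz", false],
  [String.raw`[b-chm-pP]at|ot`, "oat", true],
  [String.raw`[b-chm-pP]at|ot`, "rat", false],
  [String.raw`[2-9]|[12]\d|3[0-6]`, "36", true],
  [String.raw`[2-9]|[12]\d|3[0-6]`, "37", false],
  [String.raw`\d+(\.\d\d)?`, "3.14", true],
  [String.raw`\d+(\.\d\d)?`, "3.1", false],
];

// Collect any case whose actual result differs from the expected one.
const failures = cases.filter(
  ([source, input, expected]) => matchesWhole(source, input) !== expected
);
console.log(failures.length); // 0
```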