Regular Expressions - Examples

Simple Expressions

The simplest form of a regular expression is a single ordinary character that matches itself in the search string. For example, a single-character pattern like A will always match the letter A wherever it appears in the search string. Here are some examples of single-character regular expression patterns:

/a/
/7/
/M/

Multiple single-character patterns can be combined to form larger expressions. For example, the following regular expression combines the single-character expressions: a, 7, and M.

/a7M/

Note that there is no concatenation operator. Simply type one character followed by another.
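
As a quick check of concatenation, the sketch below tests the combined pattern against two sample strings. The strings and variable names are invented for illustration.

```javascript
// Concatenated single-character patterns must match in order and side by side.
const combined = /a7M/;

const hit = combined.test("xxa7Mxx"); // "a7M" appears as a contiguous run
const miss = combined.test("a 7 M");  // same characters, but not adjacent

console.log(hit, miss); // true false
```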

Character Matching

The dot . matches any single character, printable or non-printable, except the newline characters \n and \r. The following regular expression matches aac, abc, acc, adc, and so on, as well as a1c, a2c, a-c, and a#c:

/a.c/

To match a string containing a filename where the period . is part of the input string, precede the period in the regular expression with a backslash \ character. For example, the following regular expression matches filename.ext:

/filename\.ext/
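
The difference between the unescaped and the escaped dot can be seen in a small sketch. The sample strings and variable names here are assumptions made for the example.

```javascript
// An unescaped dot matches any single character except a line terminator.
const anyMiddle = /a.c/;
// An escaped dot matches only a literal period.
const literalDot = /filename\.ext/;

// "ac" fails because the dot needs exactly one character between a and c;
// "a\nc" fails because the dot does not match a newline.
const dotResults = ["abc", "a#c", "ac", "a\nc"].map((s) => anyMiddle.test(s));
const nameResults = ["filename.ext", "filenameXext"].map((s) => literalDot.test(s));

console.log(dotResults);  // [ true, true, false, false ]
console.log(nameResults); // [ true, false ]
```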

These expressions only allow you to match "any" single character. You may need to match specific groups of characters from a list. For example, you might need to find chapter titles represented by numbers (Chapter 1, Chapter 2, etc.).

A Reasonable Username Regular Expression

A username can include the following characters:

1. 26 uppercase and lowercase English letters represented as a-zA-Z.
1. Digits represented as 0-9.
1. Underscore represented as _.
1. Hyphen represented as -.

A username consists of one or more of these letters, digits, underscores, and hyphens, so the + quantifier is used to indicate one or more occurrences.

Based on the above conditions, the expression for a username can be:

[a-zA-Z0-9_-]+

Example

var str = "abc123-_def";
var patt = /[a-zA-Z0-9_-]+/;
document.write(str.match(patt));

The matched text is: abc123-_def

If the hyphen is not needed, then:

[a-zA-Z0-9_]+

Example

var str = "abc123def";
var str2 = "abc123_def";
var patt = /[a-zA-Z0-9_]+/;
document.write(str.match(patt));
document.write(str2.match(patt));

The matched text is abc123def for str and abc123_def for str2.
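
Note that match() with these patterns returns the first run of allowed characters even when the rest of the string contains other characters. To check a whole string, one option is to anchor the pattern with ^ and $. A minimal sketch, where isValidUsername is a hypothetical helper name and the sample strings are invented:

```javascript
// Hypothetical helper (not part of the original text): accept a string only
// if every character belongs to the allowed username set.
function isValidUsername(name) {
  return /^[a-zA-Z0-9_-]+$/.test(name);
}

// Without anchors, match() stops at the first disallowed character.
const firstRun = "bad name!".match(/[a-zA-Z0-9_-]+/)[0];

console.log(firstRun);                       // "bad"
console.log(isValidUsername("abc123-_def")); // true
console.log(isValidUsername("bad name!"));   // false
```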

Matching HTML Tags and Content

The following regular expression is used to match an iframe tag:

/<iframe(([\s\S])*?)<\/iframe>/

Other tags can be matched by replacing iframe.

To match a div tag with id="mydiv":

/<div id="mydiv"(([\s\S])*?)<\/div>/

To match all img tags:

Example

/<img.*?src="(.*?)".*?\/?>/gi
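
One way to use the img pattern is to collect every src value with matchAll. This is a sketch on invented sample markup; the variable names are assumptions.

```javascript
// Sample markup invented for this example.
const html = '<p><img src="a.png" alt="A"><IMG class="x" src="b.jpg" /></p>';

// The img pattern written with a literal "<"; capture group 1 holds the src value.
const imgPattern = /<img.*?src="(.*?)".*?\/?>/gi;

// matchAll requires the g flag and yields one match object per tag.
const sources = Array.from(html.matchAll(imgPattern), (m) => m[1]);

console.log(sources); // [ 'a.png', 'b.jpg' ]
```

Patterns like this are adequate for simple, predictable markup; for arbitrary HTML a real parser is usually more reliable.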

Bracket Expressions

To create a list of characters to match, place one or more single characters inside square brackets [ ]. When characters are enclosed in brackets, this list is called a "bracket expression". As with any other position, ordinary characters inside brackets represent themselves, i.e., they match themselves once in the input text. Most special characters lose their meaning when they appear inside bracket expressions. However, there are some exceptions, such as:

If the ] character is not the first item, it ends a list. In POSIX-style engines, a ] placed first, immediately after the opening [, is treated as a literal character; in JavaScript, escape it as \] instead.
The \ character continues to act as an escape character. To match a \ character, use \\.

Characters enclosed in a bracket expression only match a single character in the position in the regular expression. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:

/Chapter [12345]/

Note that the positions of the word Chapter and the space are fixed relative to the characters inside the brackets. The bracket expression specifies only the set of characters that match the single character position following the word Chapter and the space. This is the ninth character position.

To use a range instead of individual characters to represent the matching character group, use a hyphen - to separate the starting and ending characters of the range. The character value of a single character determines the relative order within the range. The following regular expression includes a range expression that is equivalent to the list shown above in the brackets.

/Chapter [1-5]/

When specifying a range in this way, both the start and end values are included in the range. Note that the start value must precede the end value in Unicode code point order.
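
The equivalence of the list form and the range form, and the ordering requirement, can be checked with a short sketch. The sample strings are invented, and the reversed range is built with the RegExp constructor so the error can be caught at run time.

```javascript
const listForm = /Chapter [12345]/;
const rangeForm = /Chapter [1-5]/;

// The last sample has two spaces, so the ninth character is a space, not a digit.
const samples = ["Chapter 1", "Chapter 5", "Chapter 6", "Chapter  3"];
const listHits = samples.map((s) => listForm.test(s));
const rangeHits = samples.map((s) => rangeForm.test(s));

console.log(listHits);  // [ true, true, false, false ]
console.log(rangeHits); // [ true, true, false, false ]

// A range whose start comes after its end is rejected when the pattern is compiled.
let rangeError = null;
try {
  new RegExp("[5-1]");
} catch (e) {
  rangeError = e;
}
console.log(rangeError instanceof SyntaxError); // true
```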

To include a hyphen in a bracket expression, use one of the following methods:

Escape it with a backslash:
```
[\-]
```
Place the hyphen at the beginning or end of a bracketed list. The following expressions match all lowercase letters and hyphens:
```
[-a-z]
[a-z-]
```
Create a range where the starting character value is less than the hyphen, and the ending character value is equal to or greater than the hyphen. The following two regular expressions meet this requirement:
```
[!--]
[!-~]
```
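
The three methods above can be compared in a short sketch. The test characters are chosen for illustration: "+" (U+002B) lies between "!" (U+0021) and "-" (U+002D), while "." (U+002E) lies just past the hyphen.

```javascript
// Every form listed above should accept a lone hyphen.
const hyphenForms = [/[\-]/, /[-a-z]/, /[a-z-]/, /[!--]/, /[!-~]/];
const allMatchHyphen = hyphenForms.every((re) => re.test("-"));

// [!--] is a range from "!" to "-", so it also admits other characters
// that fall inside that range, but nothing beyond the hyphen.
const plusInRange = /[!--]/.test("+");
const dotInRange = /[!--]/.test(".");

console.log(allMatchHyphen, plusInRange, dotInRange); // true true false
```

The range forms match more than the hyphen itself, so escaping or placing the hyphen first or last is the safer choice when only the hyphen is wanted.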

To find all characters not in the list or range, place the caret ^ at the beginning of the list. If the caret appears at any other position in the list, it matches itself. The following regular expression matches any digit or character other than 1, 2, 3, 4, or 5:

/Chapter [^12345]/

In the above example, the expression matches any digit or character other than 1, 2, 3, 4, or 5 at the ninth position. Thus, for example, "Chapter 7" is a match, and so is "Chapter 9".

The above expression can be represented with a hyphen -:

/Chapter [^1-5]/
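
A brief sketch of negation, and of the caret losing its special meaning when it is not first. The sample strings are invented for the example.

```javascript
const notOneToFive = /Chapter [^1-5]/;

// "Chapter " fails: a negated list still has to match one character.
const negated = ["Chapter 7", "Chapter 3", "Chapter A", "Chapter "].map((s) =>
  notOneToFive.test(s)
);
console.log(negated); // [ true, false, true, false ]

// A caret that is not first in the list is an ordinary character.
const literalCaret = /[1^]/;
const caretResults = ["1", "^", "2"].map((s) => literalCaret.test(s));
console.log(caretResults); // [ true, true, false ]
```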

Bracketed expressions are typically used to specify matches for any uppercase or lowercase letter or any digit. The following expression specifies such a match:

/[A-Za-z0-9]/

Alternation and Grouping

Alternation (sometimes called substitution) uses the | character to allow a choice between two or more alternatives. For example, you can extend the chapter title regular expression to return a broader range of matches. However, this is not as simple as you might think. Alternation matches the largest possible expression on either side of the | character.

You might think that the following expression matches either "Chapter" or "Section", followed by one or two digits, with the whole match occupying a line from beginning to end:

/^Chapter|Section [1-9][0-9]{0,1}$/

Unfortunately, the above regular expression matches either the word "Chapter" at the beginning of a line, or the word "Section" followed by one or two digits at the end of a line. If the input string is "Chapter 22", the expression only matches "Chapter". If the input string is "Section 22", it matches "Section 22".
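
The behavior just described can be reproduced directly. A minimal sketch; the variable names are assumptions.

```javascript
// The | splits the whole pattern into "^Chapter" and "Section [1-9][0-9]{0,1}$".
const loose = /^Chapter|Section [1-9][0-9]{0,1}$/;

const chapterMatch = "Chapter 22".match(loose)[0];
const sectionMatch = "Section 22".match(loose)[0];

console.log(chapterMatch); // "Chapter"
console.log(sectionMatch); // "Section 22"
```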

To get the intended behavior, use parentheses to limit the scope of the | alternation so that it applies only to the words "Chapter" and "Section". However, parentheses also create subexpressions and capture them for later use, which is discussed in the section on backreferences. By adding parentheses in the appropriate places in the above regular expression, you can make it match "Chapter 1" or "Section 3".

The following regular expression uses parentheses to combine "Chapter" and "Section" so that the expression works correctly:

/^(Chapter|Section) [1-9][0-9]{0,1}$/

Although this expression works correctly, the parentheses around "Chapter|Section" also capture whichever of the two words matched for later use. Since there is only one set of parentheses in the above expression, there is only one captured "submatch".

In the above example, you only need to use parentheses to combine the choice between the words "Chapter" and "Section". To prevent matches from being saved for later use, place ?: before the regular expression pattern inside the parentheses. The following modification provides the same capability without saving the submatch:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
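
The difference between the capturing and the non-capturing form shows up in the array returned by match(). A short sketch with invented input strings:

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

// With a capturing group, the result holds the full match plus the submatch.
const withGroup = "Section 22".match(capturing);
// With (?:...), only the full match is stored.
const withoutGroup = "Section 22".match(nonCapturing);

console.log(withGroup[0], withGroup[1], withGroup.length); // "Section 22" "Section" 2
console.log(withoutGroup[0], withoutGroup.length);         // "Section 22" 1

// Both forms reject three-digit numbers because of {0,1} and the $ anchor.
console.log("Chapter 100".match(capturing)); // null
```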

In addition to ?:, two other non-capturing constructs create what are called "lookahead" assertions. A positive lookahead, written (?=pattern), succeeds at a position only if pattern matches the text that follows. A negative lookahead, written (?!pattern), succeeds only if pattern does not match the text that follows. In both cases the text examined by the lookahead is not included in the match.

For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Further assume that you need to find all references to Windows 95, Windows 98, and Windows NT so that they can be updated to Windows 2000. The following regular expression (a positive lookahead example) matches the word Windows and the space after it only where 95, 98, or NT follows; the version itself is checked but is not part of the match:

/Windows (?=95 |98 |NT )/

After finding a match, the search continues immediately after the matched text (excluding the lookahead characters) for the next match. For example, if the above expression matches "Windows 98", the search will continue after "Windows" and not after "98".
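
The point about where the search resumes can be observed through lastIndex. In this sketch the pattern includes the space after Windows so that it fits text written as "Windows 98"; that spacing, the sample text, and the variable names are assumptions of the example.

```javascript
// Sample text invented for this example.
const text = "Windows 3.1, Windows 98 and Windows NT are mentioned.";

// Positive lookahead: match "Windows " only when 95, 98, or NT (plus a space) follows.
const followedByNew = /Windows (?=95 |98 |NT )/g;

const first = followedByNew.exec(text);
// Only "Windows " belongs to the match; "98 " was inspected but not consumed,
// so the next search resumes right before "98".
console.log(first[0], first.index, followedByNew.lastIndex); // "Windows " 13 21

// Because the lookahead text is not consumed, replace() leaves it in place.
const shortened = text.replace(/Windows (?=95 |98 |NT )/g, "Win ");
console.log(shortened); // "Windows 3.1, Win 98 and Win NT are mentioned."

// Negative lookahead: match "Windows " only when none of those versions follows.
const oldOnly = text.search(/Windows (?!95 |98 |NT )/);
console.log(oldOnly); // 0
```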

Other Examples

Here are some examples of regular expressions:

Regular Expression	Description
/\b([a-z]+) \1\b/gi	Locates a word that appears twice in a row.
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/	Matches a URL and parses it into protocol, domain, port, and relative path.
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/	Locates the position of a chapter.
/[-a-z]/	The 26 letters from a to z plus the hyphen.
/ter\b/	Matches "chapter" but not "terminal".
/\Bapt/	Matches "chapter" but not "aptitude".
/Windows(?=95 |98 |NT )/	Matches "Windows95" or "Windows98" or "WindowsNT", and starts the next search match after "Windows".
/^\s*$/	Matches empty lines.
/\d{2}-\d{5}/	Validates an ID number consisting of two digits, a hyphen, and then five digits.
<[a-zA-Z]+.*?>([\s\S]*?)</[a-zA-Z]*?>	Matches HTML tags.
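
Two of the patterns in this table rely on capture groups, which a short sketch can make concrete. The URL pattern is written here with * quantifiers on the port and path groups, which is the form this example assumes; the input strings and variable names are invented.

```javascript
// \1 refers back to whatever group 1 matched; with the i flag the
// comparison ignores case, so "Is is" counts as a repeated word.
const repeated = /\b([a-z]+) \1\b/gi;
const doubled = "Is is the cost of of gasoline going up up?".match(repeated);
console.log(doubled); // [ 'Is is', 'of of', 'up up' ]

// Groups 1 to 4 capture protocol, domain, port, and path.
const urlPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
const [, protocol, domain, port, path] =
  "http://www.example.com:80/docs/page.html".match(urlPattern);
console.log(protocol, domain, port, path);
// http www.example.com :80 /docs/page.html
```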

Regular Expression	Description
hello	Matches {hello}
gray|grey	Matches {gray, grey}
gr(a|e)y	Matches {gray, grey}
gr[ae]y	Matches {gray, grey}
b[aeiou]bble	Matches {babble, bebble, bibble, bobble, bubble}
[b-chm-pP]at|ot	Matches {bat, cat, hat, mat, nat, oat, pat, Pat, ot}
colou?r	Matches {color, colour}
rege(x(es)?|xps?)	Matches {regex, regexes, regexp, regexps}
go*gle	Matches {ggle, gogle, google, gooogle, goooogle, ...}
go+gle	Matches {gogle, google, gooogle, goooogle, ...}
g(oog)+le	Matches {google, googoogle, googoogoogle, googoogoogoogle, ...}
z{3}	Matches {zzz}
z{3,6}	Matches {zzz, zzzz, zzzzz, zzzzzz}
z{3,}	Matches {zzz, zzzz, zzzzz, ...}
[Bb]rainf\*\*k	Matches {Brainf**k, brainf**k}
\d	Matches {0,1,2,3,4,5,6,7,8,9}
1\d{10}	Matches 11 digits starting with 1
[2-9]|[12]\d|3[0-6]	Matches integers from 2 to 36
Hello\nworld	Matches "Hello" followed by a newline, followed by "world"
\d+(\.\d\d)?	Matches a non-negative integer or a number with exactly two decimal places.
[^*@#]	Matches any single character other than *, @, and #
//[^\r\n]*[\r\n]	Matches comments starting with //
^dog	Matches if "dog" is at the beginning
dog$	Matches if "dog" is at the end
^dog$	Exactly matches "dog"
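
Several rows of this table can be verified mechanically. The sketch below wraps each pattern in ^(?: ... )$ so that the whole string must match, in the same way ^dog$ matches exactly "dog". The helper name matchesFully and the sample strings are assumptions made for the example.

```javascript
// Each row: [pattern, strings that should match in full, strings that should not].
const rows = [
  [/gr(a|e)y/, ["gray", "grey"], ["gry"]],
  [/colou?r/, ["color", "colour"], ["colouur"]],
  [/rege(x(es)?|xps?)/, ["regex", "regexes", "regexp", "regexps"], ["rege"]],
  [/z{3,6}/, ["zzz", "zzzzzz"], ["zz", "zzzzzzz"]],
  [/[2-9]|[12]\d|3[0-6]/, ["2", "19", "36"], ["1", "37", "40"]],
  [/\d+(\.\d\d)?/, ["42", "3.14"], ["3.1", "abc"]],
];

// Hypothetical helper: anchor the pattern so that partial matches do not count.
function matchesFully(pattern, s) {
  return new RegExp(`^(?:${pattern.source})$`).test(s);
}

const allRowsPass = rows.every(
  ([re, yes, no]) =>
    yes.every((s) => matchesFully(re, s)) && no.every((s) => !matchesFully(re, s))
);

console.log(allRowsPass); // true
```

Without the anchors, a pattern such as [2-9]|[12]\d|3[0-6] would also find the "4" inside "40", which is why the sets in the table describe whole-string matches.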