Regular Expressions - Meta Characters
The table below contains the complete list of meta characters and their behavior in the context of regular expressions:
Character | Description
---|---
\ | Marks the next character as either a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^ | Matches the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches positions after '\n' or '\r'.
$ | Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches positions before '\n' or '\r'.
* | Matches the preceding subexpression zero or more times. For example, 'zo*' can match "z" and "zoo". * is equivalent to {0,}.
+ | Matches the preceding subexpression one or more times. For example, 'zo+' can match "zo" and "zoo", but not "z". + is equivalent to {1,}.
? | Matches the preceding subexpression zero or one time. For example, 'do(es)?' can match "do" or "does". ? is equivalent to {0,1}.
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the "o" in "Bob" but can match the two o's in "food".
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' cannot match the "o" in "Bob" but can match all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} | m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' will match the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces between the comma and the numbers.
? | When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching mode is non-greedy. The non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, for the string "oooo", 'o+?' will match a single "o", while 'o+' will match all the o's.
. | Matches any single character except newline characters (\n, \r). To match any character including '\n', use the pattern '(.\|\n)'.
(pattern) | Matches pattern and captures the match. The captured match can be retrieved from the Matches collection generated. In VBScript, use the SubMatches collection, and in JScript, use the $0…$9 properties. To match literal parentheses, use '\(' or '\)'.
(?:pattern) | Matches pattern but does not capture the match, i.e., it is a non-capturing match and is not stored for future use. This is useful when combining parts of a pattern with the "or" character (\|). For example, 'industr(?:y\|ies)' is a more concise expression than 'industry\|industries'.
(?=pattern) | Positive lookahead assertion. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match: the text matched by the lookahead is not stored for later use. For example, 'Windows(?=95\|98\|NT\|2000)' matches "Windows" in "Windows2000" but not in "Windows3.1". Lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) | Negative lookahead assertion. Matches the search string at any position where a string not matching pattern begins. This is a non-capturing match: nothing is stored for later use. For example, 'Windows(?!95\|98\|NT\|2000)' matches "Windows" in "Windows3.1" but not in "Windows2000". Lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?<=pattern) | Positive lookbehind assertion. Similar to positive lookahead, but it looks in the opposite direction. For example, '(?<=95\|98\|NT\|2000)Windows' matches "Windows" in "2000Windows" but not in "3.1Windows".
(?<!pattern) | Negative lookbehind assertion. Similar to negative lookahead, but it looks in the opposite direction. For example, '(?<!95\|98\|NT\|2000)Windows' matches "Windows" in "3.1Windows" but not in "2000Windows".
x\|y | Matches either x or y. For example, 'z\|food' can match "z" or "food". '(z\|f)ood' matches "zood" or "food".
[xyz] | Character set. Matches any one of the enclosed characters. For example, '[abc]' can match 'a' in "plain".
[^xyz] | Negated character set. Matches any character not enclosed. For example, '[^abc]' can match 'p', 'l', 'i', 'n' in "plain".
[a-z] | Character range. Matches any character in the specified range. For example, '[a-z]' can match any lowercase letter from 'a' to 'z'.
[^a-z] | Negated character range. Matches any character not in the specified range. For example, '[^a-z]' can match any character not in the range 'a' to 'z'.
\b | Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string). For example, 'er\b' can match 'er' in "never", but not 'er' in "verb".
\B | Matches a non-word boundary. For example, 'er\B' can match 'er' in "verb", but not 'er' in "never".
\cx | Matches the control character specified by x. For example, \cM matches a Control-M or carriage return. x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d | Matches a digit character. Equivalent to [0-9].
\D | Matches a non-digit character. Equivalent to [^0-9].
\f | Matches a form feed character. Equivalent to \x0c and \cL.
\n | Matches a newline character. Equivalent to \x0a and \cJ.
\r | Matches a carriage return character. Equivalent to \x0d and \cM.
\s | Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v].
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t | Matches a tab character. Equivalent to \x09 and \cI.
\v | Matches a vertical tab character. Equivalent to \x0b and \cK.
\w | Matches any word character (alphanumeric and underscore). Equivalent to '[A-Za-z0-9_]'.
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn | Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num | Matches num, where num is a positive integer. This is a reference to a previously captured group. For example, '(.)\1' matches two consecutive identical characters.
\n | Indicates an octal escape sequence or a backreference. If \n is preceded by at least n captured sub-expressions, it is a backreference. Otherwise, if n is an octal digit (0-7), it is an octal escape sequence.
\nm | Indicates an octal escape sequence or a backreference. If \nm is preceded by at least nm captured sub-expressions, it is a backreference. If \nm is preceded by at least n captured sub-expressions, it is a backreference followed by the literal m. If neither condition is met, and n and m are octal digits (0-7), \nm matches the octal escape sequence nm.
\nml | If n is an octal digit (0-3) and m and l are octal digits (0-7), it matches the octal escape sequence nml.
\un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
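
Several of the behaviors in the table can be checked directly in JavaScript. The snippet below is a minimal sketch, assuming a reasonably modern engine (lookbehind assertions are not available in older ones). The variable names are illustrative only and do not come from the reference table.

```javascript
// Greedy vs. non-greedy: 'o+' takes every "o", 'o+?' takes as few as possible.
const greedy = "oooo".match(/o+/)[0];            // "oooo"
const lazy = "oooo".match(/o+?/)[0];             // "o"

// {n,m}: at most three o's are taken from "fooooood".
const bounded = "fooooood".match(/o{1,3}/)[0];   // "ooo"

// Non-capturing group: alternation without storing a submatch.
const industry = "industries".match(/industr(?:y|ies)/);   // ["industries"], no captured group

// Lookahead is zero-width: "2000" is required but is not part of the match.
const ahead = "Windows2000".match(/Windows(?=95|98|NT|2000)/)[0];   // "Windows"
const aheadMiss = /Windows(?=95|98|NT|2000)/.test("Windows3.1");    // false

// Negative lookbehind: reject "Windows" when a listed version precedes it.
const behind = /(?<!95|98|NT|2000)Windows/.test("3.1Windows");      // true
const behindMiss = /(?<!95|98|NT|2000)Windows/.test("2000Windows"); // false

// Backreference: \1 repeats whatever group 1 captured.
const doubled = "hello".match(/(.)\1/)[0];       // "ll"

// Word boundary: 'er' must be followed by a non-word position.
const atEnd = /er\b/.test("never");              // true
const inside = /er\b/.test("verb");              // false

console.log(greedy, lazy, bounded, industry.length, ahead, doubled);
```

Note that the lookahead and lookbehind forms only assert what surrounds the match, so the matched text itself is just "Windows" in every case.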
Example
Next, we analyze a regular expression for matching email addresses, as shown below:
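```
// Sample input; user@example.com is a placeholder address.
var str = "abcd user@example.com 1234";
// \. matches a literal dot before the top-level domain.
var patt1 = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
document.write(str.match(patt1));
```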
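
One way to read the email pattern is to assemble it from its parts. The sketch below assumes that the dot before the top-level domain is meant literally (written `\.`). The names `localPart`, `domain`, `tld`, and `emailPattern` are hypothetical, and `user@example.com` is a placeholder address.

```javascript
// Each piece corresponds to one segment of the email pattern.
const localPart = "[\\w.%+-]+";   // one or more word characters, dots, %, + or -
const domain = "[\\w.-]+";        // one or more word characters, dots or hyphens
const tld = "[a-zA-Z]{2,6}";      // two to six letters

// \b anchors both ends on word boundaries; \. is a literal dot.
const emailPattern = new RegExp(
  "\\b" + localPart + "@" + domain + "\\." + tld + "\\b",
  "g"
);

const found = "abcd user@example.com 1234".match(emailPattern);
// found is ["user@example.com"]

// With an unescaped dot, any character would be accepted in that position.
const looseDot = /\b[\w.%+-]+@[\w.-]+.[a-zA-Z]{2,6}\b/g;
const loose = "user@exampleXcom".match(looseDot);       // ["user@exampleXcom"]
const strict = "user@exampleXcom".match(emailPattern);  // null

console.log(found, loose, strict);
```

The last two lines show why the escape matters: without it, a string that contains no dot at all after the `@` is still reported as a match.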
The following marked text is the expression that matches: