Regular Expressions - Meta Characters
The table below contains the complete list of meta characters and their behavior in the context of regular expressions:
Character | Description
---|---
\ | Marks the next character as a special character, a literal, a back-reference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^ | Matches the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$ | Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
* | Matches the preceding sub-expression zero or more times. For example, 'zo*' can match "z" and "zoo". * is equivalent to {0,}.
+ | Matches the preceding sub-expression one or more times. For example, 'zo+' can match "zo" and "zoo", but not "z". + is equivalent to {1,}.
? | Matches the preceding sub-expression zero or one time. For example, 'do(es)?' can match "do" or "does". ? is equivalent to {0,1}.
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} | m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' will match the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that no space is allowed between the comma and the two numbers.
? | When this character immediately follows any of the quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, for the string "oooo", 'o+?' will match a single "o", while 'o+' will match all the o's.
. | Matches any single character except newline characters (\n, \r). To match any character including '\n', use a pattern such as '(.\|\n)' or '[\s\S]'.
(pattern) | Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match literal parenthesis characters, use '\(' or '\)'.
(?:pattern) | Matches pattern but does not capture the match, i.e., it is a non-capturing match that is not stored for later use. This is useful when using the "or" character (\|) to combine parts of a pattern. For example, 'industr(?:y\|ies)' is a more concise expression than 'industry\|industries'.
(?=pattern) | Positive lookahead assertion. Matches at any position where the text that follows matches pattern. This is a non-capturing match, meaning it is not stored for later use. For example, 'Windows(?=95\|98\|NT\|2000)' can match "Windows" in "Windows2000", but not in "Windows3.1". Lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) | Negative lookahead assertion. Matches at any position where the text that follows does not match pattern. This is a non-capturing match, meaning it is not stored for later use. For example, 'Windows(?!95\|98\|NT\|2000)' can match "Windows" in "Windows3.1", but not in "Windows2000". Lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
(?<=pattern) | Positive lookbehind assertion, similar to positive lookahead but in the opposite direction. For example, '(?<=95\|98\|NT\|2000)Windows' can match "Windows" in "2000Windows", but not in "3.1Windows".
(?<!pattern) | Negative lookbehind assertion, similar to negative lookahead but in the opposite direction. For example, '(?<!95\|98\|NT\|2000)Windows' can match "Windows" in "3.1Windows", but not in "2000Windows".
x\|y | Matches either x or y. For example, 'z\|food' can match "z" or "food". '(z\|f)ood' matches "zood" or "food".
[xyz] | Character set. Matches any one of the enclosed characters. For example, '[abc]' can match 'a' in "plain".
[^xyz] | Negated character set. Matches any character not enclosed. For example, '[^abc]' can match 'p', 'l', 'i', 'n' in "plain".
[a-z] | Character range. Matches any character in the specified range. For example, '[a-z]' can match any lowercase letter from 'a' to 'z'.
[^a-z] | Negated character range. Matches any character not in the specified range. For example, '[^a-z]' can match any character not in the range 'a' to 'z'.
\b | Matches a word boundary, the position between a word and a space. For example, 'er\b' can match 'er' in "never", but not 'er' in "verb".
\B | Matches a non-word boundary. 'er\B' can match 'er' in "verb", but not 'er' in "never".
\cx | Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return. The value of x must be A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d | Matches a digit character. Equivalent to [0-9].
\D | Matches a non-digit character. Equivalent to [^0-9].
\f | Matches a form feed character. Equivalent to \x0c and \cL.
\n | Matches a newline character. Equivalent to \x0a and \cJ.
\r | Matches a carriage return character. Equivalent to \x0d and \cM.
\s | Matches any whitespace character, including space, tab, form feed, etc. Equivalent to [ \f\n\r\t\v].
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t | Matches a tab character. Equivalent to \x09 and \cI.
\v | Matches a vertical tab character. Equivalent to \x0b and \cK.
\w | Matches any word character (alphanumeric and underscore). Equivalent to '[A-Za-z0-9_]'.
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn | Matches n, where n is a hexadecimal escape sequence. The hexadecimal escape sequence must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num | Matches num, where num is a positive integer. This is a reference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n | Indicates an octal escape sequence or a back reference. If \n is preceded by at least n captured sub-expressions, n is a back reference. Otherwise, if n is an octal digit (0-7), it is an octal escape sequence.
\nm | Indicates an octal escape sequence or a back reference. If \nm is preceded by at least nm captured sub-expressions, nm is a back reference. If \nm is preceded by at least n captures, n is a back reference followed by the literal m. If neither condition is met and n and m are octal digits (0-7), \nm matches the octal escape sequence nm.
\nml | If n is an octal digit (0-3), and m and l are octal digits (0-7), it matches the octal escape sequence nml.
\un | Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
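
Several of the table's examples can be checked directly in JavaScript. The following is a minimal sketch, assuming an engine that supports lookbehind assertions (ES2018 or later, such as a current Node.js or browser). The variable names and the sample word "bookkeeper" are illustrative only and not part of the table.

```javascript
// Greedy vs. non-greedy quantifiers on the string "oooo".
const greedy = "oooo".match(/o+/)[0];   // takes as many o's as possible
const lazy = "oooo".match(/o+?/)[0];    // takes as few o's as possible

// Bounded quantifier: o{1,3} takes at most three o's per match.
const bounded = "fooooood".match(/o{1,3}/)[0];

// Capturing vs. non-capturing groups: only (y|ies) stores a sub-match.
const captured = "industries".match(/industr(y|ies)/);
const uncaptured = "industries".match(/industr(?:y|ies)/);

// Lookahead and lookbehind test the surrounding text without consuming it.
const ahead = /Windows(?=95|98|NT|2000)/;
const notAhead = /Windows(?!95|98|NT|2000)/;
const behind = /(?<=95|98|NT|2000)Windows/;
const notBehind = /(?<!95|98|NT|2000)Windows/;
const aheadText = "Windows2000".match(ahead)[0]; // "2000" is not part of the match

// Back-reference: (.)\1 finds two consecutive identical characters.
const doubled = "bookkeeper".match(/(.)\1/g);

// Word boundary (\b) and non-word boundary (\B).
const boundary = [/er\b/.test("never"), /er\b/.test("verb")];
const nonBoundary = [/er\B/.test("verb"), /er\B/.test("never")];

console.log(greedy, lazy, bounded);
console.log(captured[1], uncaptured.length);
console.log(ahead.test("Windows3.1"), notAhead.test("Windows3.1"));
console.log(behind.test("2000Windows"), notBehind.test("2000Windows"));
console.log(aheadText, doubled, boundary, nonBoundary);
```

The lookahead case is worth noting: the match text is only "Windows", because the asserted "2000" is checked but never consumed.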
Example
Next, we analyze a regular expression for matching email addresses, as shown below:
```
var str = "abcd [email protected] 1234";
// \. matches a literal dot before the top-level domain; a bare . would match any character
var patt1 = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
document.write(str.match(patt1));
```
The following text is the matched expression obtained: