Regular Expressions - Meta Characters
The table below contains the complete list of meta characters and their behavior in the context of regular expressions:
| Character | Description |
|---|---|
| \ | Marks the next character as a special character, a literal, a back-reference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(". |
| ^ | Matches the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'. |
| $ | Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. |
| * | Matches the preceding sub-expression zero or more times. For example, 'zo*' can match "z" and "zoo". * is equivalent to {0,}. |
| + | Matches the preceding sub-expression one or more times. For example, 'zo+' can match "zo" and "zoo", but not "z". + is equivalent to {1,}. |
| ? | Matches the preceding sub-expression zero or one time. For example, 'do(es)?' can match "do" or "does". ? is equivalent to {0,1}. |
| {n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
| {n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'. |
| {n,m} | m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that no space is allowed between the comma and either number. |
| ? | When this character immediately follows any of the quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's. |
| . | Matches any single character except newline characters (\n, \r). To match any character including '\n', use a pattern such as '(.\|\n)'. |
| (pattern) | Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match literal parentheses, use '\(' or '\)'. |
| (?:pattern) | Matches pattern but does not capture the match, i.e., it is a non-capturing match that is not stored for later use. This is useful when using the "or" character (\|) to combine parts of a pattern. For example, 'industr(?:y\|ies)' is a more concise expression than 'industry\|industries'. |
| (?=pattern) | Positive lookahead assertion. Matches at any position where the text that follows matches pattern. This is a non-capturing match, so it is not stored for later use. For example, 'Windows(?=95\|98\|NT\|2000)' can match "Windows" in "Windows2000", but not "Windows" in "Windows3.1". Lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
| (?!pattern) | Negative lookahead assertion. Matches at any position where the text that follows does not match pattern. This is a non-capturing match, so it is not stored for later use. For example, 'Windows(?!95\|98\|NT\|2000)' can match "Windows" in "Windows3.1", but not "Windows" in "Windows2000". Lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
| (?<=pattern) | Positive lookbehind assertion, similar to positive lookahead but in the opposite direction. For example, '(?<=95\|98\|NT\|2000)Windows' can match "Windows" in "2000Windows", but not "Windows" in "3.1Windows". |
| (?<!pattern) | Negative lookbehind assertion, similar to negative lookahead but in the opposite direction. For example, '(?<!95\|98\|NT\|2000)Windows' can match "Windows" in "3.1Windows", but not "Windows" in "2000Windows". |
| x\|y | Matches either x or y. For example, 'z\|food' can match "z" or "food". '(z\|f)ood' matches "zood" or "food". |
| [xyz] | Character set. Matches any one of the enclosed characters. For example, '[abc]' can match 'a' in "plain". |
| [^xyz] | Negated character set. Matches any character not enclosed. For example, '[^abc]' can match 'p', 'l', 'i', 'n' in "plain". |
| [a-z] | Character range. Matches any character in the specified range. For example, '[a-z]' can match any lowercase letter from 'a' to 'z'. |
| [^a-z] | Negated character range. Matches any character not in the specified range. For example, '[^a-z]' can match any character not in the range 'a' to 'z'. |
| \b | Matches a word boundary, the position between a word and a space. For example, 'er\b' can match 'er' in "never", but not 'er' in "verb". |
| \B | Matches a non-word boundary. 'er\B' can match 'er' in "verb", but not 'er' in "never". |
| \cx | Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return. The value of x must be A-Z or a-z. Otherwise, c is treated as a literal 'c' character. |
| \d | Matches a digit character. Equivalent to [0-9]. |
| \D | Matches a non-digit character. Equivalent to [^0-9]. |
| \f | Matches a form feed character. Equivalent to \x0c and \cL. |
| \n | Matches a newline character. Equivalent to \x0a and \cJ. |
| \r | Matches a carriage return character. Equivalent to \x0d and \cM. |
| \s | Matches any whitespace character, including space, tab, form feed, etc. Equivalent to [ \f\n\r\t\v]. |
| \S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
| \t | Matches a tab character. Equivalent to \x09 and \cI. |
| \v | Matches a vertical tab character. Equivalent to \x0b and \cK. |
| \w | Matches any word character (alphanumeric and underscore). Equivalent to '[A-Za-z0-9_]'. |
| \W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'. |
| \xn | Matches n, where n is a hexadecimal escape sequence. The hexadecimal escape sequence must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". ASCII encoding can be used in regular expressions. |
| \num | Matches num, where num is a positive integer. This is a reference to a captured match. For example, '(.)\1' matches two consecutive identical characters. |
| \n | Indicates an octal escape sequence or a back reference. If \n is preceded by at least n captured sub-expressions, n is a back reference. Otherwise, if n is an octal digit (0-7), it is an octal escape sequence. |
| \nm | Indicates an octal escape sequence or a back reference. If \nm is preceded by at least nm captured sub-expressions, nm is a back reference. If \nm is preceded by at least n captures, n is a back reference followed by the literal m. If neither condition is met and n and m are octal digits (0-7), \nm matches the octal escape sequence nm. |
| \nml | If n is an octal digit (0-3), and m and l are octal digits (0-7), it matches the octal escape sequence nml. |
| \un | Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). | 
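A few of the rows above can be checked directly in JavaScript. The snippet below is a minimal sketch, not part of the original table: the variable names are made up for illustration, and the lookbehind line assumes an engine that supports ES2018 lookbehind (older engines such as JScript may reject it).

```javascript
// Greedy vs. non-greedy: 'o+' takes every o, 'o+?' stops after one.
const greedy = "oooo".match(/o+/)[0];
const lazy = "oooo".match(/o+?/)[0];

// Lookahead: "Windows" matches only when (or unless) a listed version follows.
const followedByVersion = /Windows(?=95|98|NT|2000)/;
const notFollowedByVersion = /Windows(?!95|98|NT|2000)/;
const aheadHit = followedByVersion.test("Windows2000");
const aheadMiss = followedByVersion.test("Windows3.1");
const negAheadHit = notFollowedByVersion.test("Windows3.1");

// Lookbehind (assumes ES2018 support): the version must come before "Windows".
const precededByVersion = /(?<=95|98|NT|2000)Windows/;
const behindHit = precededByVersion.test("2000Windows");
const behindMiss = precededByVersion.test("3.1Windows");

// Back reference: '(.)\1' finds two consecutive identical characters.
const doubled = "food moon".match(/(.)\1/g);

// Word boundaries: \b at the end of a word, \B inside one.
const boundaryHit = /er\b/.test("never");
const boundaryMiss = /er\b/.test("verb");
const nonBoundaryHit = /er\B/.test("verb");

console.log(greedy, lazy, aheadHit, aheadMiss, negAheadHit);
console.log(behindHit, behindMiss, doubled, boundaryHit, boundaryMiss, nonBoundaryHit);
```

Using `test` keeps each check to a simple true/false answer, while `match` is used where the matched text itself is the point of the example.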
Example
Next, we analyze a regular expression for matching email addresses, as shown below:
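```
// example.com address used as a placeholder sample
var str = "abcd user@example.com 1234";
// The dot before the top-level domain is escaped so it matches a literal "."
var patt1 = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
document.write(str.match(patt1));
```
The script writes the matched text, user@example.com, to the page.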
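To see how the pieces of an email pattern like this fit together, the sketch below breaks it into parts and compares an escaped dot with an unescaped one. The sample strings and variable names are assumptions chosen for illustration (example.com is a placeholder domain), and the pattern is a simplified matcher, not a full validator for email addresses.

```javascript
// \b            word boundary before the local part
// [\w.%+-]+     local part: word characters, dot, percent, plus, hyphen
// @             literal at sign
// [\w.-]+       domain labels
// \.            literal dot before the top-level domain
// [a-zA-Z]{2,6} top-level domain of 2 to 6 letters
// \b            word boundary after the address
const strictPattern = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;

// Same pattern with an unescaped dot, which matches any character.
const loosePattern = /\b[\w.%+-]+@[\w.-]+.[a-zA-Z]{2,6}\b/g;

const withDot = "abcd user@example.com 1234";
const withoutDot = "abcd user@examplecom 1234";

const strictOnDot = withDot.match(strictPattern);
const strictOnNoDot = withoutDot.match(strictPattern);
const looseOnNoDot = withoutDot.match(loosePattern);

console.log(strictOnDot);   // address with a real dot is found
console.log(strictOnNoDot); // null: no literal dot before the TLD
console.log(looseOnNoDot);  // the unescaped dot accepts the dotless string
```

The comparison shows why the dot before the top-level domain should be written as `\.`: left unescaped, it lets strings without any dot in the domain pass as matches.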