Easy Tutorial

C# Regular Expressions

Regular Expressions are patterns used to match input text.

The .NET framework provides a regular expression engine that allows such matching.

Patterns consist of one or more characters, operators, and constructs.

If you are not familiar with regular expressions, you can read our Regular Expressions - Tutorial.

Defining Regular Expressions

The following lists various categories of characters, operators, and constructs used to define regular expressions.

Character Escapes

The backslash character (\) in a regular expression indicates that the character following it is either a special character or should be interpreted literally.

The table below lists escape characters:

| Escape Character | Description | Pattern | Matches |
| --- | --- | --- | --- |
| `\a` | Matches the alert (bell) character \u0007. | `\a` | "\u0007" in "Warning!" + '\u0007' |
| `\b` | In a character class, matches the backspace character \u0008. | `[\b]{3,}` | "\b\b\b\b" in "\b\b\b\b" |
| `\t` | Matches the tab character \u0009. | `(\w+)\t` | "Name\t" and "Addr\t" in "Name\tAddr\t" |
| `\r` | Matches the carriage return character \u000D. (`\r` is not equivalent to the newline character `\n`.) | `\r\n(\w+)` | "\r\nHello" in "\r\nHello\nWorld." |
| `\v` | Matches the vertical tab character \u000B. | `[\v]{2,}` | "\v\v\v" in "\v\v\v" |
| `\f` | Matches the form feed character \u000C. | `[\f]{2,}` | "\f\f\f" in "\f\f\f" |
| `\n` | Matches the newline character \u000A. | `\r\n(\w+)` | "\r\nHello" in "\r\nHello\nWorld." |
| `\e` | Matches the escape character \u001B. | `\e` | "\x001B" in "\x001B" |
| `\nnn` | Specifies a character using octal representation (nnn consists of two or three digits). | `\w\040\w` | "a b" and "c d" in "a bc d" |
| `\xnn` | Specifies a character using hexadecimal representation (nn consists of exactly two digits). | `\w\x20\w` | "a b" and "c d" in "a bc d" |
| `\cX`<br>`\cx` | Matches the ASCII control character specified by X or x, where X or x is the letter of the control character. | `\cC` | "\x0003" in "\x0003" (Ctrl-C) |
| `\unnnn` | Matches a Unicode character using hexadecimal representation (nnnn consists of exactly four digits). | `\w\u0020\w` | "a b" and "c d" in "a bc d" |
| `\` | When followed by a character that is not recognized as an escape, matches that character literally. For example, `\*` matches a literal asterisk. | `\d+[\+-x\*]\d+` | `2+2` and `3*9` in `(2+2) * 3*9` |
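
As a minimal sketch of two rows from the escape table, the program below runs the hexadecimal escape `\x20` and the literal escapes `\+` and `\*` through `Regex.Matches`. The `EscapeDemo` class and its `FindAll` helper are illustrative names, not part of .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: collects the text of every match into an array.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // \x20 is the space character written as a two-digit hexadecimal escape.
        Console.WriteLine(string.Join(" | ", FindAll("a bc d", @"\w\x20\w")));
        // prints: a b | c d

        // \+ and \* are escaped, so they stand for the literal characters.
        Console.WriteLine(string.Join(" | ", FindAll("(2+2) * 3*9", @"\d+[\+-x\*]\d+")));
        // prints: 2+2 | 3*9
    }
}
```

The verbatim string prefix `@` keeps the backslashes intact, so the pattern reaches the regex engine exactly as written.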

Character Classes

Character classes match any one character from a set of characters.

The table below lists character classes:

| Character Class | Description | Pattern | Matches |
| --- | --- | --- | --- |
| `[character_group]` | Matches any single character in character_group. By default, matching is case-sensitive. | `[mn]` | "m" in "mat", "m" and "n" in "moon" |
| `[^character_group]` | Negation: matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | `[^aei]` | "v" and "l" in "avail" |
| `[first-last]` | Character range: matches any single character in the range from first to last. | `[b-d]irds` | "birds", "cirds", and "dirds" (lowercase only, because matching is case-sensitive) |
| `.` | Wildcard: matches any single character except \n. To match a literal period character (. or \u002E), escape it as `\.`. | `a.e` | "ave" in "have", "ate" in "mate" |
| `\p{name}` | Matches any single character in the Unicode general category or named block specified by name. | `\p{Lu}` | "C" and "L" in "City Lights" |
| `\P{name}` | Matches any single character that is not in the Unicode general category or named block specified by name. | `\P{Lu}` | "i", "t", and "y" in "City" |
| `\w` | Matches any word character. | `\w` | "R", "o", "o", "m", and "1" in "Room#1" |
| `\W` | Matches any non-word character. | `\W` | "#" in "Room#1" |
| `\s` | Matches any whitespace character. | `\w\s` | "D " in "ID A1.3" |
| `\S` | Matches any non-whitespace character. | `\s\S` | " _" in "int __ctr" |
| `\d` | Matches any decimal digit. | `\d` | "4" in "4 = IV" |
| `\D` | Matches any character that is not a decimal digit. | `\D` | " ", "=", " ", "I", and "V" in "4 = IV" |
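
The character classes above can be tried out with a short program. This is only a sketch; `CharClassDemo` and `FindAll` are names made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: collects the text of every match into an array.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // \w matches each word character separately.
        Console.WriteLine(string.Join(",", FindAll("Room#1", @"\w")));           // R,o,o,m,1
        // \W matches everything \w does not.
        Console.WriteLine(string.Join(",", FindAll("Room#1", @"\W")));           // #
        // \p{Lu} matches uppercase letters (Unicode category Lu).
        Console.WriteLine(string.Join(",", FindAll("City Lights", @"\p{Lu}")));  // C,L
        // A negated group matches any character not listed.
        Console.WriteLine(string.Join(",", FindAll("avail", "[^aei]")));         // v,l
    }
}
```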

Anchors

Anchors or atomic zero-width assertions cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters.

The following table lists the anchors:

| Assertion | Description | Pattern | Matches |
| --- | --- | --- | --- |
| `^` | The match must start at the beginning of the string (or, in multiline mode, at the beginning of a line). | `^\d{3}` | "567" in "567-777-" |
| `$` | The match must occur at the end of the string or before \n at the end of the string (or, in multiline mode, at the end of a line). | `-\d{4}$` | "-2012" in "8-12-2012" |
| `\A` | The match must occur at the start of the string. | `\A\w{4}` | "Code" in "Code-007-" |
| `\Z` | The match must occur at the end of the string or before \n at the end of the string. | `-\d{3}\Z` | "-007" in "Bond-901-007" |
| `\z` | The match must occur at the end of the string. | `-\d{3}\z` | "-333" in "-901-333" |
| `\G` | The match must occur at the point where the previous match ended. | `\G\(\d\)` | "(1)", "(3)", and "(5)" in "(1)(3)(5)7" |
| `\b` | Matches a word boundary, that is, a position between a word character (`\w`) and a non-word character (`\W`) or the start or end of the string. | `er\b` | "er" in "never" but not in "verb" |
| `\B` | Matches a position that is not a word boundary. | `er\B` | "er" in "verb" but not in "never" |
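
The following sketch shows how `^` changes meaning under `RegexOptions.Multiline` and how `\b` separates "never" from "verb". The `AnchorDemo` class and its `Count` helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: counts the matches of pattern in input under the given options.
    public static int Count(string input, string pattern, RegexOptions options = RegexOptions.None) =>
        Regex.Matches(input, pattern, options).Count;

    public static void Main()
    {
        string text = "567-777\n890-111";

        // Without Multiline, ^ only matches at the very start of the string.
        Console.WriteLine(Count(text, @"^\d{3}"));                         // 1
        // With Multiline, ^ also matches right after each newline.
        Console.WriteLine(Count(text, @"^\d{3}", RegexOptions.Multiline)); // 2

        // \b requires a word boundary right after "er".
        Console.WriteLine(Regex.IsMatch("never", @"er\b")); // True
        Console.WriteLine(Regex.IsMatch("verb", @"er\b"));  // False
    }
}
```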

Grouping Constructs

Grouping constructs delineate the subexpressions of a regular expression and capture the substrings of an input string.


The following table lists the grouping constructs:

| Grouping Construct | Description | Pattern | Matches |
| --- | --- | --- | --- |
| `(subexpression)` | Captures the matched subexpression and assigns it an ordinal number, counting from 1 (group 0 is the entire match). | `(\w)\1` | "ee" in "deep" |
| `(?<name>subexpression)` | Captures the matched subexpression into a named group. | `(?<double>\w)\k<double>` | "ee" in "deep" |
| `(?<name1-name2>subexpression)` | Defines a balancing group definition. | `(((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$` | `((1-3)*(3-1))` in `3+2^((1-3)*(3-1))` |
| `(?:subexpression)` | Defines a non-capturing group. | `Write(?:Line)?` | "WriteLine" in "Console.WriteLine()" |
| `(?imnsx-imnsx:subexpression)` | Applies or disables the specified options within subexpression. | `A\d{2}(?i:\w+)\b` | "A12xl" and "A12XL" in "A12xl A12XL a12xl" |
| `(?=subexpression)` | Zero-width positive lookahead assertion. | `\w+(?=\.)` | "is", "ran", and "out" in "He is. The dog ran. The sun is out." |
| `(?!subexpression)` | Zero-width negative lookahead assertion. | `\b(?!un)\w+\b` | "sure" and "used" in "unsure sure unity used" |
| `(?<=subexpression)` | Zero-width positive lookbehind assertion. | `(?<=19)\d{2}\b` | "99", "50", and "05" in "1851 1999 1950 1905 2003" |
| `(?<!subexpression)` | Zero-width negative lookbehind assertion. | `(?<!wo)man\b` | "man" in "Hi woman Hi man" |
| `(?>subexpression)` | Atomic group (a non-backtracking subexpression). | `[13579](?>A+B+)` | "1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC" |

Example

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "1851 1999 1950 1905 2003";
      string pattern = @"(?<=19)\d{2}\b";

      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
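
Besides lookbehind, the grouping table also covers numbered and named captures. The sketch below reads both kinds of group from a `Match`; the `GroupDemo` class, the `ExtractYear` helper, and the date pattern are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns the text captured by the named group "year",
    // or an empty string when the input does not match.
    public static string ExtractYear(string date)
    {
        Match m = Regex.Match(date, @"(?<month>\d{1,2})-(?<day>\d{1,2})-(?<year>\d{4})");
        return m.Success ? m.Groups["year"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractYear("8-12-2012")); // 2012

        Match m = Regex.Match("deep", @"(\w)\1");
        Console.WriteLine(m.Groups[0].Value); // ee (group 0 is the entire match)
        Console.WriteLine(m.Groups[1].Value); // e  (first capturing group)
    }
}
```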

Quantifiers

Quantifiers specify how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur. Quantifiers include the language elements listed in the table below.

The table lists the quantifiers:

| Quantifier | Description | Pattern | Matches |
| --- | --- | --- | --- |
| `*` | Matches the previous element zero or more times. | `\d*\.\d` | ".0", "19.9", "219.9" |
| `+` | Matches the previous element one or more times. | `be+` | "bee" in "been", "be" in "bent" |
| `?` | Matches the previous element zero or one time. | `rai?n` | "ran", "rain" |
| `{n}` | Matches the previous element exactly n times. | `,\d{3}` | ",043" in "1,043.6", ",876", ",543", and ",210" in "9,876,543,210" |
| `{n,}` | Matches the previous element at least n times. | `\d{2,}` | "166", "29", "1930" |
| `{n,m}` | Matches the previous element at least n times, but no more than m times. | `\d{3,5}` | "166", "17668", "19302" in "193024" |
| `*?` | Matches the previous element zero or more times, but as few times as possible. | `\d*?\.\d` | ".0", "19.9", "219.9" |
| `+?` | Matches the previous element one or more times, but as few times as possible. | `be+?` | "be" in "been", "be" in "bent" |
| `??` | Matches the previous element zero or one time, but as few times as possible. | `rai??n` | "ran", "rain" |
| `{n}?` | Matches the previous element exactly n times. | `,\d{3}?` | ",043" in "1,043.6", ",876", ",543", and ",210" in "9,876,543,210" |
| `{n,}?` | Matches the previous element at least n times, but as few times as possible. | `\d{2,}?` | "16" in "166", "29", "19" and "30" in "1930" |
| `{n,m}?` | Matches the previous element between n and m times, but as few times as possible. | `\d{3,5}?` | "166", "176" in "17668", "193" and "024" in "193024" |
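
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below runs two patterns from the table in both forms; `QuantifierDemo` and `FindAll` are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: collects the text of every match into an array.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Greedy: takes as many digits as allowed (up to five).
        Console.WriteLine(string.Join(",", FindAll("193024", @"\d{3,5}")));  // 19302
        // Lazy: stops as soon as the minimum of three digits is reached.
        Console.WriteLine(string.Join(",", FindAll("193024", @"\d{3,5}?"))); // 193,024

        // Greedy + consumes both e's; lazy +? is satisfied with one.
        Console.WriteLine(Regex.Match("been", "be+").Value);  // bee
        Console.WriteLine(Regex.Match("been", "be+?").Value); // be
    }
}
```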

Backreference Constructs

Backreferences allow you to specify previously matched subexpressions in the same regular expression.

The table lists the backreference constructs:

| Backreference Construct | Description | Pattern | Matches |
| --- | --- | --- | --- |
| `\number` | Backreference. Matches the value of a numbered subexpression. | `(\w)\1` | "ee" in "seek" |
| `\k<name>` | Named backreference. Matches the value of a named expression. | `(?<char>\w)\k<char>` | "ee" in "seek" |
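
As a sketch of both backreference forms, the program below finds the doubled letter in "seek" and then, as an extra illustration not taken from the table, repeated words in a sentence. `BackrefDemo` and `FindAll` are made-up names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BackrefDemo
{
    // Hypothetical helper: collects the text of every match into an array.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // \1 must repeat whatever group 1 captured.
        Console.WriteLine(string.Join(",", FindAll("seek", @"(\w)\1")));              // ee
        // \k<char> does the same for the named group "char".
        Console.WriteLine(string.Join(",", FindAll("seek", @"(?<char>\w)\k<char>"))); // ee
        // Assumed example: a whole word followed by whitespace and the same word again.
        Console.WriteLine(string.Join(",", FindAll("this is is a test test", @"\b(\w+)\s+\1\b")));
        // prints: is is,test test
    }
}
```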

Alternation Constructs

Alternation constructs modify the regular expression to enable either/or matching.

The table lists the alternation constructs:

| Alternation Construct | Description | Pattern | Matches |
| --- | --- | --- | --- |
| `\|` | Matches any one of the elements separated by the vertical bar (\|) character. | `th(e\|is\|at)` | "this" and "the" in "this is the day." |
| `(?(expression)yes\|no)` | If the pattern specified by expression matches, matches yes; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion. | `(?(A)A\d{2}\b\|\b\d{3}\b)` | "A10" and "910" in "A10 C103 910" |
| `(?(name)yes\|no)` | If the named or numbered capturing group name has a match, matches yes; otherwise, matches the optional no part. | `(?<quoted>")?(?(quoted).+?"\|\S+\s)` | `Dogs.jpg ` (with the trailing space) and `"Yiska playing.jpg"` in `Dogs.jpg "Yiska playing.jpg"` |
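
The plain and conditional forms of alternation can be checked with a few lines of code. This is a sketch only; `AlternationDemo` and `FindAll` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: collects the text of every match into an array.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Plain alternation: "th" followed by "e", "is", or "at".
        Console.WriteLine(string.Join(",", FindAll("this is the day.", "th(e|is|at)"))); // this,the

        // Conditional: if the next character is "A", require A plus two digits;
        // otherwise require a standalone three-digit number.
        Console.WriteLine(string.Join(",", FindAll("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)"))); // A10,910
    }
}
```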

Substitution

Substitutions are regular expression language elements that are recognized only in replacement patterns.

The following table lists characters used for substitution:

| Character | Description | Pattern | Replacement Pattern | Input String | Result String |
| --- | --- | --- | --- | --- | --- |
| `$number` | Substitutes the substring matched by group number. | `\b(\w+)(\s)(\w+)\b` | `$3$2$1` | "one two" | "two one" |
| `${name}` | Substitutes the substring matched by the named group name. | `\b(?<word1>\w+)(\s)(?<word2>\w+)\b` | `${word2} ${word1}` | "one two" | "two one" |
| `$$` | Substitutes a literal "$". | `\b(\d+)\s?USD` | `$$$1` | "103 USD" | "$103" |
| `$&` | Substitutes a copy of the entire match. | `\$?\d*\.?\d+` | `**$&**` | `$1.30` | `**$1.30**` |
| `` $` `` | Substitutes all text of the input string before the match. | `B+` | `` $` `` | "AABBCC" | "AAAACC" |
| `$'` | Substitutes all text of the input string after the match. | `B+` | `$'` | "AABBCC" | "AACCCC" |
| `$+` | Substitutes the last captured group. | `B+(C+)` | `$+` | "AABBCCDD" | "AACCDD" |
| `$_` | Substitutes the entire input string. | `B+` | `$_` | "AABBCC" | "AAAABBCCCC" |
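
Substitutions are passed as the replacement argument of `Regex.Replace`. The sketch below runs four rows of the table through the static overload; the `SubstitutionDemo` class and its `SwapWords` helper are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Hypothetical helper: swaps two words using numbered group substitutions.
    public static string SwapWords(string input) =>
        Regex.Replace(input, @"\b(\w+)(\s)(\w+)\b", "$3$2$1");

    public static void Main()
    {
        Console.WriteLine(SwapWords("one two")); // two one

        // ${name} refers to a named group.
        Console.WriteLine(Regex.Replace("one two",
            @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}")); // two one

        // $$ produces a literal dollar sign, followed here by group 1.
        Console.WriteLine(Regex.Replace("103 USD", @"\b(\d+)\s?USD", "$$$1")); // $103

        // $` inserts the text that precedes the match.
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$`")); // AAAACC
    }
}
```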

Miscellaneous Constructs

The following table lists various miscellaneous constructs:

| Construct | Description | Example |
| --- | --- | --- |
| `(?imnsx-imnsx)` | Sets or disables options such as case-insensitive matching in the middle of a pattern. | `\bA(?i)b\w+\b` matches "ABA" and "Able" in "ABA Able Act" |
| `(?#comment)` | Inline comment. The comment ends at the first closing parenthesis. | `\bA(?#matches words starting with A)\w+\b` |
| `#` [to end of line] | X-mode comment (requires the x option). The comment starts at an unescaped # and continues to the end of the line. | `(?x)\bA\w+\b#matches words starting with A` |
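
A short sketch of the inline option and the x-mode comment from the table follows. `MiscDemo` and `FindAll` are names invented for the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MiscDemo
{
    // Hypothetical helper: collects the text of every match into an array.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // (?i) switches on case-insensitive matching from that point onward,
        // so the leading A is still case-sensitive but the b is not.
        Console.WriteLine(string.Join(",", FindAll("ABA Able Act", @"\bA(?i)b\w+\b")));
        // prints: ABA,Able

        // (?x) ignores unescaped whitespace in the pattern and allows # comments.
        Console.WriteLine(string.Join(",", FindAll("ABA Able Act", @"(?x)\bA\w+\b  # words starting with A")));
        // prints: ABA,Able,Act
    }
}
```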

Regex Class

The Regex class is used to represent a regular expression.

The following table lists some commonly used methods in the Regex class:

| Number | Method & Description |
| --- | --- |
| 1 | `public bool IsMatch(string input)`<br>Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string. |
| 2 | `public bool IsMatch(string input, int startat)`<br>Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, starting at the specified starting position in the string. |
| 3 | `public static bool IsMatch(string input, string pattern)`<br>Indicates whether the specified regular expression finds a match in the specified input string. |
| 4 | `public MatchCollection Matches(string input)`<br>Searches the specified input string for all occurrences of a regular expression. |
| 5 | `public string Replace(string input, string replacement)`<br>Replaces all strings that match a specified regular expression pattern in the input string with the specified replacement string. |
| 6 | `public string[] Split(string input)`<br>Splits the input string into an array of substrings at the positions defined by the regular expression pattern specified in the Regex constructor. |

To view the complete list of methods and properties for the Regex class, please refer to Microsoft's C# documentation.
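
Before the longer examples, here is a minimal sketch of the `IsMatch` and `Split` overloads listed in the table. The `RegexClassDemo` class, the `SplitOnCommas` helper, and the sample strings are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexClassDemo
{
    // Hypothetical helper: splits on commas, ignoring whitespace around them.
    public static string[] SplitOnCommas(string input) =>
        new Regex(@"\s*,\s*").Split(input);

    public static void Main()
    {
        Regex digits = new Regex(@"\d+");

        // Instance IsMatch: the pattern comes from the constructor.
        Console.WriteLine(digits.IsMatch("abc 123"));    // True
        // With startat = 3 the search begins after "123", so nothing is found.
        Console.WriteLine(digits.IsMatch("123 abc", 3)); // False
        // Static IsMatch: the pattern is passed along with the input.
        Console.WriteLine(Regex.IsMatch("abc", @"^\w+$")); // True

        Console.WriteLine(string.Join("|", SplitOnCommas("a , b,c"))); // a|b|c
    }
}
```

The static overloads are convenient for one-off checks, while a `Regex` instance is worth constructing when the same pattern is applied to many inputs.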

Example 1

The following example matches words that start with 'S':

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }
        static void Main(string[] args)
        {
            string str = "A Thousand Splendid Suns";

            Console.WriteLine("Matching words that start with 'S': ");
            showMatch(str, @"\bS\S*");
            Console.ReadKey();
        }
    }
}

When the above code is compiled and executed, it produces the following result:

Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns

Example 2

The following example matches words that start with 'm' and end with 'e':

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }
        static void Main(string[] args)
        {
            string str = "make maze and manage to measure it";

            Console.WriteLine("Matching words that start with 'm' and end with 'e':");
            showMatch(str, @"\bm\S*e\b");
            Console.ReadKey();
        }
    }
}

When the above code is compiled and executed, it produces the following result:

Matching words that start with 'm' and end with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure

Example 3

The following example replaces extra spaces:

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "The    quick   brown   fox";
            string pattern = "\\s+";
            string replacement = " ";
            Regex rgx = new Regex(pattern);
            string result = rgx.Replace(input, replacement);

            Console.WriteLine("Original String: {0}", input);
            Console.WriteLine("Replacement String: {0}", result);
            Console.ReadKey();
        }
    }
}

When the above code is compiled and executed, it produces the following result:

Original String: The    quick   brown   fox
Replacement String: The quick brown fox
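
The same pattern also collapses trailing whitespace into a single space rather than removing it:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "Hello   World   ";
        string pattern = "\\s+";
        string replacement = " ";
        Regex rgx = new Regex(pattern);
        string result = rgx.Replace(input, replacement);

        Console.WriteLine("Original String: {0}", input);
        Console.WriteLine("Replacement String: {0}", result);
        Console.ReadKey();
    }
}

When the above code is compiled and executed, it produces the following result:

Original String: Hello   World   
Replacement String: Hello World 

The result ends with one space, because the trailing run of whitespace is also replaced by a single space. Call Trim() on the result if that space is not wanted.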