Scala Regular Expressions
Scala supports regular expressions through the Regex class in the scala.util.matching package. The following example demonstrates using a regular expression to find the word Scala:
Example
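import scala.util.matching.Regex

object Test {
  def main(args: Array[String]): Unit = {
    // Calling .r on a string compiles it into a Regex
    val pattern = "Scala".r
    val str = "Scala is Scalable and cool"
    // findFirstIn returns an Option[String]
    println(pattern.findFirstIn(str))
  }
}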
Running the above code, the output is:
$ scalac Test.scala
$ scala Test
Some(Scala)
In the example, the r method, which Scala makes available on strings through StringOps, is used to construct a Regex object.
Then the findFirstIn method is used to find the first match.
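Because findFirstIn returns an Option[String], the result is Some(...) when there is a match and None otherwise, which is why the output above is Some(Scala) rather than Scala. A minimal sketch of unwrapping that result with pattern matching follows; the object name FindFirstDemo and the helper describeMatch are hypothetical and exist only for illustration:

```scala
import scala.util.matching.Regex

object FindFirstDemo {
  // Hypothetical helper (not part of the original example): converts the
  // Option returned by findFirstIn into a readable message.
  def describeMatch(pattern: Regex, input: String): String =
    pattern.findFirstIn(input) match {
      case Some(found) => s"Found: $found"
      case None        => "No match"
    }

  def main(args: Array[String]): Unit = {
    val pattern = "Scala".r
    println(describeMatch(pattern, "Scala is Scalable and cool")) // Found: Scala
    println(describeMatch(pattern, "Java is cool"))               // No match
  }
}
```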
If you need to see all matches, you can use the findAllIn method.
You can use the mkString method to join the matched strings with a separator, and the pipe (|) to specify alternative patterns:
Example
import scala.util.matching.Regex

object Test {
  def main(args: Array[String]): Unit = {
    // The first letter can be uppercase S or lowercase s
    val pattern = new Regex("(S|s)cala")
    val str = "Scala is scalable and cool"
    // Join all matches with a comma
    println(pattern.findAllIn(str).mkString(","))
  }
}
Running the above code, the output is:
$ scalac Test.scala
$ scala Test
Scala,scala
To replace matched text with other text, use the replaceFirstIn method to replace only the first match, or the replaceAllIn method to replace every match. The following example uses replaceFirstIn:
Example
object Test {
  def main(args: Array[String]): Unit = {
    val pattern = "(S|s)cala".r
    val str = "Scala is scalable and cool"
    // Only the first match ("Scala") is replaced
    println(pattern.replaceFirstIn(str, "Java"))
  }
}
Running the above code, the output is:
$ scalac Test.scala
$ scala Test
Java is scalable and cool
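For comparison, here is a minimal sketch of replaceAllIn on the same input. The object name ReplaceAllDemo and the helper replaceEverywhere are hypothetical names chosen for this illustration:

```scala
object ReplaceAllDemo {
  private val pattern = "(S|s)cala".r

  // Hypothetical helper: replaces every match of the pattern with "Java".
  def replaceEverywhere(input: String): String =
    pattern.replaceAllIn(input, "Java")

  def main(args: Array[String]): Unit = {
    // The "scala" at the start of "scalable" is also a match, so it is replaced too.
    println(replaceEverywhere("Scala is scalable and cool")) // Java is Javable and cool
  }
}
```

Note that the characters $ and \ have special meaning in the replacement string; Regex.quoteReplacement can be used when the replacement should be taken literally.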
Regular Expressions
Scala's Regex class is built on java.util.regex, so it uses Java's regular expression syntax, which in turn largely follows the rules of the Perl language.
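As a rough illustration of that relationship, the sketch below reads the underlying java.util.regex.Pattern from a Scala Regex and uses the Regex as an extractor. The object name JavaSyntaxDemo, the date pattern, and the sample input are assumptions made for this example only:

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object JavaSyntaxDemo {
  // Triple-quoted strings avoid having to double every backslash.
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r

  // The compiled Java pattern behind the Scala Regex.
  val underlying: Pattern = date.pattern

  def main(args: Array[String]): Unit = {
    // Prints the pattern source as Java sees it
    println(underlying.pattern())

    // Used as an extractor, the Regex must match the whole input;
    // each capture group is bound to one variable.
    "2024-01-15" match {
      case date(year, month, day) => println(s"$year / $month / $day")
      case _                      => println("not a date")
    }
  }
}
```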
The following table provides some common regular expression rules:
Expression | Matching Rule
---|---
^ | Matches the position at the beginning of the input string.
$ | Matches the position at the end of the input string.
. | Matches any single character except line terminators such as "\n" and "\r".
[...] | Character set. Matches any one of the characters contained. For example, "[abc]" matches "a" in "plain".
[^...] | Negated character set. Matches any character not contained. For example, "[^abc]" matches "p", "l", "i", "n" in "plain".
\A | Matches the position at the beginning of the input string (not affected by the multiline option).
\z | Matches the end of the input string (similar to $, but not affected by the multiline option).
\Z | Matches the end of the input string, or the position just before a final line terminator (not affected by the multiline option).
re* | Zero or more repetitions
re+ | One or more repetitions
re? | Zero or one repetition
re{n} | Repeats exactly n times
re{n,} | Repeats n or more times
re{n,m} | Repeats from n to m times
a \| b | Matches a or b
(re) | Matches re, and captures the text into an automatically numbered group
(?:re) | Matches re, but does not capture the matched text or assign a group number to this group
(?>re) | Independent (atomic), non-capturing group: once it has matched, the engine does not backtrack into it
\w | Matches a letter, digit, or underscore; equivalent to [a-zA-Z_0-9] by default. Other letters, such as Chinese characters, match only when the UNICODE_CHARACTER_CLASS flag, (?U), is enabled.
\W | Matches any character that is not matched by \w
\s | Matches any whitespace character, equivalent to [ \t\n\x0B\f\r]
\S | Matches any character that is not a whitespace character
\d | Matches a digit, equivalent to [0-9] by default
\D | Matches any character that is not a digit
\G | Matches the end of the previous match
\n | Newline character
\b | Matches a word boundary. In Perl, \b inside a character class means backspace; Java's engine does not support that form.
\B | Matches a position that is not a word boundary
\t | Tab character
\Q | Quote start: all characters up to \E are matched literally