[regex] How to pattern match using regular expression in Scala?

I would like to be able to find a match between the first letter of a word, and one of the letters in a group such as "ABC". In pseudocode, this might look something like:

case Process(word) =>
   word.firstLetter match {
      case([a-c][A-C]) =>
      case _ =>
   }
}

But how do I grab the first letter in Scala instead of Java? How do I express the regular expression properly? Is it possible to do this within a case class?

This question is related to regex scala

The answer is


Since version 2.10, one can use Scala's string interpolation feature:

implicit class RegexOps(sc: StringContext) {
  def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}

scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true

Even better one can bind regular expression groups:

scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123

scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25

It is also possible to set more detailed binding mechanisms:

scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler

scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20

scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive

scala> "10" match { case r"(\d\d)${d @ isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10

An impressive example on what's possible with Dynamic is shown in the blog post Introduction to Type Dynamic:

object T {

  class RegexpExtractor(params: List[String]) {
    def unapplySeq(str: String) =
      params.headOption flatMap (_.r unapplySeq str)
  }

  class StartsWithExtractor(params: List[String]) {
    def unapply(str: String) =
      params.headOption filter (str startsWith _) map (_ => str)
  }

  class MapExtractor(keys: List[String]) {
    def unapplySeq[T](map: Map[String, T]) =
      Some(keys.map(map get _))
  }

  import scala.language.dynamics

  class ExtractorParams(params: List[String]) extends Dynamic {
    val Map = new MapExtractor(params)
    val StartsWith = new StartsWithExtractor(params)
    val Regexp = new RegexpExtractor(params)

    def selectDynamic(name: String) =
      new ExtractorParams(params :+ name)
  }

  object p extends ExtractorParams(Nil)

  Map("firstName" -> "John", "lastName" -> "Doe") match {
    case p.firstName.lastName.Map(
          Some(p.Jo.StartsWith(fn)),
          Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
      println(s"Match! $fn ...$lastChar")
    case _ => println("nope")
  }
}

To expand a little on Andrew's answer: The fact that regular expressions define extractors can be used to decompose the substrings matched by the regex very nicely using Scala's pattern matching, e.g.:

val Process = """([a-cA-C])([^\s]+)""".r // define first, rest is non-space
for (p <- Process findAllIn "aha bah Cah dah") p match {
  case Process("b", _) => println("first: 'a', some rest")
  case Process(_, rest) => println("some first, rest: " + rest)
  // etc.
}

First we should know that regular expression can separately be used. Here is an example:

import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
  case Some(v) => println(v)
  case _ =>
} // output: Scala

Second we should notice that combining regular expression with pattern matching would be very powerful. Here is a simple example.

val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
  case date(year, month, day) => "hello"
} // output: hello

In fact, regular expression itself is already very powerful; the only thing we need to do is to make it more powerful by Scala. Here are more examples in Scala Document: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex


Note that the approach from @AndrewMyers's answer matches the entire string to the regular expression, with the effect of anchoring the regular expression at both ends of the string using ^ and $. Example:

scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*

scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo

scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match

scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match

And with no .* at the end:

scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)

scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match

String.matches is the way to do pattern matching in the regex sense.

But as a handy aside, word.firstLetter in real Scala code looks like:

word(0)

Scala treats Strings as a sequence of Char's, so if for some reason you wanted to explicitly get the first character of the String and match it, you could use something like this:

"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true

I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.

EDIT: To be clear, the way I would do this is, as others have said:

"Cat".matches("^[a-cA-C].*")
res14: Boolean = true

Just wanted to show an example as close as possible to your initial pseudocode. Cheers!


As delnan pointed out, the match keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:

word.matches("[a-cA-C].*")

You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (. means "any character" and * means "zero or more times", so ".*" is any string).