Filter spark DataFrame on string contains

Question

I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page. The following code works well:

    val df = sqlContext.read.avro("src/test/resources/episodes.avro")
    df.filter("doctor > 5").write.avro("/tmp/output")

But what if I need to check whether the doctor string contains a substring? Since the expression is written inside a string, how do I do a "contains"?

User · Accepted Answer

You can use contains (this works with an arbitrary sequence):

    df.filter($"foo".contains("bar"))

like (SQL LIKE with SQL simple regular expressions, with _ matching an arbitrary character and % matching an arbitrary sequence):

    df.filter($"foo".like("bar"))

or rlike (like with Java regular expressions):

    df.filter($"foo".rlike("bar"))

depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.
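To make the difference between the three operators concrete, here is a minimal sketch. It assumes Spark 2.x or later (a local `SparkSession` rather than the question's `sqlContext`), and the object name, sample rows, and patterns are all hypothetical; only `contains`, `like`, `rlike`, and string-expression filters come from the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StringFilterSketch {
  // Runs each filter on a tiny hypothetical DataFrame and returns the
  // matching values, sorted, keyed by the technique used.
  def run(): Map[String, Seq[String]] = {
    // Assumption: Spark 2.x+ on the classpath, local mode for illustration.
    val spark = SparkSession.builder()
      .appName("string-filter-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables the $"col" syntax and toDF

    // "foo" mirrors the column name used in the answer; the rows are made up.
    val df = Seq("foobar", "barfoo", "bar", "baz").toDF("foo")

    // Collect the single string column into a sorted Seq for comparison.
    def values(d: DataFrame): Seq[String] =
      d.collect().map(_.getString(0)).toSeq.sorted

    val results = Map(
      // Plain substring test, no pattern syntax involved.
      "contains" -> values(df.filter($"foo".contains("bar"))),
      // SQL LIKE: % matches any sequence, so this is "starts with bar".
      "likePrefix" -> values(df.filter($"foo".like("bar%"))),
      // SQL LIKE: _ matches exactly one character.
      "likeOneChar" -> values(df.filter($"foo".like("ba_"))),
      // Java regular expression, anchored at both ends.
      "rlike" -> values(df.filter($"foo".rlike("^ba[rz]$"))),
      // The same kind of test written as a SQL expression string.
      "sqlExpr" -> values(df.filter("foo LIKE '%bar'"))
    )

    spark.stop()
    results
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (name, rows) => println(s"$name: ${rows.mkString(", ")}") }
}
```

Note that `like("bar")` without wildcards only matches the exact string, whereas `rlike("bar")` matches anywhere in the value, so an unanchored `rlike` behaves like `contains` for literal text.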

User · Answer

In PySpark, with Spark SQL syntax:

    where column_n like 'xyz%'

might not work. Use:

    where column_n RLIKE '^xyz'

This works perfectly fine.
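The SQL text is the same whichever language submits it, so the RLIKE form can be sketched in Scala as well. This assumes Spark 2.x or later; the view name `t`, the object name, and the sample rows are hypothetical. The LIKE query is included only for comparison on this sample data.

```scala
import org.apache.spark.sql.SparkSession

object RlikeSqlSketch {
  // Returns (rows matched by RLIKE '^xyz', rows matched by LIKE 'xyz%').
  def run(): (Seq[String], Seq[String]) = {
    // Assumption: Spark 2.x+ on the classpath, local mode for illustration.
    val spark = SparkSession.builder()
      .appName("rlike-sql-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data registered as a temporary view named "t".
    Seq("xyz1", "axyz", "xyz", "abc").toDF("column_n").createOrReplaceTempView("t")

    // Run a query and return its single string column, sorted.
    def values(query: String): Seq[String] =
      spark.sql(query).collect().map(_.getString(0)).toSeq.sorted

    // ^ anchors the Java regular expression at the start of the value.
    val viaRlike = values("SELECT column_n FROM t WHERE column_n RLIKE '^xyz'")
    // % matches any trailing sequence, giving the same prefix test.
    val viaLike = values("SELECT column_n FROM t WHERE column_n LIKE 'xyz%'")

    spark.stop()
    (viaRlike, viaLike)
  }

  def main(args: Array[String]): Unit = {
    val (viaRlike, viaLike) = run()
    println(s"RLIKE: ${viaRlike.mkString(", ")}")
    println(s"LIKE:  ${viaLike.mkString(", ")}")
  }
}
```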
