[apache-spark] multiple conditions for filter in spark data frames

I have a data frame with four fields. One of the fields is named Status, and I am trying to use an OR condition in .filter on the DataFrame. I tried the queries below, but no luck.

df2 = df1.filter(("Status=2") || ("Status =3"))

df2 = df1.filter("Status=2" || "Status =3")

Has anyone used this before? I have seen a similar question on Stack Overflow here. They used the code below for the OR condition, but that code is for PySpark.

from pyspark.sql.functions import col 

numeric_filtered = df.where(
(col('LOW')    != 'null') | 
(col('NORMAL') != 'null') |
(col('HIGH')   != 'null'))
numeric_filtered.show()

This question is related to: apache-spark, apache-spark-sql, spark-dataframe

The answers are:


In a Java Spark Dataset it can be used as:

Dataset userfilter = user.filter(col("gender").isin("male","female"));


df2 = df1.filter("Status = 2 OR Status = 3")

Worked for me.
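If the set of values grows, the SQL-string form also accepts an IN list (a small sketch, assuming Status is the same column as in the question):

val df2 = df1.filter("Status IN (2, 3)")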


If we want a partial match, just like contains, we can chain the contains calls like this:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, lower}

def getSelectedTablesRows2(allTablesInfoDF: DataFrame, tableNames: Seq[String]): DataFrame = {
    // build one contains-filter per table name, then OR them all together
    val tableFilters = tableNames.map(_.toLowerCase()).map(name => lower(col("table_name")).contains(name))
    val finalFilter = tableFilters.fold(lit(false))((accu, newTableFilter) => accu or newTableFilter)
    allTablesInfoDF.where(finalFilter)
}
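A hypothetical usage sketch (allTablesDF and the table names below are just placeholders):

val partialMatches = getSelectedTablesRows2(allTablesDF, Seq("orders", "customer"))
partialMatches.show()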

You can try this (filtering with one object, such as a list or a set of values):

ds = ds.filter(functions.col(COL_NAME).isin(myList));

or, as @Tony Fraser suggested, you can try this (with a Seq of objects):

ds = ds.filter(functions.col(COL_NAME).isin(mySeq));

All the answers are correct, but most of them do not represent good coding style. Also, you should always consider a variable number of arguments for the future, even though they are static at a certain point in time.
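For example, in Scala a variable-length sequence of statuses can be expanded into isin with varargs (a sketch, assuming Status is the column from the question):

import org.apache.spark.sql.functions.col

val statuses = Seq(2, 3)  // may grow over time
val df2 = df1.filter(col("Status").isin(statuses: _*))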


df2 = df1.filter("Status=2")
     .filter("Status=3");

In Spark/Scala, it's pretty easy to filter with varargs.

val d = spark.read... // data contains a column named matid
val ids = Seq("BNBEL0608AH", "BNBEL00608H")
val filtered = d.filter($"matid".isin(ids: _*))
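The same pattern can be inverted for exclusion by negating the column expression (a small sketch reusing the ids Seq above):

val excluded = d.filter(!$"matid".isin(ids: _*))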

You need to use filter

package dataframe

import org.apache.spark.sql.SparkSession
/**
 * @author [email protected]
 */
//

object DataFrameExample{
  //
  case class Employee(id: Integer, name: String, address: String, salary: Double, state: String, zip: Integer)
  //
  def main(args: Array[String]) {
    val spark =
      SparkSession.builder()
        .appName("DataFrame-Basic")
        .master("local[4]")
        .getOrCreate()

    import spark.implicits._

    // create a sequence of case class objects 

    // (we defined the case class above)

    val emp = Seq( 
    Employee(1, "vaquar khan", "111 algoinquin road chicago", 120000.00, "AZ",60173),
    Employee(2, "Firdos Pasha", "1300 algoinquin road chicago", 2500000.00, "IL",50112),
    Employee(3, "Zidan khan", "112 apt abcd timesqure NY", 50000.00, "NY",55490),
    Employee(4, "Anwars khan", "washington dc", 120000.00, "VA",33245),
    Employee(5, "Deepak sharma ", "rolling edows schumburg", 990090.00, "IL",60172),
    Employee(6, "afaq khan", "saeed colony Bhopal", 1000000.00, "AZ",60173)
    )

    val employee = spark.sparkContext.parallelize(emp, 4).toDF()

     employee.printSchema()

    employee.show()


    employee.select("state", "zip").show()

    println("*** use filter() to choose rows")

    employee.filter($"state".equalTo("IL")).show()

    println("*** multi contidtion in filer || ")

    employee.filter($"state".equalTo("IL") || $"state".equalTo("AZ")).show()

    println("*** multi contidtion in filer &&  ")

    employee.filter($"state".equalTo("AZ") && $"zip".equalTo("60173")).show()

  }
}

Another way is to use the expr function with a where clause:

import org.apache.spark.sql.functions.expr

df2 = df1.where(expr("col1 = 'value1' and col2 = 'value2'"))

It works the same.
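Applied to the original question, the same approach with an OR would look like this (a sketch, assuming df1 has the Status column and expr is imported as above):

val df2 = df1.where(expr("Status = 2 or Status = 3"))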


This question has already been answered, but for future reference I would like to mention that, in the context of this question, the where and filter methods on a Dataset/DataFrame support two syntaxes. The SQL string parameters:

df2 = df1.filter(("Status = 2 or Status = 3"))

and the Column-based parameters (mentioned by @David):

df2 = df1.filter($"Status" === 2 || $"Status" === 3)

It seems the OP combined these two syntaxes. Personally, I prefer the first syntax because it's cleaner and more generic.


For future reference: we can use isInCollection to filter; here is an example. Note: it will look for an exact match.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

def getSelectedTablesRows(allTablesInfoDF: DataFrame, tableNames: Seq[String]): DataFrame = {
    allTablesInfoDF.where(col("table_name").isInCollection(tableNames))
}
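To illustrate the exact-match behavior, with tableNames containing only "user", a row whose table_name is "user_audit" would not be kept, whereas the contains-based filter in the earlier answer would keep it (a hypothetical sketch; allTablesInfoDF is a placeholder):

val exactOnly = getSelectedTablesRows(allTablesInfoDF, Seq("user"))
exactOnly.show()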
