When filtering a DataFrame with string values, I find that the pyspark.sql.functions
lower
and upper
come in handy, if your data could have column entries like "foo" and "Foo":
import pyspark.sql.functions as sql_fun
result = source_df.filter(sql_fun.lower(source_df.col_name).contains("foo"))