You can also write like below (without pyspark.sql.functions
):
df.filter('d<5 and (col1 <> col3 or (col1 = col3 and col2 <> col4))').show()
Result:
+----+----+----+----+---+
|col1|col2|col3|col4| d|
+----+----+----+----+---+
| A| xx| D| vv| 4|
| A| x| A| xx| 3|
| E| xxx| B| vv| 3|
| F|xxxx| F| vvv| 4|
| G| xxx| G| xx| 4|
+----+----+----+----+---+