There is a Spark column/expression API join for such case:
Leaddetails.join(
Utm_Master,
Leaddetails("LeadSource") <=> Utm_Master("LeadSource")
&& Leaddetails("Utm_Source") <=> Utm_Master("Utm_Source")
&& Leaddetails("Utm_Medium") <=> Utm_Master("Utm_Medium")
&& Leaddetails("Utm_Campaign") <=> Utm_Master("Utm_Campaign"),
"left"
)
The <=>
operator in the example means "Equality test that is safe for null values".
The main difference with simple Equality test (===
) is that the first one is safe to use in case one of the columns may have null values.