Repartition: Performs a full shuffle of the data into a NEW number of partitions, which can be higher or lower than the current number.
E.g. the initial DataFrame has 200 partitions.
df.repartition(500)
: Data will be shuffled from the 200 existing partitions into 500 new partitions.
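Coalesce: Reduces the number of partitions by merging existing partitions, avoiding a full shuffle. It cannot increase the partition count; asking for more partitions than currently exist leaves the count unchanged.
df.coalesce(5)
: The 200 partitions will be merged down to 5. Whole existing partitions are combined, so the data from the other 195 partitions ends up in 5 partitions without a full shuffle.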
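
The difference can be sketched with a toy model in plain Python. This is only an illustration, not how Spark implements either operation: partitions are modelled as lists, and make_partitions, toy_repartition and toy_coalesce are hypothetical helper names introduced here. The round-robin reassignment and the contiguous grouping are assumptions chosen to keep the sketch short.

```python
from itertools import chain


def make_partitions(rows, n):
    """Split rows into n contiguous chunks (stand-in for an already partitioned DataFrame)."""
    size, extra = divmod(len(rows), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts


def toy_repartition(parts, n):
    """Full shuffle: every row is reassigned individually (round-robin in this sketch)."""
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(chain.from_iterable(parts)):
        new_parts[i % n].append(row)
    return new_parts


def toy_coalesce(parts, n):
    """No full shuffle: whole input partitions are merged into at most n groups."""
    n = min(n, len(parts))  # the partition count can only go down, never up
    new_parts = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Contiguous input partitions land in the same output partition.
        new_parts[i * n // len(parts)].extend(part)
    return new_parts


parts = make_partitions(list(range(1000)), 200)  # 200 partitions of 5 rows each
up = toy_repartition(parts, 500)                 # rows are spread over 500 partitions
down = toy_coalesce(parts, 5)                    # 40 whole partitions merged into each of 5

print(len(parts), len(up), len(down))  # 200 500 5
```

In the toy model, toy_repartition touches every row individually, while toy_coalesce only concatenates whole partitions. That is why coalesce is usually the cheaper choice when the goal is simply to reduce the partition count.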