[apache-spark] Spark - repartition() vs coalesce()

Repartition: Shuffle the data into a NEW number of partitions.

Eg. Initial data frame is partitioned in 200 partitions.

df.repartition(500): Data will be shuffled from 200 partitions to new 500 partitions.

Coalesce: Shuffle the data into existing number of partitions.

df.coalesce(5): Data will be shuffled from remaining 195 partitions to 5 existing partitions.