Map :
- It processes one row at a time , very similar to map() method of MapReduce.
- You return from the transformation after every row.
MapPartitions
- It processes the complete partition in one go.
- You can return from the function only once after processing the whole partition.
- All intermediate results needs to be held in memory till you process the whole partition.
- Provides you like setup() map() and cleanup() function of MapReduce
Map Vs mapPartitions
http://bytepadding.com/big-data/spark/spark-map-vs-mappartitions/
Spark Map
http://bytepadding.com/big-data/spark/spark-map/
Spark mapPartitions
http://bytepadding.com/big-data/spark/spark-mappartitions/