What's the difference between an RDD's map and mapPartitions methods?
The map method converts each element of the source RDD into exactly one element of the result RDD by applying a function. mapPartitions instead applies a function once per partition: the function receives all of that partition's elements as an iterator and returns any number of result elements (possibly none).
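To make the contrast concrete, here is a minimal plain-Python sketch that only models the semantics; it is not Spark code. It assumes an RDD can be pictured as a list of partitions, each a list of elements, and the helper names rdd_map and rdd_map_partitions are hypothetical stand-ins for the real rdd.map(f) and rdd.mapPartitions(f).

```python
from typing import Callable, Iterable, Iterator, List

# Toy model (an assumption for illustration): an "RDD" is a list of
# partitions, and each partition is a list of elements.
Partitions = List[List[int]]


def rdd_map(parts: Partitions, f: Callable[[int], int]) -> Partitions:
    """Model of map: f is called once per element, one output per input."""
    return [[f(x) for x in part] for part in parts]


def rdd_map_partitions(
    parts: Partitions, f: Callable[[Iterator[int]], Iterable[int]]
) -> Partitions:
    """Model of mapPartitions: f is called once per partition with an
    iterator over its elements and may return any number of elements."""
    return [list(f(iter(part))) for part in parts]


def partition_sum(it: Iterator[int]) -> Iterator[int]:
    # Collapses a whole partition into a single element.
    yield sum(it)


def drop_all(it: Iterator[int]) -> Iterator[int]:
    # Returns no elements at all for the partition.
    return iter(())


source = [[1, 2, 3], [4, 5]]

doubled = rdd_map(source, lambda x: x * 2)         # same element count as source
sums = rdd_map_partitions(source, partition_sum)   # one element per partition
emptied = rdd_map_partitions(source, drop_all)     # zero elements per partition
```

Note that the function given to rdd_map sees one element at a time, while the function given to rdd_map_partitions sees the whole partition and decides how many elements to emit.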
And does flatMap behave like map or like mapPartitions?
Neither. flatMap works on a single element at a time (as map does) and produces any number of elements of the result, possibly none (as mapPartitions does).
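A rough way to picture flatMap, again as a plain-Python model rather than Spark code: the function is called per element, and the collections it returns are flattened into the result. The list-of-partitions representation and the helper name rdd_flat_map are assumptions made for illustration; the real call is rdd.flatMap(f).

```python
from typing import Callable, Iterable, List

# Toy model (an assumption for illustration): an "RDD" is a list of
# partitions, and each partition is a list of elements.


def rdd_flat_map(
    parts: List[List[str]], f: Callable[[str], Iterable[str]]
) -> List[List[str]]:
    """Model of flatMap: f is called once per element and returns zero or
    more outputs, which are flattened into the resulting partition."""
    return [[y for x in part for y in f(x)] for part in parts]


lines = [["a b", ""], ["c d e"]]

# str.split yields two words, no words, and three words respectively.
words = rdd_flat_map(lines, str.split)
```

The empty string contributes nothing to the output, so the result can hold fewer or more elements than the source, while the function itself never sees more than one element at a time.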