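If you want to do something to each row in a DataFrame object, use `map`. In PySpark 2.0 and later, `DataFrame` no longer has a `map` method, so call it on the underlying RDD instead: `df.rdd.map(f)`. The function `f` receives each row as a `Row` object, so you can perform further calculations on it. Conceptually, it's the equivalent of looping across the entire dataset from `0` to `len(dataset)-1`. In practice, Spark applies the function lazily and in parallel across partitions rather than in a sequential loop.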
Note that this will return a PipelinedRDD, not a DataFrame.