DataFrames
are based on RDDs. RDDs are immutable structures and do not allow updating elements on-site. To change values, you will need to create a new DataFrame by transforming the original one either using the SQL-like DSL or RDD operations like map
.
A highly recommended slide deck: Introducing DataFrames in Spark for Large Scale Data Science.