The existing answers are correct on how to create the aggregations, but none of them addresses the column aliasing/renaming that the question also asks about.
Typically, this is how I handle this case:
import org.apache.spark.sql.functions.col

val dimensions = List("col1")
val metrics = List("col2", "col3", "col4")
val columnsOfInterest = dimensions ++ metrics

val df = spark.read.table("some_table")
  .select(columnsOfInterest.map(col): _*)
  .groupBy(dimensions.map(col): _*)
  .agg(metrics.map(m => m -> "sum").toMap)
  .toDF(columnsOfInterest: _*) // that's the interesting part
The last line renames every column of the aggregated DataFrame back to the original field names, essentially turning sum(col2) and sum(col3) into simply col2 and col3.
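Note that toDF is purely positional, so this relies on the aggregation output keeping the groupBy columns first and the metrics in their original order. If you prefer not to depend on column positions, a minimal sketch of the same idea with explicit aliases, reusing the dimensions and metrics lists from above (df2 is just a hypothetical name):

import org.apache.spark.sql.functions.{col, sum}

// Alias each aggregate back to its source column name up front
val aggExprs = metrics.map(m => sum(col(m)).as(m))

val df2 = spark.read.table("some_table")
  .groupBy(dimensions.map(col): _*)
  .agg(aggExprs.head, aggExprs.tail: _*) // agg requires a first Column plus varargs

This is slightly more verbose, but each output column is named explicitly rather than by position.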