I have a few suggestions for the error mentioned above.
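- Check the memory assigned to each executor; an executor may have to process partitions that need more memory than it has been given.
- Check whether the job triggers more shuffles than it needs. Shuffles are expensive because they involve disk I/O, data serialization, and network I/O.
- Use broadcast joins when one side of the join is small enough to fit in executor memory, so the large side does not have to be shuffled.
- Avoid groupByKey and replace it with reduceByKey where possible; reduceByKey combines values within each partition before the shuffle, so less data is transferred.
- Avoid large Java objects in data that gets shuffled, since they are costly to serialize and send over the network.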
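
To make the groupByKey vs reduceByKey point concrete, here is a minimal sketch in plain Python. It is a toy model of the shuffle, not Spark code: the partitions list and both function names are made up for illustration, and the only thing it measures is how many records would cross the shuffle boundary in each case.

```python
from collections import defaultdict

# Hypothetical input: two partitions of (key, count) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1), ("c", 1)],
]

def shuffle_group_by_key(parts):
    """Model groupByKey: every record crosses the shuffle unchanged."""
    shuffled = [record for part in parts for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    # Aggregation happens only after all values were moved.
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)

def shuffle_reduce_by_key(parts):
    """Model reduceByKey: combine inside each partition, then shuffle."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # partition-local (map-side) combine
        shuffled.extend(local.items())
    # Merge the partial sums that arrived from each partition.
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

group_totals, group_shuffled = shuffle_group_by_key(partitions)
reduce_totals, reduce_shuffled = shuffle_reduce_by_key(partitions)

print(group_totals == reduce_totals)    # True: same result either way
print(group_shuffled, reduce_shuffled)  # 8 5: fewer records shuffled
```

Both approaches give the same totals, but the second one moves at most one record per key per partition. In a real job the gap grows with the number of repeated keys per partition, which is why swapping groupByKey followed by a sum for reduceByKey tends to reduce shuffle size and memory pressure on the executors.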