Broadly speaking, Spark executor JVM memory can be divided into two parts: Spark memory and user memory. The split is controlled by the property spark.memory.fraction, whose value is between 0 and 1.
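As a back-of-the-envelope sketch of how the split works out under Spark's unified memory model (the heap size here is an assumption; 0.6 is the documented default fraction and 300 MiB is the fixed reservation Spark keeps for itself):

```scala
// Hypothetical numbers: an 8 GiB executor heap with the default
// spark.memory.fraction of 0.6.
val heapMiB        = 8 * 1024   // --executor-memory 8g (assumed)
val reservedMiB    = 300        // Spark's fixed reserved memory
val memoryFraction = 0.6        // spark.memory.fraction default

val usableMiB = heapMiB - reservedMiB
val sparkMiB  = (usableMiB * memoryFraction).toInt // execution + storage
val userMiB   = usableMiB - sparkMiB               // your objects, UDF state, etc.

println(s"Spark memory: $sparkMiB MiB, user memory: $userMiB MiB")
```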
When working with images or doing other memory-intensive processing in Spark applications, consider decreasing spark.memory.fraction (a sketch follows). This leaves more memory available for your application's own work. Spark can spill to disk, so it will still function with a smaller memory share.
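A minimal sketch of lowering the fraction when building the session; the 0.4 value and the app name are assumptions to tune for your workload:

```scala
import org.apache.spark.sql.SparkSession

// Lower spark.memory.fraction from the default 0.6 to an assumed 0.4,
// leaving more of the heap as user memory for image buffers and other
// application objects. Execution data spills to disk if its smaller
// share fills up.
val spark = SparkSession.builder()
  .appName("image-pipeline")              // assumed app name
  .config("spark.memory.fraction", "0.4") // assumed value; tune per workload
  .getOrCreate()
```

The same setting can also be passed at launch time with `--conf spark.memory.fraction=0.4`.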
The second part of the problem is the division of work. If possible, partition your data into smaller chunks; smaller partitions generally need less memory per task. If that is not possible, you can sacrifice compute for memory. A single executor typically runs multiple cores, and the executor's total memory must be enough to cover the memory requirements of all concurrent tasks. If increasing executor memory is not an option, you can decrease the cores per executor so that each task gets more memory to work with. Test with one-core executors given the largest memory you can afford, then keep increasing the core count until you find the best balance (see the sketch below).
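A sketch of that experiment, with assumed sizes, an assumed partition count, and a hypothetical input path; start at one core per executor and step the count up between runs:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical starting point for the "fat executor" experiment:
// one core per executor with as much memory as the cluster allows,
// then raise spark.executor.cores between runs until throughput
// stops improving.
val spark = SparkSession.builder()
  .appName("core-count-experiment")       // assumed app name
  .config("spark.executor.memory", "16g") // assumed; largest you can give
  .config("spark.executor.cores", "1")    // start at 1, then try 2, 4, ...
  .getOrCreate()

// Split the data into more, smaller partitions so each concurrent
// task holds less in memory at once. Path and count are assumptions.
val df = spark.read.parquet("/data/images")
val repartitioned = df.repartition(400)
```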