[hadoop] Container is running beyond memory limits

In Hadoop v1, I assigned each of the 7 mapper and reducer slots a size of 1 GB, and my mappers and reducers ran fine. The machine has 8 GB of memory and 8 processors. Now with YARN, when I run the same application on the same machine, I get a container error. By default, I have these settings:

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>

It gave me this error:

Container [pid=28920,containerID=container_1389136889967_0001_01_000121] is running beyond virtual memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

I then tried to set the memory limit in mapred-site.xml:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>

But I still get an error:

Container [pid=26783,containerID=container_1389136889967_0009_01_000002] is running beyond physical memory limits. Current usage: 4.2 GB of 4 GB physical memory used; 5.2 GB of 8.4 GB virtual memory used. Killing container.

I'm confused about why the map task needs this much memory. In my understanding, 1 GB of memory should be enough for my map/reduce tasks. Why does the task use more memory as I assign more memory to the container? Is it because each task gets more splits? I feel it would be more efficient to decrease the container size a little and create more containers, so that more tasks run in parallel. The problem is: how can I make sure each container won't be assigned more splits than it can handle?

This question is related to hadoop mapreduce yarn mrv2

The answers are:


While working with Spark on EMR I was having the same problem, and setting maximizeResourceAllocation=true did the trick; I hope it helps someone. You have to set it when you create the cluster. From the EMR docs:

aws emr create-cluster --release-label emr-5.4.0 --applications Name=Spark \
--instance-type m3.xlarge --instance-count 2 --service-role EMR_DefaultRole \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \
--configurations https://s3.amazonaws.com/mybucket/myfolder/myConfig.json

Where myConfig.json should say:

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
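
If the JSON lives locally rather than in S3, the same configuration can be passed with a file:// reference (a variant based on the AWS CLI's general file:// support, not shown in the original answer):

aws emr create-cluster --release-label emr-5.4.0 --applications Name=Spark \
--instance-type m3.xlarge --instance-count 2 --service-role EMR_DefaultRole \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \
--configurations file://./myConfig.json

Per the EMR docs, maximizeResourceAllocation makes EMR compute the Spark executor memory and cores from the instance type, so the container sizes no longer need hand tuning.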

YARN enforces a check on the ratio of virtual to physical memory usage for each container. The issue is not just that the VM lacks physical memory; it is that virtual memory usage is higher than expected for the given physical memory. In the first error above, the 2.1 GB virtual limit is the 1 GB physical allocation multiplied by the default yarn.nodemanager.vmem-pmem-ratio of 2.1.

Note: this tends to happen on CentOS/RHEL 6 due to its aggressive allocation of virtual memory.

It can be resolved either by:

  1. Disabling the virtual memory usage check, by setting yarn.nodemanager.vmem-check-enabled to false; or

  2. Increasing the VM:PM ratio, by setting yarn.nodemanager.vmem-pmem-ratio to some higher value.

References:

https://issues.apache.org/jira/browse/HADOOP-11364

http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

Add the following properties in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Whether virtual memory limits will be enforced for containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
    <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
  </property>

We also faced this issue recently. If the issue is related to mapper memory, here are a couple of things I suggest checking:

  • Check whether a combiner is enabled. If so, the reduce logic has to run on all the records output by the mapper, and this happens in memory. Based on your application, you need to check whether enabling the combiner helps or not. The trade-off is between the network transfer bytes and the time/memory/CPU taken by the reduce logic on 'X' records.
    • If you feel the combiner is not of much value, just disable it.
    • If you need the combiner and 'X' is a huge number (say, millions of records), consider changing your split logic (for the default input formats, use a smaller block size; normally 1 block = 1 split) so that fewer records map to a single mapper. A configuration sketch follows this list.
  • Check the number of records processed in a single mapper. Remember that all these records need to be sorted in memory (the output of the mapper is sorted). Consider setting mapreduce.task.io.sort.mb (the Apache default is 100 MB) to a higher value in mapred-site.xml if needed.
  • If none of the above helps, run the mapper logic as a standalone application and profile it with a profiler (like JProfiler) to see where the memory is being used. This can give you very good insights.
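
As a minimal configuration sketch of both knobs (assuming the default FileInputFormat; the values are illustrative, not prescriptive), the split size cap and the sort buffer could be set in mapred-site.xml:

  <property>
    <!-- Cap each input split at 64 MB so fewer records land in a single mapper (illustrative value) -->
    <name>mapreduce.input.fileinputformat.split.maxsize</name>
    <value>67108864</value>
  </property>
  <property>
    <!-- Larger in-memory sort buffer for map output (illustrative value) -->
    <name>mapreduce.task.io.sort.mb</name>
    <value>400</value>
  </property>

Note that the sort buffer is allocated inside the task JVM heap, so raising it without also raising the task heap can itself cause memory pressure.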

I haven't personally checked it, but hadoop-yarn-container-virtual-memory-understanding-and-solving-container-is-running-beyond-virtual-memory-limits-errors sounds very reasonable.

I solved the issue by changing yarn.nodemanager.vmem-pmem-ratio to a higher value, and I would agree that:

Another less recommended solution is to disable the virtual memory check by setting yarn.nodemanager.vmem-check-enabled to false.


I can't comment on the accepted answer due to low reputation. However, I would like to add that this behavior is by design. The NodeManager is killing your container. It sounds like you are trying to use Hadoop Streaming, which runs as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task, and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the NodeManager to kill the task; otherwise your task would be stealing memory belonging to other containers, which you don't want.
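
One common way to keep a task's process tree under the container ceiling (a general pattern, not part of this answer) is to set the JVM heap noticeably below the container size, leaving headroom for JVM overhead and child processes, e.g. in mapred-site.xml:

  <property>
    <!-- Container limit enforced by the NodeManager -->
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <!-- JVM heap at roughly 80% of the container, leaving ~800 MB of headroom -->
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3277m</value>
  </property>

The same pairing applies to mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts.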


Running YARN on the Windows Subsystem for Linux with Ubuntu, I got the error "running beyond virtual memory limits. Killing container". I resolved it by disabling the virtual memory check in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>

I had a really similar issue using Hive on EMR. None of the existing solutions worked for me; i.e., none of the MapReduce configurations helped, and neither did setting yarn.nodemanager.vmem-check-enabled to false.

However, what ended up working was setting tez.am.resource.memory.mb, for example:

hive -hiveconf tez.am.resource.memory.mb=4096

Another setting to consider tweaking is yarn.app.mapreduce.am.resource.mb.
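
If you want to try both knobs mentioned here in one go, they can be passed together on the command line (a sketch; the 4096 values are illustrative, and whether yarn.app.mapreduce.am.resource.mb takes effect from the client depends on the cluster's configuration):

hive -hiveconf tez.am.resource.memory.mb=4096 \
     -hiveconf yarn.app.mapreduce.am.resource.mb=4096

Both properties size YARN application masters (Tez and MapReduce respectively), which are containers of their own and can hit the same memory checks as the tasks.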

