Container is running beyond memory limits

Question

In Hadoop v1, I have assigned 7 mapper and reducer slots, each with a size of 1 GB, and my mappers and reducers run fine. My machine has 8 GB of memory and 8 processors. Now with YARN, when I run the same application on the same machine, I get a container error. By default, I have these settings:

    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>8192</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>8192</value>
    </property>

It gave me this error:

    Container [pid=28920,containerID=container_1389136889967_0001_01_000121] is running beyond virtual memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

I then tried to set the memory limit in mapred-site.xml:

    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value>
    </property>

But I still get an error:

    Container [pid=26783,containerID=container_1389136889967_0009_01_000002] is running beyond physical memory limits. Current usage: 4.2 GB of 4 GB physical memory used; 5.2 GB of 8.4 GB virtual memory used. Killing container.

I'm confused about why the map task needs this much memory. In my understanding, 1 GB of memory is enough for my map/reduce task. Why does the task use more memory as I assign more memory to the container? Is it because each task gets more splits? I feel it's more efficient to decrease the size of the container a little and create more containers, so that more tasks run in parallel. The problem is: how can I make sure each container won't be assigned more splits than it can handle?
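Both virtual memory limits in these messages can be derived from the physical ones: the NodeManager lets a container's processes use up to its physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1. A minimal Python sketch of that arithmetic, assuming the default ratio was in effect on this machine (vmem_limit_gb is a hypothetical helper, not a Hadoop API):

```python
# Default of yarn.nodemanager.vmem-pmem-ratio; assumed unchanged on this machine.
DEFAULT_VMEM_PMEM_RATIO = 2.1


def vmem_limit_gb(pmem_limit_mb: int, ratio: float = DEFAULT_VMEM_PMEM_RATIO) -> float:
    """Virtual memory ceiling (in GB) derived from a container's physical size."""
    return round(pmem_limit_mb / 1024 * ratio, 1)


# First error: a 1024 MB container gets a 2.1 GB virtual limit (2.2 GB was used).
print(vmem_limit_gb(1024))
# Second error: a 4096 MB container (mapreduce.map.memory.mb) gets an 8.4 GB virtual limit.
print(vmem_limit_gb(4096))
```

In the second message the 5.2 GB of virtual memory is within the 8.4 GB ceiling, so that kill is for physical memory (4.2 GB used against a 4 GB container).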

User · Answer

I had a very similar issue using Hive on EMR. None of the existing solutions worked for me: none of the MapReduce configurations helped, and neither did setting yarn.nodemanager.vmem-check-enabled to false.

However, what ended up working was setting tez.am.resource.memory.mb, for example:

hive -hiveconf tez.am.resource.memory.mb=4096

Another setting to consider tweaking is yarn.app.mapreduce.am.resource.mb.
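If you launch Hive from a script and want to pass both overrides, one option is to build the argument list programmatically. A minimal Python sketch: hive_command is a hypothetical helper, and the 4096 MB value for yarn.app.mapreduce.am.resource.mb is a placeholder chosen for illustration, not a value the answer gives.

```python
import shlex


def hive_command(overrides: dict[str, str]) -> list[str]:
    """Build a `hive` CLI invocation with one -hiveconf flag per setting."""
    cmd = ["hive"]
    for key, value in overrides.items():
        cmd += ["-hiveconf", f"{key}={value}"]
    return cmd


cmd = hive_command({
    "tez.am.resource.memory.mb": "4096",
    "yarn.app.mapreduce.am.resource.mb": "4096",  # placeholder value for illustration
})
print(shlex.join(cmd))
```

On a machine with the hive CLI installed you could pass cmd to subprocess.run. Which of the two settings matters depends on whether Hive runs on Tez or on MapReduce (hive.execution.engine).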

User · Answer

We also faced this issue recently. If the issue is related to mapper memory, here are a couple of things I would suggest checking:

1. Check whether the combiner is enabled. If it is, the reduce logic has to run on all the records (the output of the mapper), and this happens in memory. Based on your application, check whether enabling the combiner helps. The trade-off is between network transfer bytes and the time/memory/CPU taken by the reduce logic on X records.
   - If you feel the combiner is not of much value, just disable it.
   - If you need the combiner and X is a huge number (say, millions of records), consider changing your split logic. For the default input formats, use a smaller block size (normally 1 block = 1 split) so that fewer records are mapped to a single mapper.
2. Check the number of records processed in a single mapper. Remember that all these records need to be sorted in memory (the output of the mapper is sorted). Consider setting mapreduce.task.io.sort.mb (default is 100 MB) to a higher value if needed, in mapred-site.xml.

If none of the above helps, try running the mapper logic as a standalone application and profile it with a profiler (like JProfiler) to see where the memory is used. This can give you very good insights.
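To make the combiner trade-off concrete, here is a toy word-count sketch in Python. Plain functions stand in for Hadoop's Mapper and combiner classes (this is not the Hadoop API): the combiner runs the reduce logic over one mapper's output, which shrinks what is shipped over the network but means aggregating those records on the map side first.

```python
from collections import Counter


def map_words(line: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Combiner: apply the reduce logic (summing) locally to one mapper's output."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


mapper_output = map_words("a b a c a b")
combined = combine(mapper_output)
print(len(mapper_output), "records before combining")  # 6 records before combining
print(len(combined), "records after combining")        # 3 records after combining
print(combined)                                         # [('a', 3), ('b', 2), ('c', 1)]
```

The fewer repeated keys a mapper emits, the less the combiner saves, which is when disabling it is worth considering.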

User · Answer

While working with Spark on EMR, I was having the same problem, and setting maximizeResourceAllocation to true did the trick; hope it helps someone. You have to set it when you create the cluster. From the EMR docs:

    aws emr create-cluster --release-label emr-5.4.0 --applications Name=Spark \
    --instance-type m3.xlarge --instance-count 2 --service-role EMR_DefaultRole --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole --configurations https://s3.amazonaws.com/mybucket/myfolder/myConfig.json

Where myConfig.json should say:

    [
      {
        "Classification": "spark",
        "Properties": {
          "maximizeResourceAllocation": "true"
        }
      }
    ]
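If you generate cluster configs from a script, the same file can be written with Python's json module. A minimal sketch; the local file name matches the example above, and uploading it to the placeholder bucket/folder is left out:

```python
import json

# EMR configuration classification that enables maximizeResourceAllocation for Spark.
# Note that the value is the string "true", as in the JSON above.
config = [
    {
        "Classification": "spark",
        "Properties": {"maximizeResourceAllocation": "true"},
    }
]

with open("myConfig.json", "w") as f:
    json.dump(config, f, indent=2)

print(json.dumps(config))
```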

User · Answer

I can't comment on the accepted answer due to low reputation. However, I would like to add that this behavior is by design. The NodeManager is killing your container. It sounds like you are trying to use Hadoop streaming, which runs as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task, and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the NodeManager to kill the task; otherwise your task is stealing memory belonging to other containers, which you don't want.
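A simplified Python sketch of that accounting, using a made-up process table (the PIDs and memory figures are hypothetical). On Linux the real NodeManager reads this information from /proc, but the idea is the same: the task JVM, the streaming child, and anything they spawn all count against one limit.

```python
# Hypothetical process table: pid -> (parent pid, resident memory in MB).
PROCESSES = {
    100: (1, 900),     # task JVM launched in the container
    101: (100, 2500),  # e.g. a streaming script spawned by the task
    102: (101, 800),   # a helper process spawned by that script
    200: (1, 3000),    # unrelated process outside the container
}


def tree_rss_mb(root: int, procs: dict[int, tuple[int, int]]) -> int:
    """Sum the resident memory of `root` and all of its descendants."""
    total = procs[root][1]
    for pid, (ppid, _) in procs.items():
        if ppid == root:
            total += tree_rss_mb(pid, procs)
    return total


def should_kill(root: int, limit_mb: int, procs: dict[int, tuple[int, int]]) -> bool:
    """True if the whole process tree exceeds the container's physical limit."""
    return tree_rss_mb(root, procs) > limit_mb


print(tree_rss_mb(100, PROCESSES))        # 4200
print(should_kill(100, 4096, PROCESSES))  # True
```

The JVM alone (900 MB) fits comfortably in a 4096 MB container; it is the child processes that push the tree over the limit.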

User · Answer

Running YARN on Windows Subsystem for Linux with Ubuntu, I got the error "running beyond virtual memory limits. Killing container." I resolved it by disabling the virtual memory check in yarn-site.xml:

    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>

User · Answer

I haven't personally checked, but hadoop-yarn-container-virtual-memory-understanding-and-solving-container-is-running-beyond-virtual-memory-limits-errors sounds very reasonable: "I solved the issue by changing yarn.nodemanager.vmem-pmem-ratio to a higher value", and I would agree that "another less recommended solution is to disable the virtual memory check by setting yarn.nodemanager.vmem-check-enabled to false."

User · Answer

You should also properly configure the maximum memory allocations for MapReduce. From this HortonWorks tutorial:

Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for Operating System usage. On each node, we'll assign 40 GB RAM for YARN to use and keep 8 GB for the Operating System.

For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We'll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.

In mapred-site.xml:

    mapreduce.map.memory.mb: 4096
    mapreduce.reduce.memory.mb: 8192

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

    mapreduce.map.java.opts: -Xmx3072m
    mapreduce.reduce.java.opts: -Xmx6144m

The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.

To sum it up:

1. In YARN, you should use the mapreduce configs, not the mapred ones. EDIT: This comment is not applicable anymore now that you've edited your question.
2. What you are configuring is actually how much you want to request, not what is the max to allocate.
3. The max limits are configured with the java.opts settings listed above.

Finally, you may want to check this other SO question that describes a similar problem (and solution).
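The quoted numbers follow a simple pattern: in both cases the heap (-Xmx) is 75% of the container size, leaving headroom for non-heap JVM memory. A minimal Python sketch that reproduces them; the 0.75 fraction is inferred from the quoted values rather than stated as a rule by the tutorial, and the container count ignores any other reservations on the node:

```python
# Figures from the quoted tutorial: 40 GB per node given to YARN.
YARN_MEMORY_MB = 40 * 1024
# Inferred from the quote: 3072/4096 and 6144/8192 both equal 0.75.
HEAP_FRACTION = 0.75


def heap_opts(container_mb: int, fraction: float = HEAP_FRACTION) -> str:
    """-Xmx setting that keeps the JVM heap below the container size."""
    return f"-Xmx{int(container_mb * fraction)}m"


def containers_per_node(container_mb: int, yarn_mb: int = YARN_MEMORY_MB) -> int:
    """How many containers of this size fit in the memory given to YARN on one node."""
    return yarn_mb // container_mb


print(heap_opts(4096), containers_per_node(4096))  # -Xmx3072m 10
print(heap_opts(8192), containers_per_node(8192))  # -Xmx6144m 5
```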

User · Answer

There is a check placed at the YARN level on the ratio of virtual to physical memory usage. The issue is not only that the VM doesn't have sufficient physical memory; it is that virtual memory usage is more than expected for the given physical memory.

Note: This is happening on CentOS/RHEL 6 due to its aggressive allocation of virtual memory.

It can be resolved either by:

1. Disabling the virtual memory usage check by setting yarn.nodemanager.vmem-check-enabled to false;
2. Increasing the VM:PM ratio by setting yarn.nodemanager.vmem-pmem-ratio to some higher value.

References:

https://issues.apache.org/jira/browse/HADOOP-11364
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

Add the following properties in yarn-site.xml:

    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
      <description>Whether virtual memory limits will be enforced for containers</description>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>4</value>
      <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
    </property>
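A simplified Python sketch of how the two settings interact. It checks virtual memory before physical memory. That order is consistent with the first error in the question, where physical usage (1.2 GB of 1 GB) was also over the limit yet the message reports virtual memory. container_violation is a hypothetical helper, and it ignores finer details of the real monitor:

```python
from typing import Optional


def container_violation(pmem_mb: float, vmem_mb: float, limit_mb: int,
                        vmem_check_enabled: bool = True,
                        pmem_check_enabled: bool = True,
                        vmem_pmem_ratio: float = 2.1) -> Optional[str]:
    """Return which limit a container breaks ("virtual"/"physical"), or None."""
    if vmem_check_enabled and vmem_mb > limit_mb * vmem_pmem_ratio:
        return "virtual"
    if pmem_check_enabled and pmem_mb > limit_mb:
        return "physical"
    return None


GB = 1024  # MB per GB
# First error in the question: 1 GB container, 1.2 GB physical, 2.2 GB virtual.
print(container_violation(1.2 * GB, 2.2 * GB, 1024))                            # virtual
print(container_violation(1.2 * GB, 2.2 * GB, 1024, vmem_check_enabled=False))  # physical
print(container_violation(1.2 * GB, 2.2 * GB, 1024, vmem_pmem_ratio=4))         # physical
```

In this sketch, with the first error's figures, either fix clears the virtual check but the container would still be over its 1 GB physical limit.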
