[hadoop] What is Hive: Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

I am getting:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

while trying to make a copy of a partitioned table using the following commands in the Hive console:

CREATE TABLE copy_table_name LIKE table_name;
INSERT OVERWRITE TABLE copy_table_name PARTITION(day) SELECT * FROM table_name;

I initially got some semantic analysis errors and had to set:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

I'm not sure what the above properties actually do, though. Can anyone explain?
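(Editor's note on those two properties, with day as the partition column from the table above and col1, col2 as placeholder column names: hive.exec.dynamic.partition=true lets an INSERT derive partition values from the SELECT output at runtime, and hive.exec.dynamic.partition.mode=nonstrict drops the requirement that at least one partition column still be given a static value. A minimal sketch of the difference:

-- Strict mode: at least one partition value must be hard-coded, and the
-- partition column is left out of the SELECT list:
INSERT OVERWRITE TABLE copy_table_name PARTITION (day='2012-06-25')
SELECT col1, col2 FROM table_name WHERE day='2012-06-25';

-- Nonstrict mode: every partition value may come from the data itself;
-- with SELECT *, the partition column is the last column returned:
INSERT OVERWRITE TABLE copy_table_name PARTITION (day)
SELECT * FROM table_name;
)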

Full output from the Hive console:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201206191101_4557, Tracking URL = http://jobtracker:50030/jobdetails.jsp?jobid=job_201206191101_4557
Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=master:8021 -kill job_201206191101_4557
2012-06-25 09:53:05,826 Stage-1 map = 0%,  reduce = 0%
2012-06-25 09:53:53,044 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201206191101_4557 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

This question is related to: hadoop, mapreduce, hive

The answers are:


I removed the _SUCCESS file from the EMR output path in S3 and it worked fine.
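For anyone wanting to do the same from inside the Hive console, a minimal sketch using the dfs command; the bucket and path are placeholders for your own EMR output location, and the exact URI scheme (s3, s3n, s3a) depends on your setup. Hadoop jobs drop an empty _SUCCESS marker file into their output directory, and some SerDes/InputFormats fail when they hit it on read:

dfs -ls s3://my-bucket/emr-output/;
dfs -rm s3://my-bucket/emr-output/_SUCCESS;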


I faced the same issue. When I checked the dashboard I found the following error. The data was coming in through Flume and had been interrupted partway, which probably left a few files in an inconsistent state.

Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input within/between OBJECT entries

Running the job on a smaller set of files worked, so format consistency was the cause in my case.
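One way to hunt for the malformed files without tripping the JSON SerDe is to read the same directory back as plain text and let get_json_object() flag the bad lines, since it returns NULL on unparseable input. The table name and location here are placeholders (this assumes the JSON lines contain no ^A default field delimiters):

CREATE EXTERNAL TABLE raw_events_text (line STRING)
LOCATION '/flume/events/';

-- Lines the JSON parser cannot handle come back NULL:
SELECT line FROM raw_events_text
WHERE get_json_object(line, '$') IS NULL
LIMIT 10;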


Adding some information here, as it took me a while to find the Hadoop JobTracker web dashboard in HDInsight (Azure's Hadoop), and a colleague finally showed me where it was. There is a shortcut on the head node called "Hadoop Yarn Status" which is just a link to a local HTTP page (http://headnodehost:9014/cluster in my case). When opened, the dashboard looked like this:

[screenshot: the Hadoop YARN cluster status dashboard]

In that dashboard you can find your failed application, and clicking into it lets you look at the logs of the individual map and reduce tasks.

In my case it seemed the reducers were still running out of memory, even though I had already cranked up the memory in the configuration. For some reason, though, it was no longer surfacing the java.lang.OutOfMemoryError messages I had gotten earlier.
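For anyone in the same spot, a sketch of the knobs to try from the Hive console; the numbers are placeholders, and on older MR1 clusters the property names use the mapred.* prefix instead of mapreduce.*:

set mapreduce.reduce.memory.mb=4096;          -- container size for each reducer
set mapreduce.reduce.java.opts=-Xmx3277m;     -- JVM heap inside that container
set hive.exec.reducers.bytes.per.reducer=268435456;  -- spread the data over more reducers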


I know I am three years late to this thread, but I'm still offering my two cents for similar cases in the future.

I recently faced the same issue/error in my cluster. The job would always get to around 80%+ reduce completion and then fail with the same error, with nothing useful to go on in the execution logs either. After multiple iterations and some research, I found that among the plethora of files being loaded, some were non-compliant with the structure expected by the base table (the table being used to insert data into the partitioned table).

The point to note here is that whenever I ran a select query for a particular value in the partitioning column, or created a static partition, it worked fine, because in those cases the error records were simply skipped.

TL;DR: check the incoming data/files for structural inconsistencies, as Hive follows a schema-on-read philosophy.
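A rough way to screen for such records before the dynamic-partition insert, assuming day is the partitioning column as in the question (the date pattern is just an example):

-- Rows with a NULL or malformed partition value end up in a junk partition
-- (or fail the job, depending on configuration):
SELECT COUNT(*)
FROM table_name
WHERE day IS NULL
   OR NOT day RLIKE '^\\d{4}-\\d{2}-\\d{2}$';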


I faced the same issue because I didn't have permission to query the database I was targeting.

If you don't have permission to query the table/database then, besides the Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask error, you will see that Cloudera Manager does not even register your query.
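Where SQL standard-based authorization is enabled, an admin can grant and verify access along these lines (analyst1 is a hypothetical user name):

GRANT SELECT ON TABLE table_name TO USER analyst1;
SHOW GRANT USER analyst1 ON TABLE table_name;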


I was also facing the same error when inserting data into a Hive external table that pointed to an Elasticsearch cluster.

I replaced the older elasticsearch-hadoop-2.0.0.RC1.jar with elasticsearch-hadoop-5.6.0.jar, and everything worked fine.

My suggestion: use the connector JAR that matches your Elasticsearch version. Don't use older JARs with a newer version of Elasticsearch.

Thanks to this post: Hive- Elasticsearch Write Operation #409
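For reference, a sketch of the Hive side of that setup; the JAR path, column list, index name, and node address are all placeholders for your own environment:

ADD JAR /usr/lib/hive/lib/elasticsearch-hadoop-5.6.0.jar;

CREATE EXTERNAL TABLE es_events (id STRING, payload STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'events/doc',    -- target index/type
  'es.nodes'    = 'es-host:9200'   -- any node in the cluster
);

INSERT OVERWRITE TABLE es_events
SELECT id, payload FROM table_name;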


The top answer is right that the error code doesn't give you much info. One of the common causes our team saw for this error code was a poorly optimized query. A known culprit was an inner join with the table on the left side orders of magnitude bigger than the table on the right. Swapping those tables would usually do the trick in such cases.
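A sketch of that swap with hypothetical table names: in a plain reduce-side join, Hive streams the last table in the FROM clause and buffers the earlier ones, so the big table should come last; alternatively, the small table can be forced into a map-side hash join:

-- Before: the big table buffered on the left of the join
-- SELECT b.k, s.v FROM big_table b JOIN small_table s ON b.k = s.k;

-- After: small table first, optionally map-joined outright
SELECT /*+ MAPJOIN(s) */ b.k, s.v
FROM small_table s
JOIN big_table b ON b.k = s.k;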

