MR job fails due to shuffle error java.lang.NegativeArraySizeException

asked Aug 30, 2017 in Hadoop by admin (4,410 points)
Summary

When your reducer memory setting is large, say > 8GB, there is a chance you will run into the shuffle NegativeArraySizeException issue, due to bugs MAPREDUCE-6724 and MAPREDUCE-6805.

Symptoms

When your reducer task memory is set higher than 8GB, you may sometimes run into this shuffle memory issue. The reason: each map output fetched during the shuffle is buffered in an in-memory byte array, and a Java array can hold at most 2GB. Since mapreduce.reduce.shuffle.memory.limit.percent defaults to 0.25, a reducer memory setting larger than 8GB can push the per-fetch shuffle limit past 2GB, and a segment size in that range wraps to a negative number when narrowed to an int, producing the NegativeArraySizeException.
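
To make the arithmetic concrete, here is a minimal, self-contained sketch of the overflow. The class name and sizes are hypothetical, and this is a simplification; the real reservation logic lives in MergeManagerImpl and Fetcher, as the stack trace below shows:

public class ShuffleOverflowDemo {
    public static void main(String[] args) {
        // Hypothetical: a 10GB reducer with the default
        // mapreduce.reduce.shuffle.memory.limit.percent of 0.25.
        long reduceMemoryBytes = 10L * 1024 * 1024 * 1024;
        long maxSingleShuffleLimit = (long) (reduceMemoryBytes * 0.25); // 2.5GB

        // A ~2.2GB map output passes the in-memory check, because the
        // limit was never capped at Integer.MAX_VALUE (~2GB)...
        long mapOutputSize = 2_362_232_012L; // hypothetical segment size
        if (mapOutputSize <= maxSingleShuffleLimit) {
            // ...but narrowing the size to int to allocate the buffer
            // wraps it to a negative number:
            int bufferSize = (int) mapOutputSize; // -1932735284
            byte[] buffer = new byte[bufferSize]; // NegativeArraySizeException
        }
    }
}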

The stack trace looks like this:

2017-03-22 23:08:08,420 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#8
2017-03-22 23:08:09,072 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#8
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:305)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:295)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:514)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)

Applies To
  • CDH < 5.9.0
Cause

This is MAPREDUCE-6805, and it was fixed in MAPREDUCE-6724.

Instructions

1) This is already fixed in the CDH 5.9.x and 5.10.x releases, so the long-term solution is to upgrade to CDH 5.9 or later.

2) For those still on an earlier CDH version, the workaround is to limit the shuffle memory to under 2GB. Say you have 20GB of reduce memory; then you have to set:

mapreduce.reduce.shuffle.memory.limit.percent = 0.1

for your job. The mapreduce.reduce.shuffle.memory.limit.percent value above is calculated as:

mapreduce.reduce.shuffle.memory.limit.percent = (max shuffle size) 2GB / (reduce memory) 20GB = 0.1
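
For example, assuming the job is submitted from Java code, a minimal sketch of setting this on the job's configuration before submission (class and job names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleLimitExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 2GB max in-memory shuffle segment / 20GB reduce memory = 0.1
        conf.setFloat("mapreduce.reduce.shuffle.memory.limit.percent", 0.1f);
        Job job = Job.getInstance(conf, "shuffle-limit-example");
        // ... set mapper, reducer, input and output paths as usual,
        // then submit with job.waitForCompletion(true).
    }
}

Jobs that run through ToolRunner can instead pass -Dmapreduce.reduce.shuffle.memory.limit.percent=0.1 on the command line.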
