YARN NodeManagers not coming up after upgrade to CM 5.11.x

asked Aug 28, 2017 in Hadoop by admin (4,410 points)
Summary
Applies To

- Clusters running Cloudera Manager (CM) 5.11.0, configured with YARN and cgroups
- Non-systemd distributions (RHEL 6, Debian 7, older Ubuntu releases)
 

Symptoms

After upgrading to CM 5.11.x, the YARN NodeManagers fail to start.

The following exception appears in the NodeManager role logs:

INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:221)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:514)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:561)
Caused by: java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup at: /var/run/cloudera-scm-agent/cgroups/cpu
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:502)
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:154)
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:137)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:215)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:219)
... 3 more
2017-04-25 17:31:28,525 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:221)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:514)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:561)
Caused by: java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup at: /var/run/cloudera-scm-agent/cgroups/cpu
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:502)
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:154)
at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:137)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:215)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:219)
... 3 more
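
To confirm that a host is hitting this particular failure, the role log can be searched for the message shown in the trace above. The sketch below is illustrative only: the function name `has_cgroup_write_error`, the `NM_LOG` variable, and the default log path are assumptions, not values from this article, so substitute the actual NodeManager log location on your hosts.

```shell
#!/bin/bash
# Return success if the given log file contains the cgroup write failure.
# $1: path to a NodeManager role log (location varies by deployment)
has_cgroup_write_error() {
    grep -q 'cannot write to cgroup at' "$1"
}

# Hypothetical log location; override with NM_LOG or edit for your host.
log="${NM_LOG:-/var/log/hadoop-yarn/nodemanager.log}"
if [ -r "$log" ] && has_cgroup_write_error "$log"; then
    echo "cgroup write failure found in $log"
fi
```

If nothing is printed, either the log is not at the assumed path or the NodeManager is failing for a different reason.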

Cause

This is a known issue introduced in CM 5.11: on RHEL 6.x, CM does not recognize the cgroup mount points.
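
One way to see which cgroup mounts currently exist below the agent's directory is to read the kernel's mount table. This is a rough diagnostic sketch, not part of the original article: the helper name `list_cgroup_mounts` is made up for illustration, and it assumes `/proc/mounts` reports the paths exactly as they appear in the error message.

```shell
#!/bin/bash
# List cgroup mount points below a given directory.
# $1: directory prefix to look under
# $2: mounts table to read; defaults to /proc/mounts
list_cgroup_mounts() {
    local prefix="$1"
    local mounts_file="${2:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
    awk -v p="$prefix/" '$3 == "cgroup" && index($2, p) == 1 { print $2 }' "$mounts_file"
}

if [ -r /proc/mounts ]; then
    list_cgroup_mounts /var/run/cloudera-scm-agent
fi
```

An empty result suggests that no cgroup controllers are mounted under the agent's directory on that host.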

Instructions

The following workaround is available:

  1. Run the following script as root on every host that runs a CM Agent:
    #!/bin/bash
    # Mount each cgroup controller under the CM Agent's run directory.
    for i in blkio cpuacct cpu memory; do
        mkdir -p "/var/run/cloudera-scm-agent/cgroup2/$i"
        mount -t cgroup -o "rw,relatime,$i" cgroup "/var/run/cloudera-scm-agent/cgroup2/$i"
    done
  2. Restart the CM Agent on each host:
    $ service cloudera-scm-agent restart


This should return the CM Agents and the NodeManagers to a working state. If it does not, the hosts may need to be rebooted.
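
Before restarting the agent or resorting to a reboot, it may help to check that all four controllers from the workaround are actually mounted. The following is a minimal sketch under the assumption that the mounts show up in `/proc/mounts` at the same paths the script used; `missing_cgroup_mounts` is a hypothetical helper name, not a Cloudera tool.

```shell
#!/bin/bash
# Print each controller from the workaround that has no cgroup mount
# at <base>/<controller>.
# $1: base directory used by the workaround script
# $2: mounts table to read; defaults to /proc/mounts
missing_cgroup_mounts() {
    local base="$1"
    local mounts_file="${2:-/proc/mounts}"
    local c
    for c in blkio cpuacct cpu memory; do
        # Field 2 is the mount point, field 3 the filesystem type.
        if ! awk -v m="$base/$c" \
            '$2 == m && $3 == "cgroup" { found = 1 } END { exit !found }' \
            "$mounts_file"; then
            echo "$c"
        fi
    done
}

if [ -r /proc/mounts ]; then
    missing_cgroup_mounts /var/run/cloudera-scm-agent/cgroup2
fi
```

No output means every controller was found; any name printed is a controller whose mount did not take effect.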
