YARN logs under /tmp/logs/{user.name}/logs not cleared properly

Summary

This article explains how to fix an issue where YARN application logs under /tmp/logs/{user.name}/logs in HDFS do not get removed properly, leaving millions of objects sitting in HDFS indefinitely.

Symptoms

There are millions of files and directories under /tmp/logs/hive/logs in HDFS, all named in the format application_1468XXXXXX_XXXX. Further investigation showed that those files were generated more than a year ago and were never cleaned up by YARN.
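
The scale of the problem can be estimated with a directory count; the path below assumes the affected user is "hive", as in this case:

hdfs dfs -count /tmp/logs/hive/logs

In the output, the first two columns are the directory count and file count respectively.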

Applies To

YARN

Cause

Based on the output of the following HDFS command:

hdfs dfs -ls -R /tmp/logs/hive/logs

we noticed that those directories were owned by hive:hive, and the YARN logs showed the following error:

could not read the contents of hdfs://nameservice1/tmp/logs/hive/logs 
Permission denied: user==mapred, access=EXECUTE, inode="/tmp/logs/hive":hive:hive:drwxrwx---

This confirms that the issue was caused by incorrect permissions/ownership on the files and directories under /tmp/logs/hive/logs: user "yarn" is not able to remove files and directories that are owned exclusively by "hive:hive".
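
To double-check why the deletion fails, the group memberships of the "yarn" user as resolved by the NameNode can be listed (the output depends on the cluster's group mapping):

hdfs groups yarn

If "hive" is not in the list, user "yarn" has no access to a directory with mode drwxrwx--- owned by hive:hive.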

Instructions

Run the following command to change the group ownership of everything under /tmp/logs to "hadoop" (the owning user remains "hive"):

hadoop fs -chown -R :hadoop /tmp/logs/*

Because user "yarn" is a member of the "hadoop" group, updating the group ownership of this directory allows user "yarn" to remove those files and directories, which fixes the issue.
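
After running the chown command, the change can be verified by listing the per-user log directories:

hadoop fs -ls /tmp/logs

Each entry should now show group "hadoop" instead of "hive".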

The configuration that controls the YARN log retention is below:

Cloudera Manager Home Page > YARN > Configuration > "Log Aggregation Retention Period". The default value is 7 days, which means YARN automatically removes aggregated logs older than 7 days in the background. After applying the fix, confirm that files under the /tmp/logs directory in HDFS are removed 7 days after creation.
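
For clusters that are not managed by Cloudera Manager, the same retention period is controlled by the yarn.log-aggregation.retain-seconds property in yarn-site.xml; a minimal sketch with a 7-day (604800 seconds) retention:

<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>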

