Difference in disk usage reported by Cloudera Manager Reports vs HDFS CLI du tool

asked Aug 19, 2017 in Hadoop by admin
Summary
This article may help when the disk usage shown in Cloudera Manager (CM) Reports differs substantially from the usage reported by the HDFS CLI tools.
When Cloudera Manager Reports Manager is used to check the space usage of the /data directory, the "Current Disk Usage By Directory" report shows:
data 9.3 TiB 28.0 TiB

Converting 28 TiB to bytes:
28 TiB = 1099511627776 × 28 = 30786325577728 bytes
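The conversion uses binary units, where 1 TiB = 2^40 bytes. A minimal Python sketch of the same arithmetic follows; the helper name tib_to_bytes is hypothetical. Reports Manager appears to display values rounded to one decimal place, so a byte count derived from "28.0 TiB" is only approximate.

```python
# 1 TiB is a binary unit: 2**40 bytes
TIB = 2 ** 40

def tib_to_bytes(tib: float) -> int:
    """Convert a TiB value (as displayed by Reports Manager) to whole bytes."""
    return round(tib * TIB)

print(TIB)               # 1099511627776
print(tib_to_bytes(28))  # 30786325577728
```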

However, running the Hadoop CLI du tool on the same directory shows the following:
hdfs dfs -du / | grep data
14795860820650 62191790304036 /data

The du value (62191790304036 bytes) is more than twice the value shown in Reports Manager (30786325577728 bytes).
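In the Hadoop 2.x shell, each line of "hdfs dfs -du" output has the form size, disk space consumed with all replicas, path; the second column is the figure compared with the 28.0 TiB value above. A minimal sketch that parses the line shown above and computes the ratio; parse_du_line is a hypothetical helper, not part of any Hadoop API.

```python
from typing import Tuple

TIB = 2 ** 40  # binary unit used by Reports Manager

def parse_du_line(line: str) -> Tuple[int, int, str]:
    """Split one `hdfs dfs -du` line into (size, disk space consumed, path)."""
    size, consumed, path = line.split(maxsplit=2)
    return int(size), int(consumed), path

size, consumed, path = parse_du_line("14795860820650 62191790304036 /data")
reports_bytes = 28 * TIB  # "28.0 TiB" from Reports Manager

ratio = consumed / reports_bytes
print(f"{path}: du value is {ratio:.2f}x the Reports Manager value")
# /data: du value is 2.02x the Reports Manager value
```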
Applies To
  • Cloudera Manager Reports Manager
  • Snapshots
  • HDFS CLI du tool
Cause
  • HDFS CLI "du" output includes not only current files but also files that have been deleted and still exist in snapshots (which reflects the real resource consumption).
  • Reports Manager shows the current disk usage and does not include files that exist only in snapshots (which reflects the size of the data currently available).
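The two accounting rules above can be illustrated with a toy model. This is a simplification for illustration only: the file names and sizes are hypothetical, replication is ignored, and files modified (rather than deleted) after a snapshot are not modeled.

```python
# Hypothetical directory contents: name -> (size_in_bytes, state).
# "live" files exist in the current namespace; "snapshot_only" files were
# deleted but their data is still retained by a snapshot.
files = {
    "part-0000": (100, "live"),
    "part-0001": (200, "live"),
    "old-part-0000": (300, "snapshot_only"),
}

def reports_manager_usage(entries):
    """Count only current (live) data, as Reports Manager does."""
    return sum(size for size, state in entries.values() if state == "live")

def du_usage(entries):
    """Count everything still occupying space, including snapshot-only files."""
    return sum(size for size, _state in entries.values())

print(reports_manager_usage(files))  # 300
print(du_usage(files))               # 600
```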
Instructions
First, check whether snapshots are involved:
  1. Check the HDFS directory sizes using “sudo -u hdfs hdfs dfs -du /data”.
  2. Compare the directory sizes in the Reports Manager reports with the HDFS CLI output, and note the directories whose sizes differ.
  3. Check if there are snapshots in those directories using “sudo -u hdfs hdfs lsSnapshottableDir”.
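Steps 1 and 2 can be scripted. A minimal Python sketch, assuming the same sudo/hdfs setup used in the commands above; hdfs_du, consumed_by_dir and mismatched_dirs are hypothetical helper names, the Reports Manager values must be copied in by hand, and the 0.1 TiB tolerance is an assumption meant to absorb Reports Manager's one-decimal rounding. It compares the second du column (disk space consumed with all replicas), the same column compared against 28.0 TiB earlier. Step 3 is left as the manual lsSnapshottableDir check.

```python
import subprocess
from typing import Dict, List

TIB = 2 ** 40  # Reports Manager shows binary units

def hdfs_du(path: str) -> str:
    """Step 1: return the stdout of `sudo -u hdfs hdfs dfs -du <path>`."""
    return subprocess.run(
        ["sudo", "-u", "hdfs", "hdfs", "dfs", "-du", path],
        capture_output=True, text=True, check=True,
    ).stdout

def consumed_by_dir(du_output: str) -> Dict[str, int]:
    """Map each path to the second du column (disk space consumed with all replicas)."""
    usage = {}
    for line in du_output.splitlines():
        if line.strip():
            _size, consumed, path = line.split(maxsplit=2)
            usage[path] = int(consumed)
    return usage

def mismatched_dirs(du_usage: Dict[str, int], reports_tib: Dict[str, float],
                    tolerance_tib: float = 0.1) -> List[str]:
    """Step 2: directories whose du value differs from the Reports Manager value."""
    return [
        path for path, tib in reports_tib.items()
        if path in du_usage and abs(du_usage[path] / TIB - tib) > tolerance_tib
    ]

# Reports Manager values in TiB, copied by hand from the report.
reports = {"/data": 28.0}
# On a cluster: du_usage = consumed_by_dir(hdfs_du("/"))
du_usage = consumed_by_dir("14795860820650 62191790304036 /data\n")
print(mismatched_dirs(du_usage, reports))  # ['/data']
# Step 3: run "sudo -u hdfs hdfs lsSnapshottableDir" and look for these paths.
```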
Second, once you have confirmed that snapshots are involved, keep the following in mind:
  1. Use "du" if you want to measure the disk space usage.
  2. Use the Reports Manager to see how much data you have in a directory.
  3. With the feature from HDFS-8986 (shipped in CDH 5.10 and later), the “-x” option of “hdfs dfs -du” excludes snapshots from the calculation, which makes the Reports Manager reports and the HDFS CLI reports consistent.
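On a release that includes HDFS-8986, subtracting the "du -x" result from the default du result gives a rough per-directory estimate of the space held only by snapshots. A minimal sketch; parse_du, run_du and snapshot_overhead are hypothetical helper names, and the sample outputs below are made-up numbers used in place of a real cluster call.

```python
import subprocess
from typing import Dict

def parse_du(output: str) -> Dict[str, int]:
    """Map each path in `hdfs dfs -du` output to its second column
    (disk space consumed with all replicas)."""
    usage = {}
    for line in output.splitlines():
        if line.strip():
            _size, consumed, path = line.split(maxsplit=2)
            usage[path] = int(consumed)
    return usage

def run_du(path: str, exclude_snapshots: bool = False) -> Dict[str, int]:
    """Run `sudo -u hdfs hdfs dfs -du [-x] <path>`; -x requires HDFS-8986."""
    cmd = ["sudo", "-u", "hdfs", "hdfs", "dfs", "-du"]
    if exclude_snapshots:
        cmd.append("-x")
    cmd.append(path)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_du(out)

def snapshot_overhead(with_snaps: Dict[str, int],
                      without_snaps: Dict[str, int]) -> Dict[str, int]:
    """Approximate space held only by snapshots, per path."""
    return {p: v - without_snaps.get(p, 0) for p, v in with_snaps.items()}

# On a cluster: snapshot_overhead(run_du("/data"), run_du("/data", exclude_snapshots=True))
# Hypothetical sample outputs, used instead of calling run_du():
default_out = "600 1800 /data/a\n300 900 /data/b\n"
x_out = "300 900 /data/a\n300 900 /data/b\n"
print(snapshot_overhead(parse_du(default_out), parse_du(x_out)))
# {'/data/a': 900, '/data/b': 0}
```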
