[hadoop] How to check an HDFS directory's size?

I know du -sh on ordinary Linux filesystems. How do I do the same thing on HDFS?

The answer is

This prints each size in whole GB:

hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024^3)) " [GB]\t" $NF }'
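
Because int() truncates, anything under 1 GB shows up as 0. A minimal sketch that keeps two decimals instead, assuming the size is the first column and the path is the last one; du_to_gib is a hypothetical helper name, and the sample lines only stand in for real hdfs dfs -du output:

```shell
# du_to_gib: read `hdfs dfs -du`-style lines on stdin and print "<size> GiB<TAB><path>".
# Assumes column 1 is the size in bytes and the last column is the path
# (paths containing spaces are not handled).
du_to_gib() {
    awk '/^[0-9]+/ { printf "%.2f GiB\t%s\n", $1 / (1024 ^ 3), $NF }'
}

# Sample input standing in for real output; on a cluster you would run:
#   hdfs dfs -du PATHTODIRECTORY | du_to_gib
printf '%s\n' '1610612736 /data/logs' '536870912 /data/tmp' | du_to_gib
```

Reading from stdin keeps the helper independent of whether the listing has two or three columns.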

hdfs dfs -count <dir>

info from man page:

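-count [-q] [-h] [-v] [-t [<storage type>]] [-u] <path> ... :
  Count the number of directories, files and bytes under the paths
  that match the specified file pattern.  The output columns are:
  DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  or, with the -q option:
  QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA
        DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME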
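
To pull the byte total out of -count, the third column can be picked with awk. This is only a sketch, assuming the default layout without -q (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME); count_summary is a hypothetical helper and the sample line is made up:

```shell
# count_summary: summarize `hdfs dfs -count <dir>` lines read from stdin.
# Assumed column order (without -q): DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
# Fields are printed as strings so large byte counts are passed through unchanged.
count_summary() {
    awk 'NF >= 4 { printf "%s: %s dirs, %s files, %s bytes\n", $4, $1, $2, $3 }'
}

# Made-up sample line; on a cluster: hdfs dfs -count <dir> | count_summary
echo '3 4 4096 some_dir' | count_summary
```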

When you want the total for a particular group of files within a directory, the -s option does not help (in Hadoop 2.7.1). For example:

Directory structure: some_dir contains four files, among them count1.txt and count2.txt.

Assume each file is 1 KB in size. You can summarize the entire directory with:

hdfs dfs -du -s some_dir
4096 some_dir

However, if I want the sum of all files whose names contain "count", the command falls short: it prints one line per matching file instead of a single total.

hdfs dfs -du -s some_dir/count*
1024 some_dir/count1.txt
1024 some_dir/count2.txt

To get around this I usually pass the output through awk.

hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
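
The same idea wrapped in a reusable function, as a sketch: sum_du is a hypothetical name, and the sample input mirrors the two 1 KB files from the example above. It uses printf "%.0f" rather than print because some awk implementations can show very large totals in exponent notation:

```shell
# sum_du: add up the first column (size in bytes) of `hdfs dfs -du` output on stdin.
# printf "%.0f" prints the total as a plain integer, even for very large sums.
sum_du() {
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Made-up sample matching the example above; on a cluster:
#   hdfs dfs -du some_dir/count* | sum_du
printf '%s\n' '1024 some_dir/count1.txt' '1024 some_dir/count2.txt' | sum_du
```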

hadoop fs -du -s -h /path/to/dir displays a directory's size in readable form.

Extending the answers from Matt D and others: up to Apache Hadoop 3.0.0, the command syntax is

hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]

It displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.


  • The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going 1-level deep from the given path.
  • The -h option will format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).
  • The -v option will display the names of columns as a header line.
  • The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.

du returns three columns with the following format:

 | size  |  disk_space_consumed_with_all_replicas  |  full_path_name | 
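
Taken together, the first two columns show how much replication inflates the footprint of each entry. A rough sketch, assuming the three-column layout described above; replication_ratio is a hypothetical helper and the sample numbers are invented:

```shell
# replication_ratio: for each three-column `hadoop fs -du` line on stdin, print
# the path and disk_space_consumed / size (the effective replication factor).
# Lines whose first field is not a positive integer (e.g. a -v header) are skipped.
replication_ratio() {
    awk 'NF >= 3 && $1 ~ /^[0-9]+$/ && $1 > 0 { printf "%s\t%.1f\n", $3, $2 / $1 }'
}

# Invented sample; on a cluster: hadoop fs -du /user/hadoop | replication_ratio
printf '%s\n' '1024 3072 /user/hadoop/dir1' '2048 4096 /user/hadoop/file1' | replication_ratio
```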

Example command:

hadoop fs -du /user/hadoop/dir1 /user/hadoop/file1

Exit Code: Returns 0 on success and -1 on error.
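
In a script, the exit code can be checked before the output is trusted. A small sketch, assuming only that any non-zero status means failure (a process exit code of -1 is reported by the shell as 255); checked is a hypothetical wrapper, and echo stands in for the real command so the snippet can be tried without a cluster:

```shell
# checked: run the command given as arguments; on failure, report the exit
# status on stderr and return 1.
checked() {
    "$@" && return 0
    status=$?
    echo "command failed with exit status $status: $*" >&2
    return 1
}

# Stand-in command; on a cluster: checked hadoop fs -du -s /user/hadoop/dir1
checked echo '4096 /user/hadoop/dir1'
```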

source: Apache doc

Percentage of used space on the Hadoop cluster:
sudo -u hdfs hadoop fs -df

Space used under a specific folder:
sudo -u hdfs hadoop fs -du -h /user

To get the size of a directory, use hdfs dfs -du -s -h /$yourDirectoryName. For a quick cluster-level storage report, use hdfs dfsadmin -report.

The command should be hadoop fs -du -s -h /dirPath

  • -du [-s] [-h] ... : Show the amount of space, in bytes, used by the files that match the specified file pattern.

  • -s : Rather than showing the size of each individual file that matches the
    pattern, shows the total (summary) size.

  • -h : Formats the sizes of files in a human-readable fashion rather than as a number of bytes (e.g. MB/GB/TB).

    Note that, even without the -s option, this only shows size summaries one level deep into a directory.

    The output is in the form size name(full path)

hadoop version 2.3.33:

hadoop fs -dus /path/to/dir | awk '{print $2/1024**3 " G"}'

