[linux] Total size of the contents of all the files in a directory

When I use ls or du, I get the amount of disk space each file is occupying.

I need the sum total of all the data in files and subdirectories I would get if I opened each file and counted the bytes. Bonus points if I can get this without opening each file and counting.

This question is related to linux and shell.

The answer is


For the Windows (Win32) command prompt, you can run:

c:> dir /s c:\directory\you\want

and the penultimate line will tell you how many bytes the files take up.

I know this reads all files and directories, but it is faster in some situations.


cd to the directory, then:

du -sh

ftw!

Originally wrote about it here: https://ao.gl/get-the-total-size-of-all-the-files-in-a-directory/


Just an alternative:

ls -lAR | grep -v '^d' | awk '{total += $5} END {print "Total:", total}'

grep -v '^d' will exclude the directories.


If you want the 'apparent size' (that is, the number of bytes in each file), rather than the space the files take up on disk, use the -b or --bytes option (if you have a Linux system with GNU coreutils):

% du -sbh <directory>

Use du -sb:

du -sb DIR

Optionally, add the h option for more user-friendly output:

du -sbh DIR

stat's "%s" format gives you the actual number of bytes in a file.

 find . -type f |
 xargs stat --format=%s |
 awk '{s+=$1} END {print s}'

Feel free to substitute your favourite method for summing numbers.
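
For example, assuming GNU coreutils are available, paste and bc can replace the awk step (the same summing trick appears again in the benchmarks further down):

 find . -type f |
 xargs stat --format=%s |
 paste -sd+ |
 bc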


If you use BusyBox's du on an embedded system, you cannot get an exact byte count from du; you can only get kilobytes.

BusyBox v1.4.1 (2007-11-30 20:37:49 EST) multi-call binary

Usage: du [-aHLdclsxhmk] [FILE]...

Summarize disk space used for each FILE and/or directory.
Disk space is printed in units of 1024 bytes.

Options:
        -a      Show sizes of files in addition to directories
        -H      Follow symbolic links that are FILE command line args
        -L      Follow all symbolic links encountered
        -d N    Limit output to directories (and files with -a) of depth < N
        -c      Output a grand total
        -l      Count sizes many times if hard linked
        -s      Display only a total for each argument
        -x      Skip directories on different filesystems
        -h      Print sizes in human readable format (e.g., 1K 243M 2G )
        -m      Print sizes in megabytes
        -k      Print sizes in kilobytes(default)
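
If you do need an exact byte count on BusyBox, one possible workaround (a sketch that assumes the find, cat and wc applets are enabled, and that accepts reading every file) is to concatenate the files and count the bytes; if your BusyBox build does not support -exec ... +, the slower \; form works too:

find . -type f -exec cat {} + | wc -c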

This may help:

ls -l| grep -v '^d'| awk '{total = total + $5} END {print "Total" , total}'

The above command sums the sizes of all the files, leaving out the directories.


du is handy, but find is useful if you want to calculate the size of only some files (for example, filtering by extension). Also note that find itself can print the size of each file in bytes. To calculate a total size, we can pipe that into the dc command as follows:

find . -type f -printf "%s + " | dc -e0 -f- -ep

Here find generates a sequence of commands for dc, like 123 + 456 + 11 +. However, the complete program should look like 0 123 + 456 + 11 + p (remember postfix notation).

So, to get the complete program, we need to push 0 onto the stack before executing the sequence from stdin, and print the top of the stack after executing it (the p command at the end). We achieve this via dc options:

  1. -e0 is just a shortcut for -e '0', which pushes 0 onto the stack,
  2. -f- reads and executes commands from stdin (generated here by find),
  3. -ep prints the result (shorthand for -e 'p').

To print the size in MiB, like 284.06 MiB, we can instead use -e '2 k 1024 / 1024 / n [ MiB] p' in point 3 (most of the spaces are optional).
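
Putting the pieces described above together, the MiB-printing variant of the pipeline becomes:

find . -type f -printf "%s + " | dc -e0 -f- -e '2 k 1024 / 1024 / n [ MiB] p'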


When a folder is created, many Linux filesystems allocate 4096 bytes to store some metadata about the directory itself. This space increases in multiples of 4096 bytes as the directory grows.

The du command (with or without the -b option) takes this space into account, as you can see by typing:

mkdir test && du -b test

You will get a result of 4096 bytes for an empty directory. So, if you put two files of 10000 bytes each inside the directory, the total given by du -sb would be 24096 bytes.
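
A quick way to see this (a minimal sketch assuming GNU coreutils; the exact directory overhead depends on the filesystem):

mkdir test
truncate -s 10000 test/file1 test/file2   # two files with an apparent size of 10000 bytes each
du -sb test                               # typically 24096: 20000 bytes of file data + 4096 for the directory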

If you read the question carefully, this is not what was asked. The questioner asked for:

the sum total of all the data in files and subdirectories I would get if I opened each file and counted the bytes

which in the example above should be 20000 bytes, not 24096.

So, the correct answer IMHO is a blend of Nelson's answer and hlovdal's suggestion to handle filenames containing spaces:

find . -type f -print0 | xargs -0 stat --format=%s | awk '{s+=$1} END {print s}'

There are at least three ways to get the "sum total of all the data in files and subdirectories" in bytes that work in both Linux/Unix and Git Bash for Windows, listed below in order from fastest to slowest on average. For your reference, they were executed at the root of a fairly deep file system (docroot in a Magento 2 Enterprise installation comprising 71,158 files in 30,027 directories).

1.

$ time find -type f -printf '%s\n' | awk '{ total += $1 }; END { print total" bytes" }'
748660546 bytes

real    0m0.221s
user    0m0.068s
sys     0m0.160s

2.

$ time echo `find -type f -print0 | xargs -0 stat --format=%s | awk '{total+=$1} END {print total}'` bytes
748660546 bytes

real    0m0.256s
user    0m0.164s
sys     0m0.196s

3.

$ time echo `find -type f -exec du -bc {} + | grep -P "\ttotal$" | cut -f1 | awk '{ total += $1 }; END { print total }'` bytes
748660546 bytes

real    0m0.553s
user    0m0.308s
sys     0m0.416s


These two also work, but they rely on commands that don't exist on Git Bash for Windows:

1.

$ time echo `find -type f -printf "%s + " | dc -e0 -f- -ep` bytes
748660546 bytes

real    0m0.233s
user    0m0.116s
sys     0m0.176s

2.

$ time echo `find -type f -printf '%s\n' | paste -sd+ | bc` bytes
748660546 bytes

real    0m0.242s
user    0m0.104s
sys     0m0.152s


If you only want the total for the current directory, then add -maxdepth 1 to find.
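
For example, combined with the first (fastest) variant above, that would look something like:

$ find -maxdepth 1 -type f -printf '%s\n' | awk '{ total += $1 }; END { print total" bytes" }'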


Note that some of the suggested solutions don't return accurate results, so I would stick with the solutions above instead.

$ du -sbh
832M    .

$ ls -lR | grep -v '^d' | awk '{total += $5} END {print "Total:", total}'
Total: 583772525

$ find . -type f | xargs stat --format=%s | awk '{s+=$1} END {print s}'
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
4390471

$ ls -l| grep -v '^d'| awk '{total = total + $5} END {print "Total" , total}'
Total 968133

Use:

$ du -ckx <DIR> | grep total | awk '{print $1}'

Where <DIR> is the directory you want to inspect.

The -c option gives you a grand total, which is extracted with the grep total portion of the command, and the count in kilobytes is extracted with the awk command.

The only caveat here is that if you have a subdirectory whose name contains the text "total", it will be printed as well.
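
Since du -c prints the grand total as the last line, one workaround for that caveat (assuming GNU du's output order) is to take the last line instead of grepping for the word total:

$ du -ckx <DIR> | tail -n 1 | awk '{print $1}'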