[linux] How can I find all of the distinct file extensions in a folder hierarchy?

On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.

What would be the best way to achieve this from a shell?

This question is related to linux grep filesystems file-extension

The answer is


I've found it simple and fast...

   # find . -type f -exec basename {} \; | awk -F"." '{print $NF}' > /tmp/outfile.txt
   # cat /tmp/outfile.txt | sort | uniq -c| sort -n > tmp/outfile_sorted.txt

In Python using generators for very large directories, including blank extensions, and getting the number of times each extension shows up:

import json
import collections
import itertools
import os

root = '/home/andres'
files = itertools.chain.from_iterable((
    files for _,_,files in os.walk(root)
    ))
counter = collections.Counter(
    (os.path.splitext(file_)[1] for file_ in files)
)
print json.dumps(counter, indent=2)

I don't think this one was mentioned yet:

find . -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c

you could also do this

find . -type f -name "*.php" -exec PATHTOAPP {} +

Since there's already another solution which uses Perl:

If you have Python installed you could also do (from the shell):

python -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn]for _,_,fn in os.walk('/home')];print '\n'.join(e)"

Powershell:

dir -recurse | select-object extension -unique

Thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html


My awk-less, sed-less, Perl-less, Python-less POSIX-compliant alternative:

find . -type f | rev | cut -d. -f1 | rev  | tr '[:upper:]' '[:lower:]' | sort | uniq --count | sort -rn

The trick is that it reverses the line and cuts the extension at the beginning.
It also converts the extensions to lower case.

Example output:

   3689 jpg
   1036 png
    610 mp4
     90 webm
     90 mkv
     57 mov
     12 avi
     10 txt
      3 zip
      2 ogv
      1 xcf
      1 trashinfo
      1 sh
      1 m4v
      1 jpeg
      1 ini
      1 gqv
      1 gcs
      1 dv

The accepted answer uses REGEX and you cannot create an alias command with REGEX, you have to put it into a shell script, I'm using Amazon Linux 2 and did the following:

  1. I put the accepted answer code into a file using :

    sudo vim find.sh

add this code:

find ./ -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

save the file by typing: :wq!

  1. sudo vim ~/.bash_profile

  2. alias getext=". /path/to/your/find.sh"

  3. :wq!

  4. . ~/.bash_profile


I think the most simple & straightforward way is

for f in *.*; do echo "${f##*.}"; done | sort -u

It's modified on ChristopheD's 3rd way.


Recursive version:

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u

If you want totals (how may times the extension was seen):

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn

Non-recursive (single folder):

for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u

I've based this upon this forum post, credit should go there.


Adding my own variation to the mix. I think it's the simplest of the lot and can be useful when efficiency is not a big concern.

find . -type f | grep -o -E '\.[^\.]+$' | sort -u

I tried a bunch of the answers here, even the "best" answer. They all came up short of what I specifically was after. So besides the past 12 hours of sitting in regex code for multiple programs and reading and testing these answers this is what I came up with which works EXACTLY like I want.

 find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort -u
  • Finds all files which may have an extension.
  • Greps only the extension
  • Greps for file extensions between 2 and 16 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail).
  • Awk to print the extensions in lower case.
  • Sort and bring in only unique values. Originally I had attempted to try the awk answer but it would double print items that varied in case sensitivity.

If you need a count of the file extensions then use the below code

find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort | uniq -c | sort -rn

While these methods will take some time to complete and probably aren't the best ways to go about the problem, they work.

Update: Per @alpha_989 long file extensions will cause an issue. That's due to the original regex "[[:alpha:]]{3,6}". I have updated the answer to include the regex "[[:alpha:]]{2,16}". However anyone using this code should be aware that those numbers are the min and max of how long the extension is allowed for the final output. Anything outside that range will be split into multiple lines in the output.

Note: Original post did read "- Greps for file extensions between 3 and 6 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail)."

Idea: Could be used to find file extensions over a specific length via:

 find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{4,}" | awk '{print tolower($0)}' | sort -u

Where 4 is the file extensions length to include and then find also any extensions beyond that length.


Find everythin with a dot and show only the suffix.

find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u

if you know all suffix have 3 characters then

find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u

or with sed shows all suffixes with one to four characters. Change {1,4} to the range of characters you are expecting in the suffix.

find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p'| sort -u

No need for the pipe to sort, awk can do it all:

find . -type f | awk -F. '!a[$NF]++{print $NF}'

None of the replies so far deal with filenames with newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but works, and is reasonably fast.

import os, sys

def names(roots):
    for root in roots:
        for a, b, basenames in os.walk(root):
            for basename in basenames:
                yield basename

sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:
        print suf

Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to grep

grep's at sign caught as whitespace cat, grep and cut - translated to python How to suppress binary file matching results in grep Linux find and grep command together Filtering JSON array using jQuery grep() Linux Script to check if process is running and act on the result grep without showing path/file:line How do you grep a file and get the next 5 lines How to grep, excluding some patterns? Fast way of finding lines in one file that are not in another?

Examples related to filesystems

Get an image extension from an uploaded file in Laravel Notepad++ cached files location No space left on device How to create a directory using Ansible best way to get folder and file list in Javascript Exploring Docker container's file system Remove directory which is not empty GIT_DISCOVERY_ACROSS_FILESYSTEM not set Trying to create a file in Android: open failed: EROFS (Read-only file system) Node.js check if path is file or directory

Examples related to file-extension

find files by extension, *.html under a folder in nodejs JPG vs. JPEG image formats What is phtml, and when should I use a .phtml extension rather than .php? How to get the file extension in PHP? Given a filesystem path, is there a shorter way to extract the filename without its extension? What file uses .md extension and how should I edit them? How can I check the extension of a file? Changing file extension in Python How can I find all of the distinct file extensions in a folder hierarchy? Should you use .htm or .html file extension? What is the difference, and which file is correct?