[python] List directory tree structure in python?

I know that we can use os.walk() to list all sub-directories or all files in a directory. However, I would like to list the full directory tree content:

- Subdirectory 1:
   - file11
   - file12
   - Sub-sub-directory 11:
         - file111
         - file112
- Subdirectory 2:
    - file21
    - sub-sub-directory 21
    - sub-sub-directory 22    
        - sub-sub-sub-directory 221
            - file 2211

How to best achieve this in Python?

This question is related to python traversal directory-structure

The answer is


This solution will only work if you have tree installed on your system. However I'm leaving this solution here just in case it helps someone else out.

You can tell tree to output the tree structure as XML (tree -X) or JSON (tree -J). JSON of course can be parsed directly with python and XML can easily be read with lxml.

With the following directory structure as an example:

[sri@localhost Projects]$ tree --charset=ascii bands
bands
|-- DreamTroll
|   |-- MattBaldwinson
|   |-- members.txt
|   |-- PaulCarter
|   |-- SimonBlakelock
|   `-- Rob Stringer
|-- KingsX
|   |-- DougPinnick
|   |-- JerryGaskill
|   |-- members.txt
|   `-- TyTabor
|-- Megadeth
|   |-- DaveMustaine
|   |-- DavidEllefson
|   |-- DirkVerbeuren
|   |-- KikoLoureiro
|   `-- members.txt
|-- Nightwish
|   |-- EmppuVuorinen
|   |-- FloorJansen
|   |-- JukkaNevalainen
|   |-- MarcoHietala
|   |-- members.txt
|   |-- TroyDonockley
|   `-- TuomasHolopainen
`-- Rush
    |-- AlexLifeson
    |-- GeddyLee
    `-- NeilPeart

5 directories, 25 files

XML

<?xml version="1.0" encoding="UTF-8"?>
<tree>
  <directory name="bands">
    <directory name="DreamTroll">
      <file name="MattBaldwinson"></file>
      <file name="members.txt"></file>
      <file name="PaulCarter"></file>
      <file name="RobStringer"></file>
      <file name="SimonBlakelock"></file>
    </directory>
    <directory name="KingsX">
      <file name="DougPinnick"></file>
      <file name="JerryGaskill"></file>
      <file name="members.txt"></file>
      <file name="TyTabor"></file>
    </directory>
    <directory name="Megadeth">
      <file name="DaveMustaine"></file>
      <file name="DavidEllefson"></file>
      <file name="DirkVerbeuren"></file>
      <file name="KikoLoureiro"></file>
      <file name="members.txt"></file>
    </directory>
    <directory name="Nightwish">
      <file name="EmppuVuorinen"></file>
      <file name="FloorJansen"></file>
      <file name="JukkaNevalainen"></file>
      <file name="MarcoHietala"></file>
      <file name="members.txt"></file>
      <file name="TroyDonockley"></file>
      <file name="TuomasHolopainen"></file>
    </directory>
    <directory name="Rush">
      <file name="AlexLifeson"></file>
      <file name="GeddyLee"></file>
      <file name="NeilPeart"></file>
    </directory>
  </directory>
  <report>
    <directories>5</directories>
    <files>25</files>
  </report>
</tree>

JSON

[sri@localhost Projects]$ tree -J bands
[
  {"type":"directory","name":"bands","contents":[
    {"type":"directory","name":"DreamTroll","contents":[
      {"type":"file","name":"MattBaldwinson"},
      {"type":"file","name":"members.txt"},
      {"type":"file","name":"PaulCarter"},
      {"type":"file","name":"RobStringer"},
      {"type":"file","name":"SimonBlakelock"}
    ]},
    {"type":"directory","name":"KingsX","contents":[
      {"type":"file","name":"DougPinnick"},
      {"type":"file","name":"JerryGaskill"},
      {"type":"file","name":"members.txt"},
      {"type":"file","name":"TyTabor"}
    ]},
    {"type":"directory","name":"Megadeth","contents":[
      {"type":"file","name":"DaveMustaine"},
      {"type":"file","name":"DavidEllefson"},
      {"type":"file","name":"DirkVerbeuren"},
      {"type":"file","name":"KikoLoureiro"},
      {"type":"file","name":"members.txt"}
    ]},
    {"type":"directory","name":"Nightwish","contents":[
      {"type":"file","name":"EmppuVuorinen"},
      {"type":"file","name":"FloorJansen"},
      {"type":"file","name":"JukkaNevalainen"},
      {"type":"file","name":"MarcoHietala"},
      {"type":"file","name":"members.txt"},
      {"type":"file","name":"TroyDonockley"},
      {"type":"file","name":"TuomasHolopainen"}
    ]},
    {"type":"directory","name":"Rush","contents":[
      {"type":"file","name":"AlexLifeson"},
      {"type":"file","name":"GeddyLee"},
      {"type":"file","name":"NeilPeart"}
    ]}
  ]},
  {"type":"report","directories":5,"files":25}
]

Here you can find code with output like this: https://stackoverflow.com/a/56622847/6671330

V .
|-> V folder1
|   |-> V folder2
|   |   |-> V folder3
|   |   |   |-> file3.txt
|   |   |-> file2.txt
|   |-> V folderX
|   |-> file1.txt
|-> 02-hw1_wdwwfm.py
|-> 06-t1-home1.py
|-> 06-t1-home2.py
|-> hw1.py

A solution without your indentation:

for path, dirs, files in os.walk(given_path):
  print path
  for f in files:
    print f

os.walk already does the top-down, depth-first walk you are looking for.

Ignoring the dirs list prevents the overlapping you mention.


Maybe faster than @ellockie ( Maybe )

import os
def file_writer(text):
    with open("folder_structure.txt","a") as f_output:
        f_output.write(text)
def list_files(startpath):


    for root, dirs, files in os.walk(startpath):
        level = root.replace(startpath, '').count(os.sep)
        indent = '\t' * 1 * (level)
        output_string = '{}{}/ \n'.format(indent, os.path.basename(root))
        file_writer(output_string)
        subindent = '\t' * 1 * (level + 1)
        output_string = '%s %s \n' %(subindent,[f for f in files])
        file_writer(''.join(output_string))


list_files("/")

Test results in screenshot below:

enter image description here


List directory tree structure in Python?

We usually prefer to just use GNU tree, but we don't always have tree on every system, and sometimes Python 3 is available. A good answer here could be easily copy-pasted and not make GNU tree a requirement.

tree's output looks like this:

$ tree
.
+-- package
¦   +-- __init__.py
¦   +-- __main__.py
¦   +-- subpackage
¦   ¦   +-- __init__.py
¦   ¦   +-- __main__.py
¦   ¦   +-- module.py
¦   +-- subpackage2
¦       +-- __init__.py
¦       +-- __main__.py
¦       +-- module2.py
+-- package2
    +-- __init__.py

4 directories, 9 files

I created the above directory structure in my home directory under a directory I call pyscratch.

I also see other answers here that approach that sort of output, but I think we can do better, with simpler, more modern code and lazily evaluating approaches.

Tree in Python

To begin with, let's use an example that

  • uses the Python 3 Path object
  • uses the yield and yield from expressions (that create a generator function)
  • uses recursion for elegant simplicity
  • uses comments and some type annotations for extra clarity
from pathlib import Path

# prefix components:
space =  '    '
branch = '¦   '
# pointers:
tee =    '+-- '
last =   '+-- '


def tree(dir_path: Path, prefix: str=''):
    """A recursive generator, given a directory Path object
    will yield a visual tree structure line by line
    with each line prefixed by the same characters
    """    
    contents = list(dir_path.iterdir())
    # contents each get pointers that are +-- with a final +-- :
    pointers = [tee] * (len(contents) - 1) + [last]
    for pointer, path in zip(pointers, contents):
        yield prefix + pointer + path.name
        if path.is_dir(): # extend the prefix and recurse:
            extension = branch if pointer == tee else space 
            # i.e. space because last, +-- , above so no more |
            yield from tree(path, prefix=prefix+extension)

and now:

for line in tree(Path.home() / 'pyscratch'):
    print(line)

prints:

+-- package
¦   +-- __init__.py
¦   +-- __main__.py
¦   +-- subpackage
¦   ¦   +-- __init__.py
¦   ¦   +-- __main__.py
¦   ¦   +-- module.py
¦   +-- subpackage2
¦       +-- __init__.py
¦       +-- __main__.py
¦       +-- module2.py
+-- package2
    +-- __init__.py

We do need to materialize each directory into a list because we need to know how long it is, but afterwards we throw the list away. For deep and broad recursion this should be lazy enough.

The above code, with the comments, should be sufficient to fully understand what we're doing here, but feel free to step through it with a debugger to better grock it if you need to.

More features

Now GNU tree gives us a couple of useful features that I'd like to have with this function:

  • prints the subject directory name first (does so automatically, ours does not)
  • prints the count of n directories, m files
  • option to limit recursion, -L level
  • option to limit to just directories, -d

Also, when there is a huge tree, it is useful to limit the iteration (e.g. with islice) to avoid locking up your interpreter with text, as at some point the output becomes too verbose to be useful. We can make this arbitrarily high by default - say 1000.

So let's remove the previous comments and fill out this functionality:

from pathlib import Path
from itertools import islice

space =  '    '
branch = '¦   '
tee =    '+-- '
last =   '+-- '
def tree(dir_path: Path, level: int=-1, limit_to_directories: bool=False,
         length_limit: int=1000):
    """Given a directory Path object print a visual tree structure"""
    dir_path = Path(dir_path) # accept string coerceable to Path
    files = 0
    directories = 0
    def inner(dir_path: Path, prefix: str='', level=-1):
        nonlocal files, directories
        if not level: 
            return # 0, stop iterating
        if limit_to_directories:
            contents = [d for d in dir_path.iterdir() if d.is_dir()]
        else: 
            contents = list(dir_path.iterdir())
        pointers = [tee] * (len(contents) - 1) + [last]
        for pointer, path in zip(pointers, contents):
            if path.is_dir():
                yield prefix + pointer + path.name
                directories += 1
                extension = branch if pointer == tee else space 
                yield from inner(path, prefix=prefix+extension, level=level-1)
            elif not limit_to_directories:
                yield prefix + pointer + path.name
                files += 1
    print(dir_path.name)
    iterator = inner(dir_path, level=level)
    for line in islice(iterator, length_limit):
        print(line)
    if next(iterator, None):
        print(f'... length_limit, {length_limit}, reached, counted:')
    print(f'\n{directories} directories' + (f', {files} files' if files else ''))

And now we can get the same sort of output as tree:

tree(Path.home() / 'pyscratch')

prints:

pyscratch
+-- package
¦   +-- __init__.py
¦   +-- __main__.py
¦   +-- subpackage
¦   ¦   +-- __init__.py
¦   ¦   +-- __main__.py
¦   ¦   +-- module.py
¦   +-- subpackage2
¦       +-- __init__.py
¦       +-- __main__.py
¦       +-- module2.py
+-- package2
    +-- __init__.py

4 directories, 9 files

And we can restrict to levels:

tree(Path.home() / 'pyscratch', level=2)

prints:

pyscratch
+-- package
¦   +-- __init__.py
¦   +-- __main__.py
¦   +-- subpackage
¦   +-- subpackage2
+-- package2
    +-- __init__.py

4 directories, 3 files

And we can limit the output to directories:

tree(Path.home() / 'pyscratch', level=2, limit_to_directories=True)

prints:

pyscratch
+-- package
¦   +-- subpackage
¦   +-- subpackage2
+-- package2

4 directories

Retrospective

In retrospect, we could have used path.glob for matching. We could also perhaps use path.rglob for recursive globbing, but that would require a rewrite. We could also use itertools.tee instead of materializing a list of directory contents, but that could have negative tradeoffs and would probably make the code even more complex.

Comments are welcome!


I came here looking for the same thing and used dhobbs answer for me. As a way of thanking the community, I added some arguments to write to a file, as akshay asked, and made showing files optional so it is not so bit an output. Also made the indentation an optional argument so you can change it, as some like it to be 2 and others prefer 4.

Used different loops so the one not showing files doesn't check if it has to on each iteration.

Hope it helps someone else as dhobbs answer helped me. Thanks a lot.

def showFolderTree(path,show_files=False,indentation=2,file_output=False):
"""
Shows the content of a folder in a tree structure.
path -(string)- path of the root folder we want to show.
show_files -(boolean)-  Whether or not we want to see files listed.
                        Defaults to False.
indentation -(int)- Indentation we want to use, defaults to 2.   
file_output -(string)-  Path (including the name) of the file where we want
                        to save the tree.
"""


tree = []

if not show_files:
    for root, dirs, files in os.walk(path):
        level = root.replace(path, '').count(os.sep)
        indent = ' '*indentation*(level)
        tree.append('{}{}/'.format(indent,os.path.basename(root)))

if show_files:
    for root, dirs, files in os.walk(path):
        level = root.replace(path, '').count(os.sep)
        indent = ' '*indentation*(level)
        tree.append('{}{}/'.format(indent,os.path.basename(root)))    
        for f in files:
            subindent=' ' * indentation * (level+1)
            tree.append('{}{}'.format(subindent,f))

if file_output:
    output_file = open(file_output,'w')
    for line in tree:
        output_file.write(line)
        output_file.write('\n')
else:
    # Default behaviour: print on screen.
    for line in tree:
        print line

Based on this fantastic post

http://code.activestate.com/recipes/217212-treepy-graphically-displays-the-directory-structur/

Here es a refinement to behave exactly like

http://linux.die.net/man/1/tree

#!/usr/bin/env python2
# -*- coding: utf-8 -*-

# tree.py
#
# Written by Doug Dahms
#
# Prints the tree structure for the path specified on the command line

from os import listdir, sep
from os.path import abspath, basename, isdir
from sys import argv

def tree(dir, padding, print_files=False, isLast=False, isFirst=False):
    if isFirst:
        print padding.decode('utf8')[:-1].encode('utf8') + dir
    else:
        if isLast:
            print padding.decode('utf8')[:-1].encode('utf8') + '+-- ' + basename(abspath(dir))
        else:
            print padding.decode('utf8')[:-1].encode('utf8') + '+-- ' + basename(abspath(dir))
    files = []
    if print_files:
        files = listdir(dir)
    else:
        files = [x for x in listdir(dir) if isdir(dir + sep + x)]
    if not isFirst:
        padding = padding + '   '
    files = sorted(files, key=lambda s: s.lower())
    count = 0
    last = len(files) - 1
    for i, file in enumerate(files):
        count += 1
        path = dir + sep + file
        isLast = i == last
        if isdir(path):
            if count == len(files):
                if isFirst:
                    tree(path, padding, print_files, isLast, False)
                else:
                    tree(path, padding + ' ', print_files, isLast, False)
            else:
                tree(path, padding + '¦', print_files, isLast, False)
        else:
            if isLast:
                print padding + '+-- ' + file
            else:
                print padding + '+-- ' + file

def usage():
    return '''Usage: %s [-f] 
Print tree structure of path specified.
Options:
-f      Print files as well as directories
PATH    Path to process''' % basename(argv[0])

def main():
    if len(argv) == 1:
        print usage()
    elif len(argv) == 2:
        # print just directories
        path = argv[1]
        if isdir(path):
            tree(path, '', False, False, True)
        else:
            print 'ERROR: \'' + path + '\' is not a directory'
    elif len(argv) == 3 and argv[1] == '-f':
        # print directories and files
        path = argv[2]
        if isdir(path):
            tree(path, '', True, False, True)
        else:
            print 'ERROR: \'' + path + '\' is not a directory'
    else:
        print usage()

if __name__ == '__main__':
    main()



For those who are still looking for an answer. Here is a recursive approach to get the paths in a dictionary.

import os


def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        dir_content = []
        for dir in dirs:
            go_inside = os.path.join(startpath, dir)
            dir_content.append(list_files(go_inside))
        files_lst = []
        for f in files:
            files_lst.append(f)
        return {'name': root, 'files': files_lst, 'dirs': dir_content}

Similar to answers above, but for python3, arguably readable and arguably extensible:

from pathlib import Path

class DisplayablePath(object):
    display_filename_prefix_middle = '+--'
    display_filename_prefix_last = '+--'
    display_parent_prefix_middle = '    '
    display_parent_prefix_last = '¦   '

    def __init__(self, path, parent_path, is_last):
        self.path = Path(str(path))
        self.parent = parent_path
        self.is_last = is_last
        if self.parent:
            self.depth = self.parent.depth + 1
        else:
            self.depth = 0

    @property
    def displayname(self):
        if self.path.is_dir():
            return self.path.name + '/'
        return self.path.name

    @classmethod
    def make_tree(cls, root, parent=None, is_last=False, criteria=None):
        root = Path(str(root))
        criteria = criteria or cls._default_criteria

        displayable_root = cls(root, parent, is_last)
        yield displayable_root

        children = sorted(list(path
                               for path in root.iterdir()
                               if criteria(path)),
                          key=lambda s: str(s).lower())
        count = 1
        for path in children:
            is_last = count == len(children)
            if path.is_dir():
                yield from cls.make_tree(path,
                                         parent=displayable_root,
                                         is_last=is_last,
                                         criteria=criteria)
            else:
                yield cls(path, displayable_root, is_last)
            count += 1

    @classmethod
    def _default_criteria(cls, path):
        return True

    @property
    def displayname(self):
        if self.path.is_dir():
            return self.path.name + '/'
        return self.path.name

    def displayable(self):
        if self.parent is None:
            return self.displayname

        _filename_prefix = (self.display_filename_prefix_last
                            if self.is_last
                            else self.display_filename_prefix_middle)

        parts = ['{!s} {!s}'.format(_filename_prefix,
                                    self.displayname)]

        parent = self.parent
        while parent and parent.parent is not None:
            parts.append(self.display_parent_prefix_middle
                         if parent.is_last
                         else self.display_parent_prefix_last)
            parent = parent.parent

        return ''.join(reversed(parts))

Example usage:

paths = DisplayablePath.make_tree(Path('doc'))
for path in paths:
    print(path.displayable())

Example output:

doc/
+-- _static/
¦   +-- embedded/
¦   ¦   +-- deep_file
¦   ¦   +-- very/
¦   ¦       +-- deep/
¦   ¦           +-- folder/
¦   ¦               +-- very_deep_file
¦   +-- less_deep_file
+-- about.rst
+-- conf.py
+-- index.rst

Notes

  • This uses recursion. It will raise a RecursionError on really deep folder trees
  • The tree is lazily evaluated. It should behave well on really wide folder trees. Immediate children of a given folder are not lazily evaluated, though.

Edit:

  • Added bonus! criteria callback for filtering paths.

There is a package (I created) called seedir for doing this and other things with folder tree diagrams:

>>> import seedir as sd
>>> sd.seedir('/path/to/some/path/or/package', style='emoji')

 package/
+- __init__.py
+- subpackage1/
¦ +- __init__.py
¦ +- moduleX.py
¦ +- moduleY.py
+- subpackage2/
¦ +- __init__.py
¦ +- moduleZ.py
+- moduleA.py

Something similar to the style OP used can be done with:

>>> sd.seedir('/path/to/folder', style='spaces', indent=4, anystart='- ')

- package/
    - __init__.py
    - subpackage1/
        - __init__.py
        - moduleX.py
        - moduleY.py
    - subpackage2/
        - __init__.py
        - moduleZ.py
    - moduleA.py

@dhobbs's answer is great!

but simply change to easy get the level info

def print_list_dir(dir):
    print("=" * 64)
    print("[PRINT LIST DIR] %s" % dir)
    print("=" * 64)
    for root, dirs, files in os.walk(dir):
        level = root.replace(dir, '').count(os.sep)
        indent = '| ' * level
        print('{}{} \\'.format(indent, os.path.basename(root)))
        subindent = '| ' * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))
    print("=" * 64)

and the output like

================================================================
[PRINT LIST DIR] ./
================================================================
 \
| os_name.py
| json_loads.py
| linspace_python.py
| list_file.py
| to_gson_format.py
| type_convert_test.py
| in_and_replace_test.py
| online_log.py
| padding_and_clipping.py
| str_tuple.py
| set_test.py
| script_name.py
| word_count.py
| get14.py
| np_test2.py
================================================================

you can get the level by | count!


import os

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree  # note we discontinue iteration trough os.walk

If anybody is interested - that recursive function returns nested structure of dictionaries. Keys are file system names (of directories and files), values are either:

  • sub dictionaries for directories
  • strings for files (see file_token)

The strings designating files are empty in this example. They can also be e.g. given file contents or its owner info or privileges or whatever object different than a dict. Unless it's a dictionary it can be easily distinguished from a "directory type" in further operations.

Having such a tree in a filesystem:

# bash:
$ tree /tmp/ex
/tmp/ex
+-- d_a
¦   +-- d_a_a
¦   +-- d_a_b
¦   ¦   +-- f1.txt
¦   +-- d_a_c
¦   +-- fa.txt
+-- d_b
¦   +-- fb1.txt
¦   +-- fb2.txt
+-- d_c

The result will be:

# python 2 or 3:
>>> fs_tree_to_dict("/tmp/ex")
{
    'd_a': {
        'd_a_a': {},
        'd_a_b': {
            'f1.txt': ''
        },
        'd_a_c': {},
        'fa.txt': ''
    },
    'd_b': {
        'fb1.txt': '',
        'fb2.txt': ''
    },
    'd_c': {}
}

If you like that, I've already created a package (python 2 & 3) with this stuff (and a nice pyfakefs helper): https://pypi.org/project/fsforge/


You can execute 'tree' command of Linux shell.

Installation:

   ~$sudo apt install tree

Using in python

    >>> import os
    >>> os.system('tree <desired path>')

Example:

    >>> os.system('tree ~/Desktop/myproject')

This gives you a cleaner structure and is visually more comprehensive and easy to type.


On top of dhobbs answer above (https://stackoverflow.com/a/9728478/624597), here is an extra functionality of storing results to a file (I personally use it to copy and paste to FreeMind to have a nice overview of the structure, therefore I used tabs instead of spaces for indentation):

import os

def list_files(startpath):

    with open("folder_structure.txt", "w") as f_output:
        for root, dirs, files in os.walk(startpath):
            level = root.replace(startpath, '').count(os.sep)
            indent = '\t' * 1 * (level)
            output_string = '{}{}/'.format(indent, os.path.basename(root))
            print(output_string)
            f_output.write(output_string + '\n')
            subindent = '\t' * 1 * (level + 1)
            for f in files:
                output_string = '{}{}'.format(subindent, f)
                print(output_string)
                f_output.write(output_string + '\n')

list_files(".")