[python] How to split a dos path into its components in Python

I have a string variable which represents a dos path e.g:

var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

I want to split this string into:

[ "d", "stuff", "morestuff", "furtherdown", "THEFILE.txt" ]

I have tried using split() and replace() but they either only process the first backslash or they insert hex numbers into the string.

I need to convert this string variable into a raw string somehow so that I can parse it.

What's the best way to do this?

I should also add that the contents of var i.e. the path that I'm trying to parse, is actually the return value of a command line query. It's not path data that I generate myself. Its stored in a file, and the command line tool is not going to escape the backslashes.

This question is related to python

The answer is


You can simply use the most Pythonic approach (IMHO):

import os

your_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
path_list = your_path.split(os.sep)
print path_list

Which will give you:

['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

The clue here is to use os.sep instead of '\\' or '/', as this makes it system independent.

To remove colon from the drive letter (although I don't see any reason why you would want to do that), you can write:

path_list[0] = path_list[0][0]

For a somewhat more concise solution, consider the following:

def split_path(p):
    a,b = os.path.split(p)
    return (split_path(a) if len(a) and len(b) else []) + [b]

The functional way, with a generator.

def split(path):
    (drive, head) = os.path.splitdrive(path)
    while (head != os.sep):
        (head, tail) = os.path.split(head)
        yield tail

In action:

>>> print([x for x in split(os.path.normpath('/path/to/filename'))])
['filename', 'to', 'path']

use ntpath.split()


The stuff about about mypath.split("\\") would be better expressed as mypath.split(os.sep). sep is the path separator for your particular platform (e.g., \ for Windows, / for Unix, etc.), and the Python build knows which one to use. If you use sep, then your code will be platform agnostic.


In Python >=3.4 this has become much simpler. You can now use pathlib.Path.parts to get all the parts of a path.

Example:

>>> from pathlib import Path
>>> Path('C:/path/to/file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> Path(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')

On a Windows install of Python 3 this will assume that you are working with Windows paths, and on *nix it will assume that you are working with posix paths. This is usually what you want, but if it isn't you can use the classes pathlib.PurePosixPath or pathlib.PureWindowsPath as needed:

>>> from pathlib import PurePosixPath, PureWindowsPath
>>> PurePosixPath('/path/to/file.txt').parts
('/', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'\\host\share\path\to\file.txt').parts
('\\\\host\\share\\', 'path', 'to', 'file.txt')

Edit: There is also a backport to python 2 available: pathlib2


Below line of code can handle:

  1. C:/path/path
  2. C://path//path
  3. C:\path\path
  4. C:\path\path

path = re.split(r'[///\]', path)


Adapted the solution of @Mike Robins avoiding empty path elements at the beginning:

def parts(path):
    p,f = os.path.split(os.path.normpath(path))
    return parts(p) + [f] if f and p else [p] if p else []

os.path.normpath() is actually required only once and could be done in a separate entry function to the recursion.


re.split() can help a little more then string.split()

import re    
var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"
re.split( r'[\\/]', var )
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

If you also want to support Linux and Mac paths, just add filter(None,result), so it will remove the unwanted '' from the split() since their paths starts with '/' or '//'. for example '//mount/...' or '/var/tmp/'

import re    
var = "/var/stuff/morestuff/furtherdown/THEFILE.txt"
result = re.split( r'[\\/]', var )
filter( None, result )
['var', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

It works for me:

>>> a=r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
>>> a.split("\\")
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

Sure you might need to also strip out the colon from the first component, but keeping it makes it possible to re-assemble the path.

The r modifier marks the string literal as "raw"; notice how embedded backslashes are not doubled.


I'm not actually sure if this fully answers the question, but I had a fun time writing this little function that keeps a stack, sticks to os.path-based manipulations, and returns the list/stack of items.

def components(path):
    ret = []
    while len(path) > 0:
        path, crust = split(path)
        ret.insert(0, crust)
    return ret

You can recursively os.path.split the string

import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [p]

Testing this against some path strings, and reassembling the path with os.path.join

>>> for path in [
...         r'd:\stuff\morestuff\furtherdown\THEFILE.txt',
...         '/path/to/file.txt',
...         'relative/path/to/file.txt',
...         r'C:\path\to\file.txt',
...         r'\\host\share\path\to\file.txt',
...     ]:
...     print parts(path), os.path.join(*parts(path))
... 
['d:\\', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt'] d:\stuff\morestuff\furtherdown\THEFILE.txt
['/', 'path', 'to', 'file.txt'] /path\to\file.txt
['', 'relative', 'path', 'to', 'file.txt'] relative\path\to\file.txt
['C:\\', 'path', 'to', 'file.txt'] C:\path\to\file.txt
['\\\\', 'host', 'share', 'path', 'to', 'file.txt'] \\host\share\path\to\file.txt

The first element of the list may need to be treated differently depending on how you want to deal with drive letters, UNC paths and absolute and relative paths. Changing the last [p] to [os.path.splitdrive(p)] forces the issue by splitting the drive letter and directory root out into a tuple.

import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [os.path.splitdrive(p)]

[('d:', '\\'), 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
[('', '/'), 'path', 'to', 'file.txt']
[('', ''), 'relative', 'path', 'to', 'file.txt']
[('C:', '\\'), 'path', 'to', 'file.txt']
[('', '\\\\'), 'host', 'share', 'path', 'to', 'file.txt']

Edit: I have realised that this answer is very similar to that given above by user1556435. I'm leaving my answer up as the handling of the drive component of the path is different.


Just like others explained - your problem stemmed from using \, which is escape character in string literal/constant. OTOH, if you had that file path string from another source (read from file, console or returned by os function) - there wouldn't have been problem splitting on '\\' or r'\'.

And just like others suggested, if you want to use \ in program literal, you have to either duplicate it \\ or the whole literal has to be prefixed by r, like so r'lite\ral' or r"lite\ral" to avoid the parser converting that \ and r to CR (carriage return) character.

There is one more way though - just don't use backslash \ pathnames in your code! Since last century Windows recognizes and works fine with pathnames which use forward slash as directory separator /! Somehow not many people know that.. but it works:

>>> var = "d:/stuff/morestuff/furtherdown/THEFILE.txt"
>>> var.split('/')
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

This by the way will make your code work on Unix, Windows and Mac... because all of them do use / as directory separator... even if you don't want to use the predefined constants of module os.


I use the following as since it uses the os.path.basename function it doesn't add any slashes to the returned list. It also works with any platform's slashes: i.e window's \\\\ or unix's /. And furthermore, it doesn't add the \\\\\\\\ that windows uses for server paths :)

def SplitPath( split_path ):
    pathSplit_lst   = []
    while os.path.basename(split_path):
        pathSplit_lst.append( os.path.basename(split_path) )
        split_path = os.path.dirname(split_path)
    pathSplit_lst.reverse()
    return pathSplit_lst

So for:

\\\\\\\server\\\\folder1\\\\folder2\\\\folder3\\\\folder4

You get:

['server','folder1','folder2','folder3','folder4']

I can't actually contribute a real answer to this one (as I came here hoping to find one myself), but to me the number of differing approaches and all the caveats mentioned is the surest indicator that Python's os.path module desperately needs this as a built-in function.


The problem here starts with how you're creating the string in the first place.

a = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

Done this way, Python is trying to special case these: \s, \m, \f, and \T. In your case, \f is being treated as a formfeed (0x0C) while the other backslashes are handled correctly. What you need to do is one of these:

b = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"      # doubled backslashes
c = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"         # raw string, no doubling necessary

Then once you split either of these, you'll get the result you want.


Let assume you have have a file filedata.txt with content:

d:\stuff\morestuff\furtherdown\THEFILE.txt
d:\otherstuff\something\otherfile.txt

You can read and split the file paths:

>>> for i in open("filedata.txt").readlines():
...     print i.strip().split("\\")
... 
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
['d:', 'otherstuff', 'something', 'otherfile.txt']

I would do

import os
path = os.path.normpath(path)
path.split(os.sep)

First normalize the path string into a proper string for the OS. Then os.sep must be safe to use as a delimiter in string function split.


One recursive for the fun.

Not the most elegant answer, but should work everywhere:

import os

def split_path(path):
    head = os.path.dirname(path)
    tail = os.path.basename(path)
    if head == os.path.dirname(head):
        return [tail]
    return split_path(head) + [tail]