Check whether a path is valid in Python without creating a file at the path's target

Question

I have a path (including directory and file name). I need to test whether the file name is valid, i.e. whether the file system will allow me to create a file with such a name. The file name has some unicode characters in it.

It's safe to assume the directory segment of the path is valid and accessible. (I was trying to make the question more generally applicable, and apparently I went too far.)

I very much do not want to have to escape anything unless I have to.

(I'd post some of the example characters I am dealing with, but apparently they get automatically removed by the Stack Exchange system. Anyway, I want to keep standard unicode entities and only escape things which are invalid in a filename.)

Here is the catch: there may (or may not) already be a file at the target of the path. I need to keep that file if it does exist, and not create a file if it does not.

Basically I want to check if I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).

As such:

```python
try:
    open(filename, 'w')
except OSError:
    pass  # handle error here
```

(from here) is not acceptable, because it will overwrite the existing file, which I do not want to touch (if it's there), or create said file if it's not.

I know I can do:

```python
import os

if not os.access(filePath, os.W_OK):
    try:
        open(filePath, 'w').close()
        os.unlink(filePath)
    except OSError:
        pass  # handle error here
```

But that will create the file at filePath, which I would then have to os.unlink().

In the end, it seems like it's spending 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.

As an aside, I need this to run on (at least) Windows and MacOS, so I'd like to avoid platform-specific stuff.
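To make the concern concrete, here is a minimal sketch (the temporary directory and the file name keep-me.txt are illustrative assumptions) showing that the open(filename, 'w') probe truncates a file that already exists:

```python
import os
import tempfile

def probe_clobbers_existing_file() -> str:
    """Write some data, run the open(..., 'w') probe, and return what is left."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "keep-me.txt")  # illustrative file name
        with open(path, "w") as f:
            f.write("important data")

        # The "just try opening it for writing" probe from the question.
        try:
            open(path, "w").close()
        except OSError:
            pass

        with open(path) as f:
            return f.read()

print(repr(probe_clobbers_existing_file()))  # '' : the probe truncated the file
```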

User · Answer

tl;dr

Call the is_path_exists_or_creatable() function defined below.

Strictly Python 3. That's just how we roll.

A Tale of Two Questions

The question of "How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?" is clearly two separate questions. Both are interesting, and neither has received a genuinely satisfactory answer here... or, well, anywhere that I could grep.

vikki's answer probably hews the closest, but has the remarkable disadvantages of:

- Needlessly opening (...and then failing to reliably close) file handles.
- Needlessly writing (...and then failing to reliably close or delete) 0-byte files.
- Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
- Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
- Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)

We're gonna fix all that.

Question #0: What's Pathname Validity Again?

Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by "pathname validity." What defines validity, exactly?

By "pathname validity," we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system, regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.

By "root filesystem," we mean:

- On POSIX-compatible systems, the filesystem mounted to the root directory (/).
- On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter
  containing the current Windows installation (typically but not necessarily C:).

The meaning of "syntactic correctness," in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:

- Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
- Contains no path components longer than 255 bytes (e.g., 'a' * 256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).

Syntactic correctness. Root filesystem. That's it.

Question #1: How Now Shall We Do Pathname Validity?

Validating pathnames in Python is surprisingly non-intuitive. I'm in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn't. Fortunately, rolling your own ad-hoc solution isn't that gut-wrenching...

...O.K., it actually is. It's hairy; it's nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin'.

We'll soon descend into the radioactive abyss of low-level code. But first, let's talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:

- For pathnames residing in non-existing directories, instances of FileNotFoundError.
- For pathnames residing in existing directories:
  - Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
  - Under all other OSes:
    - For pathnames containing null bytes (i.e., '\x00'), instances of TypeError (or ValueError, under more recent Python 3 releases).
    - For pathnames containing path components longer than 255 bytes, instances of OSError whose errno attribute is:
      - Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise
        referred to as "selective interpretation" of the POSIX standard.)
      - Under all other OSes, errno.ENAMETOOLONG.

Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.

Does this mean that pathnames residing in non-existing directories are not validatable? Yes, unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn't modifying a pathname prevent us from validating the original pathname?

To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.

Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by using the following algorithm:

1. Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
2. For each such component:
   1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog).
   2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError
      exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)

Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).

Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!

There exists a substantial side benefit to the above approach as well: security. (Isn't that nice?) Specifically:

Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS or Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.

The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that's stale, slow, or inaccessible, you've got larger problems than pathname validation.)

Lost? Great. Let's begin. (Python 3 assumed. See "What Is Fragile Hope for 300, leycec?")

```python
import errno, os, sys

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
    Official listing of all such codes.
'''

def is_pathname_valid(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    '''
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
            if sys.platform == 'win32' else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, 'winerror'):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" (older Python 3) or "ValueError" (newer Python 3)
    # exception was raised, it almost certainly reports an embedded NUL
    # character, indicating an invalid pathname.
    except (TypeError, ValueError):
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?
```

Done. Don't squint at that code. (It bites.)

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The key here is to call the previously defined function before testing the passed path:

```python
def is_path_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False
```

Done and done. Except not quite.

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

There exists a caveat. Of course there does.

As the official os.access() documentation admits: "Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model."

To no one's surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn't Python's fault, it might nonetheless be of concern for Windows-compatible applications.

If this is you, a more robust alternative is wanted. If the
passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path: a more portable (if expensive) test of creatability.

```python
import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname):
            pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass.
    except EnvironmentError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False
```

Note, however, that even this may not be enough.

Thanks to User Account Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a "Virtual Store" in that user's profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)

This is crazy. This is Windows.

Prove It

Dare we? It's time to test-drive the above tests.

Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let's leverage that to demonstrate the cold, hard truth, ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:

```python
>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False
```

Beyond sanity. Beyond pain. You will find Python portability concerns.
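The os.lstat() behaviour described under Question #1 can be observed one path component at a time. The following is a small POSIX-only sketch, not part of the answer's code: classify_component() and its labels are hypothetical, and it assumes the root directory / is the directory guaranteed to exist. On Windows the relevant signal would be the winerror attribute instead.

```python
import errno
import os

def classify_component(component: str, root: str = os.path.sep) -> str:
    """Hypothetical helper: label how os.lstat() reacts to root + component.

    POSIX-oriented sketch; `root` defaults to "/".
    """
    try:
        os.lstat(root + component)
    except (TypeError, ValueError):
        # Embedded NUL: TypeError on older Python 3, ValueError on newer ones.
        return "invalid (null byte)"
    except OSError as exc:
        if exc.errno in (errno.ENAMETOOLONG, errno.ERANGE):
            return "invalid (too long)"
        # Anything else (ENOENT, EACCES, ...) says nothing about syntax.
        return "valid (does not exist or inaccessible)"
    return "valid (exists)"

for part in ["", "troldskog", "a" * 256, "nul\x00byte"]:
    print(repr(part[:12]), "->", classify_component(part))
```

On Linux, the 256-character component is typically rejected with ENAMETOOLONG, while a merely missing component such as troldskog only produces an ignorable FileNotFoundError; that is the distinction is_pathname_valid() relies on.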

User · Answer

Try os.path.exists(). This will check for the path and return True if it exists and False if not.
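One caveat worth seeing: os.path.exists() answers "is something there?", not "is this name valid?". A minimal sketch, assuming Python 3.8 or later (where os.path.exists() returns False instead of raising for an embedded null byte); the file names are illustrative:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    existing = os.path.join(d, "present.txt")
    open(existing, "w").close()                # create a file we know exists
    missing = os.path.join(d, "absent.txt")    # valid name, no file yet
    invalid = os.path.join(d, "bad\x00name")   # embedded null byte: never valid

    results = {
        "existing": os.path.exists(existing),
        "missing": os.path.exists(missing),
        "invalid": os.path.exists(invalid),    # False, not an error, on 3.8+
    }

print(results)
```

Both the missing-but-valid name and the invalid name come back False, so exists() alone cannot tell them apart.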

User · Answer

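open(filename, 'r') (the 2nd argument is 'r' and not 'w') will open the file, or give an error if it doesn't exist. If there's an error, then you can try to write to the path; if you can't, then you get a second error:

```python
def can_open(filename):
    # Wrapper added so the return statements are valid (the name is illustrative).
    try:
        with open(filename, 'r'):  # succeeds only if the file exists and is readable
            return True
    except IOError:  # IOError is an alias of OSError in Python 3
        try:
            # Note: this creates the file if it is missing, and truncates it
            # if it exists but could not be read.
            with open(filename, 'w'):
                return True
        except IOError:
            return False
```

Also have a look here about permissions on Windows.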
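A quick sketch of the side effect this approach has, which is exactly what the question wants to avoid: when the file is missing, the 'w' fallback creates it. The helper name can_open() is hypothetical and mirrors the logic of the answer above.

```python
import os
import tempfile

def can_open(filename):
    # Same logic as the answer above: read first, fall back to write.
    try:
        with open(filename, "r"):
            return True
    except OSError:
        try:
            with open(filename, "w"):
                return True
        except OSError:
            return False

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "new.txt")
    before = os.path.exists(target)
    ok = can_open(target)
    after = os.path.exists(target)  # the write fallback left a 0-byte file behind

print(before, ok, after)  # False True True
```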

User · Answer

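With Python 3, how about:

```python
import os

try:
    # Mode 'x' (exclusive creation) raises OSError if the file exists;
    # an embedded null byte in the name raises ValueError instead.
    with open(filename, 'x'):
        pass
except (OSError, ValueError):
    pass  # handle error here
else:
    # We created the file ourselves, so remove it again.
    os.remove(filename)
```

With the 'x' option we also don't have to worry about race conditions. See the documentation here.

Now, this WILL create a very short-lived temporary file if it does not exist already, unless the name is invalid. If you can live with that, it simplifies things a lot.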
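A sketch packaging this idea into a helper, assuming the goal is "could a new file be created here right now". The name probe_create() is hypothetical; it also catches ValueError, which open() raises for names containing a null byte.

```python
import os
import tempfile

def probe_create(path: str) -> bool:
    """Hypothetical helper: True if a file could be created at `path` right now.

    Exclusive-create mode 'x' refuses to open an existing file, so an existing
    file is never truncated; a freshly created probe file is removed again.
    """
    try:
        with open(path, "x"):
            pass
    except FileExistsError:
        return False  # something is already there, and it was left untouched
    except (OSError, ValueError):
        return False  # e.g. missing directory, no permission, or embedded null byte
    os.remove(path)
    return True

with tempfile.TemporaryDirectory() as d:
    existing = os.path.join(d, "existing.txt")
    with open(existing, "w") as f:
        f.write("keep")
    print(probe_create(existing))                      # False
    with open(existing) as f:
        print(f.read())                                # keep
    fresh = os.path.join(d, "fresh.txt")
    print(probe_create(fresh), os.path.exists(fresh))  # True False
```

Because 'x' never truncates, an existing file survives the probe; the trade-off is that an existing file is reported as False rather than as "writable".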

User · Answer

```python
import os

if os.path.exists(filePath):
    pass  # the file is there
# dirname() is '' for a bare file name, so fall back to the current directory.
elif os.access(os.path.dirname(filePath) or '.', os.W_OK):
    pass  # the file does not exist but write privileges are given
else:
    pass  # cannot write there
```

Note that path.exists can fail for more reasons than just the file not being there, so you might have to do finer tests, like testing whether the containing directory exists, and so on.

After my discussion with the OP, it turned out that the main problem seems to be that the file name might contain characters that are not allowed by the filesystem. Of course they need to be removed, but the OP wants to maintain as much human readability as the filesystem allows.

Sadly, I do not know of any good solution for this. However, Cecil Curry's answer takes a closer look at detecting the problem.
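The os.path.dirname() call in this approach has one edge case worth showing: for a bare file name it returns an empty string, and os.access('') reports False. A minimal sketch of a hypothetical helper, writable_parent(), that falls back to the current directory:

```python
import os

def writable_parent(path: str) -> bool:
    """Hypothetical helper: can new entries be created in the directory that would hold `path`?"""
    # os.path.dirname() returns '' for a bare file name such as "report.txt",
    # and os.access('', ...) reports False, so fall back to the current directory.
    parent = os.path.dirname(path) or os.curdir
    return os.access(parent, os.W_OK)

print(repr(os.path.dirname("report.txt")))  # ''
print(os.access("", os.W_OK))               # False
```

Like any os.access() check, this only reflects the permission bits, so the ACL and network-filesystem caveats discussed in Cecil Curry's answer still apply.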
