[windows] What characters are forbidden in Windows and Linux directory names?

I know that / is illegal in Linux, and the following are illegal in Windows (I think) * . " / \ [ ] : ; | ,

What else am I missing?

I need a comprehensive guide, however, and one that takes into account double-byte characters. Linking to outside resources is fine with me.

I need to first create a directory on the filesystem using a name that may contain forbidden characters, so I plan to replace those characters with underscores. I then need to write this directory and its contents to a zip file (using Java), so any additional advice concerning the names of zip directories would be appreciated.

This question is related to windows linux directory zip filenames

The answer is


When creating internet shortcuts in Windows, to create the file name, it skips illegal characters, except for forward slash, which is converted to minus.


In Unix shells, you can quote almost every character in single quotes '. Except the single quote itself, and you can't express control characters, because \ is not expanded. Accessing the single quote itself from within a quoted string is possible, because you can concatenate strings with single and double quotes, like 'I'"'"'m' which can be used to access a file called "I'm" (double quote also possible here).

So you should avoid all control characters, because they are too difficult to enter in the shell. The rest still is funny, especially files starting with a dash, because most commands read those as options unless you have two dashes -- before, or you specify them with ./, which also hides the starting -.

If you want to be nice, don't use any of the characters the shell and typical commands use as syntactical elements, sometimes position dependent, so e.g. you can still use -, but not as first character; same with ., you can use it as first character only when you mean it ("hidden file"). When you are mean, your file names are VT100 escape sequences ;-), so that an ls garbles the output.


Here's a c# implementation for windows based on Christopher Oezbek's answer

It was made more complex by the containsFolder boolean, but hopefully covers everything

/// <summary>
/// This will replace invalid chars with underscores, there are also some reserved words that it adds underscore to
/// </summary>
/// <remarks>
/// https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names
/// </remarks>
/// <param name="containsFolder">Pass in true if filename represents a folder\file (passing true will allow slash)</param>
public static string EscapeFilename_Windows(string filename, bool containsFolder = false)
{
    StringBuilder builder = new StringBuilder(filename.Length + 12);

    int index = 0;

    // Allow colon if it's part of the drive letter
    if (containsFolder)
    {
        Match match = Regex.Match(filename, @"^\s*[A-Z]:\\", RegexOptions.IgnoreCase);
        if (match.Success)
        {
            builder.Append(match.Value);
            index = match.Length;
        }
    }

    // Character substitutions
    for (int cntr = index; cntr < filename.Length; cntr++)
    {
        char c = filename[cntr];

        switch (c)
        {
            case '\u0000':
            case '\u0001':
            case '\u0002':
            case '\u0003':
            case '\u0004':
            case '\u0005':
            case '\u0006':
            case '\u0007':
            case '\u0008':
            case '\u0009':
            case '\u000A':
            case '\u000B':
            case '\u000C':
            case '\u000D':
            case '\u000E':
            case '\u000F':
            case '\u0010':
            case '\u0011':
            case '\u0012':
            case '\u0013':
            case '\u0014':
            case '\u0015':
            case '\u0016':
            case '\u0017':
            case '\u0018':
            case '\u0019':
            case '\u001A':
            case '\u001B':
            case '\u001C':
            case '\u001D':
            case '\u001E':
            case '\u001F':

            case '<':
            case '>':
            case ':':
            case '"':
            case '/':
            case '|':
            case '?':
            case '*':
                builder.Append('_');
                break;

            case '\\':
                builder.Append(containsFolder ? c : '_');
                break;

            default:
                builder.Append(c);
                break;
        }
    }

    string built = builder.ToString();

    if (built == "")
    {
        return "_";
    }

    if (built.EndsWith(" ") || built.EndsWith("."))
    {
        built = built.Substring(0, built.Length - 1) + "_";
    }

    // These are reserved names, in either the folder or file name, but they are fine if following a dot
    // CON, PRN, AUX, NUL, COM0 .. COM9, LPT0 .. LPT9
    builder = new StringBuilder(built.Length + 12);
    index = 0;
    foreach (Match match in Regex.Matches(built, @"(^|\\)\s*(?<bad>CON|PRN|AUX|NUL|COM\d|LPT\d)\s*(\.|\\|$)", RegexOptions.IgnoreCase))
    {
        Group group = match.Groups["bad"];
        if (group.Index > index)
        {
            builder.Append(built.Substring(index, match.Index - index + 1));
        }

        builder.Append(group.Value);
        builder.Append("_");        // putting an underscore after this keyword is enough to make it acceptable

        index = group.Index + group.Length;
    }

    if (index == 0)
    {
        return built;
    }

    if (index < built.Length - 1)
    {
        builder.Append(built.Substring(index));
    }

    return builder.ToString();
}

For anyone looking for a regex:

const BLACKLIST = /[<>:"\/\\|?*]/g;

I always assumed that banned characters in Windows filenames meant that all exotic characters would also be outlawed. The inability to use ?, / and : in particular irked me. One day I discovered that it was virtually only those chars which were banned. Other Unicode characters may be used. So the nearest Unicode characters to the banned ones I could find were identified and MS Word macros were made for them as Alt-?, Alt-: etc. Now I form the filename in Word, using the substitute chars, and copy it to the Windows filename. So far I have had no problems.

Here are the substitute chars (Alt - the decimal Unicode) :

? Alt-8432
/ Alt-8260
? Alt-8421
| Alt-8739
? Alt-11622 ? Alt-11162
? Alt-8253 ? Alt 4961 " Alt-8246 " Alt-8243

As a test I formed a filename using all of those chars and Windows accepted it.


As of 18/04/2017, no simple black or white list of characters and filenames is evident among the answers to this topic - and there are many replies.

The best suggestion I could come up with was to let the user name the file however he likes. Using an error handler when the application tries to save the file, catch any exceptions, assume the filename is to blame (obviously after making sure the save path was ok as well), and prompt the user for a new file name. For best results, place this checking procedure within a loop that continues until either the user gets it right or gives up. Worked best for me (at least in VBA).


Well, if only for research purposes, then your best bet is to look at this Wikipedia entry on Filenames.

If you want to write a portable function to validate user input and create filenames based on that, the short answer is don't. Take a look at a portable module like Perl's File::Spec to have a glimpse to all the hops needed to accomplish such a "simple" task.


The .NET Framework System.IO provides the following functions for invalid file system characters:

Those functions should return appropriate results depending on the platform the .NET runtime is running in.


I had the same need and was looking for recommendation or standard references and came across this thread. My current blacklist of characters that should be avoided in file and directory names are:

$CharactersInvalidForFileName = {
    "pound" -> "#",
    "left angle bracket" -> "<",
    "dollar sign" -> "$",
    "plus sign" -> "+",
    "percent" -> "%",
    "right angle bracket" -> ">",
    "exclamation point" -> "!",
    "backtick" -> "`",
    "ampersand" -> "&",
    "asterisk" -> "*",
    "single quotes" -> "“",
    "pipe" -> "|",
    "left bracket" -> "{",
    "question mark" -> "?",
    "double quotes" -> "”",
    "equal sign" -> "=",
    "right bracket" -> "}",
    "forward slash" -> "/",
    "colon" -> ":",
    "back slash" -> "\\",
    "lank spaces" -> "b",
    "at sign" -> "@"
};

In Windows 10 (2019), the following characters are forbidden by an error when you try to type them:

A file name can't contain any of the following characters:

\ / : * ? " < > |


Instead of creating a blacklist of characters, you could use a whitelist. All things considered, the range of characters that make sense in a file or directory name context is quite short, and unless you have some very specific naming requirements your users will not hold it against your application if they cannot use the whole ASCII table.

It does not solve the problem of reserved names in the target file system, but with a whitelist it is easier to mitigate the risks at the source.

In that spirit, this is a range of characters that can be considered safe:

  • Letters (a-z A-Z) - Unicode characters as well, if needed
  • Digits (0-9)
  • Underscore (_)
  • Hyphen (-)
  • Space
  • Dot (.)

And any additional safe characters you wish to allow. Beyond this, you just have to enforce some additional rules regarding spaces and dots. This is usually sufficient:

  • Name must contain at least one letter or number (to avoid only dots/spaces)
  • Name must start with a letter or number (to avoid leading dots/spaces)
  • Name may not end with a dot or space (simply trim those if present, like Explorer does)

This already allows quite complex and nonsensical names. For example, these names would be possible with these rules, and be valid file names in Windows/Linux:

  • A...........ext
  • B -.- .ext

In essence, even with so few whitelisted characters you should still decide what actually makes sense, and validate/adjust the name accordingly. In one of my applications, I used the same rules as above but stripped any duplicate dots and spaces.


Though the only illegal Unix chars might be / and NULL, although some consideration for command line interpretation should be included.

For example, while it might be legal to name a file 1>&2 or 2>&1 in Unix, file names such as this might be misinterpreted when used on a command line.

Similarly it might be possible to name a file $PATH, but when trying to access it from the command line, the shell will translate $PATH to its variable value.


Under Linux and other Unix-related systems, there are only two characters that cannot appear in the name of a file or directory, and those are NUL '\0' and slash '/'. The slash, of course, can appear in a path name, separating directory components.

Rumour1 has it that Steven Bourne (of 'shell' fame) had a directory containing 254 files, one for every single letter (character code) that can appear in a file name (excluding /, '\0'; the name . was the current directory, of course). It was used to test the Bourne shell and routinely wrought havoc on unwary programs such as backup programs.

Other people have covered the Windows rules.

Note that MacOS X has a case-insensitive file system.


1 It was Kernighan & Pike in The Practice of Programming who said as much in Chapter 6, Testing, §6.5 Stress Tests:

When Steve Bourne was writing his Unix shell (which came to be known as the Bourne shell), he made a directory of 254 files with one-character names, one for each byte value except '\0' and slash, the two characters that cannot appear in Unix file names. He used that directory for all manner of tests of pattern-matching and tokenization. (The test directory was of course created by a program.) For years afterwards, that directory was the bane of file-tree-walking programs; it tested them to destruction.

Note that the directory must have contained entries . and .., so it was arguably 253 files (and 2 directories), or 255 name entries, rather than 254 files. This doesn't affect the effectiveness of the anecdote, or the careful testing it describes.


The easy way to get Windows to tell you the answer is to attempt to rename a file via Explorer and type in / for the new name. Windows will popup a message box telling you the list of illegal characters.

A filename cannot contain any of the following characters:
    \ / : * ? " < > | 

https://support.microsoft.com/en-us/kb/177506


Let's keep it simple and answer the question, first.

  1. The forbidden printable ASCII characters are:

    • Linux/Unix:

      / (forward slash)
      
    • Windows:

      < (less than)
      > (greater than)
      : (colon - sometimes works, but is actually NTFS Alternate Data Streams)
      " (double quote)
      / (forward slash)
      \ (backslash)
      | (vertical bar or pipe)
      ? (question mark)
      * (asterisk)
      
  2. Non-printable characters

    If your data comes from a source that would permit non-printable characters then there is more to check for.

    • Linux/Unix:

      0 (NULL byte)
      
    • Windows:

      0-31 (ASCII control characters)
      

    Note: While it is legal under Linux/Unix file systems to create files with control characters in the filename, it might be a nightmare for the users to deal with such files.

  3. Reserved file names

    The following filenames are reserved:

    • Windows:

      CON, PRN, AUX, NUL 
      COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
      LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
      

      (both on their own and with arbitrary file extensions, e.g. LPT1.txt).

  4. Other rules

    • Windows:

      Filenames cannot end in a space or dot.


Difficulties with defining, what's legal and not were already adressed and whitelists were suggested. But Windows supports more-than-8-bit characters. Wikipedia states, that (for example) the

modifier letter colon [(See 7. below) is] sometimes used in Windows filenames as it is identical to the colon in the Segoe UI font used for filenames. The [inherited ASCII] colon itself is not permitted.

Therefore, I want to present a much more liberal approach using Unicode characters to replace the "illegal" ones. I found the result in my comparable use-case by far more readable. Plus you can even restore the original content from the replacements. Possible choices and research are provided in the following list:

  1. Instead of * (U+002A * ASTERISK), you can use one of the many listed, for example U+2217 * (ASTERISK OPERATOR) or the Full Width Asterisk U+FF0A *
  2. Instead of ., you can use one of these, for example · U+22C5 dot operator
  3. Instead of ", you can use “ U+201C english leftdoublequotemark (Alternatives see here)
  4. Instead of / (/ SOLIDUS U+002F ), you can use / DIVISION SLASH U+2215 (others here)
  5. Instead of \ (\ U+005C Reverse solidus), you can use ? U+29F5 Reverse solidus operator (more)
  6. Instead of [ (U+005B Left square bracket) and ] (U+005D Right square bracket), you can use for example U+FF3B[ FULLWIDTH LEFT SQUARE BRACKET and U+FF3D ]FULLWIDTH RIGHT SQUARE BRACKET (from here, more possibilities here)
  7. Instead of :, you can use U+2236 : RATIO (for mathematical usage) or U+A789 ? MODIFIER LETTER COLON, (see colon (letter), sometimes used in Windows filenames as it is identical to the colon in the Segoe UI font used for filenames. The colon itself is not permitted) (See here)
  8. Instead of ;, you can use U+037E ; GREEK QUESTION MARK (see here)
  9. For |, there are some good substitutes such as: U+0964 ? DEVANAGARI DANDA, U+2223 | DIVIDES or U+01C0 | LATIN LETTER DENTAL CLICK (Wikipedia). Also the box drawing characters contain various other options.
  10. Instead of , (, U+002C COMMA), you can use for example ‚ U+201A SINGLE LOW-9 QUOTATION MARK (see here)
  11. For ? (U+003F ? QUESTION MARK), these are good candidates: U+FF1F ? FULLWIDTH QUESTION MARK or U+FE56 ? SMALL QUESTION MARK (from here, two more from Dingbats Block, search for "question")

For additional ideas, you can also look for example into this block. In Windows, these special characters should theoretically be able to be typed by using an alt-code, but I only found a solution to insert it in Microsoft Office in this Microsoft article using ALT + X. It can of course still be copied instead of typed.


For Windows you can check it using PowerShell

$PathInvalidChars = [System.IO.Path]::GetInvalidPathChars() #36 chars

To display UTF-8 codes you can convert

$enc = [system.Text.Encoding]::UTF8
$PathInvalidChars | foreach { $enc.GetBytes($_) }

$FileNameInvalidChars = [System.IO.Path]::GetInvalidFileNameChars() #41 chars

$FileOnlyInvalidChars = @(':', '*', '?', '\', '/') #5 chars - as a difference

Examples related to windows

"Permission Denied" trying to run Python on Windows 10 A fatal error occurred while creating a TLS client credential. The internal error state is 10013 How to install OpenJDK 11 on Windows? I can't install pyaudio on Windows? How to solve "error: Microsoft Visual C++ 14.0 is required."? git clone: Authentication failed for <URL> How to avoid the "Windows Defender SmartScreen prevented an unrecognized app from starting warning" XCOPY: Overwrite all without prompt in BATCH Laravel 5 show ErrorException file_put_contents failed to open stream: No such file or directory how to open Jupyter notebook in chrome on windows Tensorflow import error: No module named 'tensorflow'

Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to directory

Moving all files from one directory to another using Python What is the reason for the error message "System cannot find the path specified"? Get folder name of the file in Python How to rename a directory/folder on GitHub website? Change directory in Node.js command prompt Get the directory from a file path in java (android) python: get directory two levels up How to add 'libs' folder in Android Studio? How to create a directory using Ansible Troubleshooting misplaced .git directory (nothing to commit)

Examples related to zip

Install php-zip on php 5.6 on Ubuntu 7-Zip command to create and extract a password-protected ZIP file on Windows? How are zlib, gzip and zip related? What do they have in common and how are they different? Read a zipped file as a pandas DataFrame Installing PHP Zip Extension How can you zip or unzip from the script using ONLY Windows' built-in capabilities? Creating a ZIP archive in memory using System.IO.Compression how to zip a folder itself using java Read Content from Files which are inside Zip file Need to ZIP an entire directory using Node.js

Examples related to filenames

Rename multiple files in a folder, add a prefix (Windows) How to loop over files in directory and change path and add suffix to filename Why do I get a SyntaxError for a Unicode escape in my file path? Git copy file preserving history A html space is showing as %2520 instead of %20 How do I get the file name from a String containing the Absolute file path? DateTime.ToString() format that can be used in a filename or extension? Get only filename from url in php without any variable values which exist in the url Obtaining only the filename when using OpenFileDialog property "FileName" Build the full path filename in Python