For example, in a Windows folder, if we create some files and name them 1.html, 2.txt, 3.txt, photo.jpg, zen.png, they are listed in that order. But if we create another file named _file.doc, it is placed at the top (assuming we sort by name in descending order).
Likewise, which character is considered the first, so that using it at the start of a file name places that file at the top of the listing?
TL;DR: technically, space sorts before the exclamation mark, and it can be used by preceding it with ' or - (which are ignored in sorting), but the exclamation mark follows right after space and is easier to use.
On Windows 7 at least, the minus sign (-) and the apostrophe (') seem to be ignored in a name, except for one quirk: in names that are otherwise identical, the ' sorts before the -. For example, (a'a) sorts above (a-a).
Empty string will sort above everything else, which means for example aa will sort above aaa because the 'empty string' after two a letters will sort before the third 'a'.
This also means that aa will be sorted above a'a because the 'empty string' between two a letters will sort above the ' mark.
What follows then is, ' alone will sort first, because technically it's an empty string. However adding for example letters behind it will sort the name as if the ' didn't exist.
Since the first 'unignored' character (as far as I know) is space, in case you want to sort 'real names' above others, the best way to go would be ' followed by space, and then the name you want to actually use. For example: (' first)
You can of course top that by using more than one space in the string, such as ('  firster) and ('   firstest), with two and three blanks before the f.
While minus sign sorts below ' in otherwise similar name, there's no other difference in sorting (that I know of), and I find minus sign visually clearer, so if I want to put something on top of list, I'd use minus followed by space, then the 'actual name', for example: (- first file -)
If you are worried about using a space in the file name, then the exclamation mark (!) is the next best thing, and since it can appear as the first character of a string, it's easier to use.
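To make the rules in this answer concrete, here is a toy sort key in Python that reproduces only the behaviour described above: ' and - are skipped when comparing, and they matter only as a tie-breaker between otherwise identical names. This is a sketch of the observations, not Explorer's actual algorithm; the names `IGNORED` and `toy_sort_key` are made up for the illustration.

```python
# Characters this answer reports as ignored by the Windows 7 name sort.
IGNORED = "'-"


def toy_sort_key(name):
    """Toy model of the observations above (not the real Explorer algorithm).

    Names are compared with ' and - removed; the removed characters are
    used only to break ties, so "aa" < "a'a" < "a-a".
    """
    visible = "".join(ch for ch in name if ch not in IGNORED).lower()
    ignored = tuple(ch for ch in name if ch in IGNORED)
    return (visible, ignored)


names = ["aaa", "a-a", "!first", "aa", "' first", "a'a", "first"]
for name in sorted(names, key=toy_sort_key):
    print(name)
```

With this key, (' first) lands above (!first) because the apostrophe is skipped and the space sorts before the exclamation mark.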
I know there is already an answer - and this is an old question - but I was wondering the same thing and after finding this answer I did a little experimentation on my own and had (IMO) a worthwhile addition to the discussion.
The non-visible characters can still be used in a folder name - a placeholder is inserted - but the sort on ASCII value still seems to hold.
I tested on Windows 7, holding down the Alt key and typing in the ASCII code on the numeric keypad. I did not test very many, but I was successful in creating folder names that started with ASCII 1, ASCII 2, and ASCII 3. Those correspond to SOH, STX and ETX. They were displayed as a happy face, a filled happy face, and a filled heart, respectively.
I'm not sure if I can duplicate that here - but I will type them in on the next lines and submit.
☺foldername
☻foldername
♥foldername
If you search for "sort order windows explorer", you will find that Windows Explorer (since Windows XP) apparently uses the function StrCmpLogicalW for the "by name" sort order. I did not find any information about how the underscore character is treated. I was amused by the following note in the documentation:
Behavior of this function, and therefore the results it returns, can change from release to release. ...
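If you want to check the order yourself instead of relying on documentation, you can call StrCmpLogicalW from Python through ctypes. The sketch below assumes the function is exported by shlwapi.dll and takes two wide strings, as documented by Microsoft; the helper names `load_strcmplogicalw` and `sort_names` are mine. On anything other than Windows the loader simply returns None.

```python
import ctypes
import functools
import sys


def load_strcmplogicalw():
    """Return shlwapi's StrCmpLogicalW as a Python callable, or None off Windows."""
    if sys.platform != "win32":
        return None
    func = ctypes.windll.shlwapi.StrCmpLogicalW
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p]
    func.restype = ctypes.c_int
    return func


def sort_names(names, compare):
    """Sort names with a C-style comparator (negative, zero, positive)."""
    return sorted(names, key=functools.cmp_to_key(compare))


if __name__ == "__main__":
    compare = load_strcmplogicalw()
    if compare is None:
        print("StrCmpLogicalW is only available on Windows")
    else:
        print(sort_names(["_file.doc", "1.html", "!a.txt", "zen.png"], compare))
```

Because the documentation warns that the behaviour can change between releases, treat whatever this prints as valid only for the machine it ran on.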
Only a few characters in Windows code page 1252 (Latin-1) are not allowed in names. Note that Windows Explorer strips leading spaces from names and does not let you name a file "space dot something" (like " .txt"), although the file system allows this. A name consisting only of a space with no file extension is invalid, however.
If you create files through e.g. a Python script (this is what I did), then you can easily find out what is actually allowed and in what order the characters get sorted. The sort order varies based on your locale! Below are the results of my script, run with Python 2.7.15 on a German Windows 10 Pro 64bit:
Allowed:
32 20 SPACE
! 33 21 EXCLAMATION MARK
# 35 23 NUMBER SIGN
$ 36 24 DOLLAR SIGN
% 37 25 PERCENT SIGN
& 38 26 AMPERSAND
' 39 27 APOSTROPHE
( 40 28 LEFT PARENTHESIS
) 41 29 RIGHT PARENTHESIS
+ 43 2B PLUS SIGN
, 44 2C COMMA
- 45 2D HYPHEN-MINUS
. 46 2E FULL STOP
/ 47 2F SOLIDUS
0 48 30 DIGIT ZERO
1 49 31 DIGIT ONE
2 50 32 DIGIT TWO
3 51 33 DIGIT THREE
4 52 34 DIGIT FOUR
5 53 35 DIGIT FIVE
6 54 36 DIGIT SIX
7 55 37 DIGIT SEVEN
8 56 38 DIGIT EIGHT
9 57 39 DIGIT NINE
; 59 3B SEMICOLON
= 61 3D EQUALS SIGN
@ 64 40 COMMERCIAL AT
A 65 41 LATIN CAPITAL LETTER A
B 66 42 LATIN CAPITAL LETTER B
C 67 43 LATIN CAPITAL LETTER C
D 68 44 LATIN CAPITAL LETTER D
E 69 45 LATIN CAPITAL LETTER E
F 70 46 LATIN CAPITAL LETTER F
G 71 47 LATIN CAPITAL LETTER G
H 72 48 LATIN CAPITAL LETTER H
I 73 49 LATIN CAPITAL LETTER I
J 74 4A LATIN CAPITAL LETTER J
K 75 4B LATIN CAPITAL LETTER K
L 76 4C LATIN CAPITAL LETTER L
M 77 4D LATIN CAPITAL LETTER M
N 78 4E LATIN CAPITAL LETTER N
O 79 4F LATIN CAPITAL LETTER O
P 80 50 LATIN CAPITAL LETTER P
Q 81 51 LATIN CAPITAL LETTER Q
R 82 52 LATIN CAPITAL LETTER R
S 83 53 LATIN CAPITAL LETTER S
T 84 54 LATIN CAPITAL LETTER T
U 85 55 LATIN CAPITAL LETTER U
V 86 56 LATIN CAPITAL LETTER V
W 87 57 LATIN CAPITAL LETTER W
X 88 58 LATIN CAPITAL LETTER X
Y 89 59 LATIN CAPITAL LETTER Y
Z 90 5A LATIN CAPITAL LETTER Z
[ 91 5B LEFT SQUARE BRACKET
\\ 92 5C REVERSE SOLIDUS
] 93 5D RIGHT SQUARE BRACKET
^ 94 5E CIRCUMFLEX ACCENT
_ 95 5F LOW LINE
` 96 60 GRAVE ACCENT
a 97 61 LATIN SMALL LETTER A
b 98 62 LATIN SMALL LETTER B
c 99 63 LATIN SMALL LETTER C
d 100 64 LATIN SMALL LETTER D
e 101 65 LATIN SMALL LETTER E
f 102 66 LATIN SMALL LETTER F
g 103 67 LATIN SMALL LETTER G
h 104 68 LATIN SMALL LETTER H
i 105 69 LATIN SMALL LETTER I
j 106 6A LATIN SMALL LETTER J
k 107 6B LATIN SMALL LETTER K
l 108 6C LATIN SMALL LETTER L
m 109 6D LATIN SMALL LETTER M
n 110 6E LATIN SMALL LETTER N
o 111 6F LATIN SMALL LETTER O
p 112 70 LATIN SMALL LETTER P
q 113 71 LATIN SMALL LETTER Q
r 114 72 LATIN SMALL LETTER R
s 115 73 LATIN SMALL LETTER S
t 116 74 LATIN SMALL LETTER T
u 117 75 LATIN SMALL LETTER U
v 118 76 LATIN SMALL LETTER V
w 119 77 LATIN SMALL LETTER W
x 120 78 LATIN SMALL LETTER X
y 121 79 LATIN SMALL LETTER Y
z 122 7A LATIN SMALL LETTER Z
{ 123 7B LEFT CURLY BRACKET
} 125 7D RIGHT CURLY BRACKET
~ 126 7E TILDE
\x7f 127 7F DELETE
\x80 128 80 EURO SIGN
\x81 129 81
\x82 130 82 SINGLE LOW-9 QUOTATION MARK
\x83 131 83 LATIN SMALL LETTER F WITH HOOK
\x84 132 84 DOUBLE LOW-9 QUOTATION MARK
\x85 133 85 HORIZONTAL ELLIPSIS
\x86 134 86 DAGGER
\x87 135 87 DOUBLE DAGGER
\x88 136 88 MODIFIER LETTER CIRCUMFLEX ACCENT
\x89 137 89 PER MILLE SIGN
\x8a 138 8A LATIN CAPITAL LETTER S WITH CARON
\x8b 139 8B SINGLE LEFT-POINTING ANGLE QUOTATION MARK
\x8c 140 8C LATIN CAPITAL LIGATURE OE
\x8d 141 8D
\x8e 142 8E LATIN CAPITAL LETTER Z WITH CARON
\x8f 143 8F
\x90 144 90
\x91 145 91 LEFT SINGLE QUOTATION MARK
\x92 146 92 RIGHT SINGLE QUOTATION MARK
\x93 147 93 LEFT DOUBLE QUOTATION MARK
\x94 148 94 RIGHT DOUBLE QUOTATION MARK
\x95 149 95 BULLET
\x96 150 96 EN DASH
\x97 151 97 EM DASH
\x98 152 98 SMALL TILDE
\x99 153 99 TRADE MARK SIGN
\x9a 154 9A LATIN SMALL LETTER S WITH CARON
\x9b 155 9B SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
\x9c 156 9C LATIN SMALL LIGATURE OE
\x9d 157 9D
\x9e 158 9E LATIN SMALL LETTER Z WITH CARON
\x9f 159 9F LATIN CAPITAL LETTER Y WITH DIAERESIS
\xa0 160 A0 NON-BREAKING SPACE
\xa1 161 A1 INVERTED EXCLAMATION MARK
\xa2 162 A2 CENT SIGN
\xa3 163 A3 POUND SIGN
\xa4 164 A4 CURRENCY SIGN
\xa5 165 A5 YEN SIGN
\xa6 166 A6 PIPE, BROKEN VERTICAL BAR
\xa7 167 A7 SECTION SIGN
\xa8 168 A8 SPACING DIAERESIS - UMLAUT
\xa9 169 A9 COPYRIGHT SIGN
\xaa 170 AA FEMININE ORDINAL INDICATOR
\xab 171 AB LEFT DOUBLE ANGLE QUOTES
\xac 172 AC NOT SIGN
\xad 173 AD SOFT HYPHEN
\xae 174 AE REGISTERED TRADE MARK SIGN
\xaf 175 AF SPACING MACRON - OVERLINE
\xb0 176 B0 DEGREE SIGN
\xb1 177 B1 PLUS-OR-MINUS SIGN
\xb2 178 B2 SUPERSCRIPT TWO - SQUARED
\xb3 179 B3 SUPERSCRIPT THREE - CUBED
\xb4 180 B4 ACUTE ACCENT - SPACING ACUTE
\xb5 181 B5 MICRO SIGN
\xb6 182 B6 PILCROW SIGN - PARAGRAPH SIGN
\xb7 183 B7 MIDDLE DOT - GEORGIAN COMMA
\xb8 184 B8 SPACING CEDILLA
\xb9 185 B9 SUPERSCRIPT ONE
\xba 186 BA MASCULINE ORDINAL INDICATOR
\xbb 187 BB RIGHT DOUBLE ANGLE QUOTES
\xbc 188 BC FRACTION ONE QUARTER
\xbd 189 BD FRACTION ONE HALF
\xbe 190 BE FRACTION THREE QUARTERS
\xbf 191 BF INVERTED QUESTION MARK
\xc0 192 C0 LATIN CAPITAL LETTER A WITH GRAVE
\xc1 193 C1 LATIN CAPITAL LETTER A WITH ACUTE
\xc2 194 C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
\xc3 195 C3 LATIN CAPITAL LETTER A WITH TILDE
\xc4 196 C4 LATIN CAPITAL LETTER A WITH DIAERESIS
\xc5 197 C5 LATIN CAPITAL LETTER A WITH RING ABOVE
\xc6 198 C6 LATIN CAPITAL LETTER AE
\xc7 199 C7 LATIN CAPITAL LETTER C WITH CEDILLA
\xc8 200 C8 LATIN CAPITAL LETTER E WITH GRAVE
\xc9 201 C9 LATIN CAPITAL LETTER E WITH ACUTE
\xca 202 CA LATIN CAPITAL LETTER E WITH CIRCUMFLEX
\xcb 203 CB LATIN CAPITAL LETTER E WITH DIAERESIS
\xcc 204 CC LATIN CAPITAL LETTER I WITH GRAVE
\xcd 205 CD LATIN CAPITAL LETTER I WITH ACUTE
\xce 206 CE LATIN CAPITAL LETTER I WITH CIRCUMFLEX
\xcf 207 CF LATIN CAPITAL LETTER I WITH DIAERESIS
\xd0 208 D0 LATIN CAPITAL LETTER ETH
\xd1 209 D1 LATIN CAPITAL LETTER N WITH TILDE
\xd2 210 D2 LATIN CAPITAL LETTER O WITH GRAVE
\xd3 211 D3 LATIN CAPITAL LETTER O WITH ACUTE
\xd4 212 D4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
\xd5 213 D5 LATIN CAPITAL LETTER O WITH TILDE
\xd6 214 D6 LATIN CAPITAL LETTER O WITH DIAERESIS
\xd7 215 D7 MULTIPLICATION SIGN
\xd8 216 D8 LATIN CAPITAL LETTER O WITH SLASH
\xd9 217 D9 LATIN CAPITAL LETTER U WITH GRAVE
\xda 218 DA LATIN CAPITAL LETTER U WITH ACUTE
\xdb 219 DB LATIN CAPITAL LETTER U WITH CIRCUMFLEX
\xdc 220 DC LATIN CAPITAL LETTER U WITH DIAERESIS
\xdd 221 DD LATIN CAPITAL LETTER Y WITH ACUTE
\xde 222 DE LATIN CAPITAL LETTER THORN
\xdf 223 DF LATIN SMALL LETTER SHARP S
\xe0 224 E0 LATIN SMALL LETTER A WITH GRAVE
\xe1 225 E1 LATIN SMALL LETTER A WITH ACUTE
\xe2 226 E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
\xe3 227 E3 LATIN SMALL LETTER A WITH TILDE
\xe4 228 E4 LATIN SMALL LETTER A WITH DIAERESIS
\xe5 229 E5 LATIN SMALL LETTER A WITH RING ABOVE
\xe6 230 E6 LATIN SMALL LETTER AE
\xe7 231 E7 LATIN SMALL LETTER C WITH CEDILLA
\xe8 232 E8 LATIN SMALL LETTER E WITH GRAVE
\xe9 233 E9 LATIN SMALL LETTER E WITH ACUTE
\xea 234 EA LATIN SMALL LETTER E WITH CIRCUMFLEX
\xeb 235 EB LATIN SMALL LETTER E WITH DIAERESIS
\xec 236 EC LATIN SMALL LETTER I WITH GRAVE
\xed 237 ED LATIN SMALL LETTER I WITH ACUTE
\xee 238 EE LATIN SMALL LETTER I WITH CIRCUMFLEX
\xef 239 EF LATIN SMALL LETTER I WITH DIAERESIS
\xf0 240 F0 LATIN SMALL LETTER ETH
\xf1 241 F1 LATIN SMALL LETTER N WITH TILDE
\xf2 242 F2 LATIN SMALL LETTER O WITH GRAVE
\xf3 243 F3 LATIN SMALL LETTER O WITH ACUTE
\xf4 244 F4 LATIN SMALL LETTER O WITH CIRCUMFLEX
\xf5 245 F5 LATIN SMALL LETTER O WITH TILDE
\xf6 246 F6 LATIN SMALL LETTER O WITH DIAERESIS
\xf7 247 F7 DIVISION SIGN
\xf8 248 F8 LATIN SMALL LETTER O WITH SLASH
\xf9 249 F9 LATIN SMALL LETTER U WITH GRAVE
\xfa 250 FA LATIN SMALL LETTER U WITH ACUTE
\xfb 251 FB LATIN SMALL LETTER U WITH CIRCUMFLEX
\xfc 252 FC LATIN SMALL LETTER U WITH DIAERESIS
\xfd 253 FD LATIN SMALL LETTER Y WITH ACUTE
\xfe 254 FE LATIN SMALL LETTER THORN
\xff 255 FF LATIN SMALL LETTER Y WITH DIAERESIS
Forbidden:
\x00 0 00 NULL CHAR
\x01 1 01 START OF HEADING
\x02 2 02 START OF TEXT
\x03 3 03 END OF TEXT
\x04 4 04 END OF TRANSMISSION
\x05 5 05 ENQUIRY
\x06 6 06 ACKNOWLEDGEMENT
\x07 7 07 BELL
\x08 8 08 BACK SPACE
\t 9 09 HORIZONTAL TAB
\n 10 0A LINE FEED
\x0b 11 0B VERTICAL TAB
\x0c 12 0C FORM FEED
\r 13 0D CARRIAGE RETURN
\x0e 14 0E SHIFT OUT / X-ON
\x0f 15 0F SHIFT IN / X-OFF
\x10 16 10 DATA LINE ESCAPE
\x11 17 11 DEVICE CONTROL 1 (OFT. XON)
\x12 18 12 DEVICE CONTROL 2
\x13 19 13 DEVICE CONTROL 3 (OFT. XOFF)
\x14 20 14 DEVICE CONTROL 4
\x15 21 15 NEGATIVE ACKNOWLEDGEMENT
\x16 22 16 SYNCHRONOUS IDLE
\x17 23 17 END OF TRANSMIT BLOCK
\x18 24 18 CANCEL
\x19 25 19 END OF MEDIUM
\x1a 26 1A SUBSTITUTE
\x1b 27 1B ESCAPE
\x1c 28 1C FILE SEPARATOR
\x1d 29 1D GROUP SEPARATOR
\x1e 30 1E RECORD SEPARATOR
\x1f 31 1F UNIT SEPARATOR
" 34 22 QUOTATION MARK
* 42 2A ASTERISK
: 58 3A COLON
< 60 3C LESS-THAN SIGN
> 62 3E GREATER-THAN SIGN
? 63 3F QUESTION MARK
| 124 7C VERTICAL LINE
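The script itself is not part of this answer, so the following is only a sketch of how such a probe could look, written for Python 3 rather than the Python 2.7 used above. It tries to create one file per character and counts the character as allowed only if a file with exactly that name shows up in the directory listing. The names `describe` and `probe_characters`, and the choice of code points 0 to 255 instead of code page 1252, are assumptions of the sketch. Results depend on the operating system and file system.

```python
import os
import tempfile
import unicodedata


def describe(ch):
    """Format one row as: character, decimal code, hex code, Unicode name."""
    name = unicodedata.name(ch, "")
    return "{} {} {:02X} {}".format(ch, ord(ch), ord(ch), name).rstrip()


def probe_characters(chars, directory):
    """Split chars into (allowed, forbidden) as first character of a file name."""
    allowed, forbidden = [], []
    for ch in chars:
        name = ch + "x.txt"
        path = directory + os.sep + name
        try:
            with open(path, "w"):
                pass
            # Path separators "succeed" but create a differently named file,
            # so confirm that the exact name is really there.
            created = name in os.listdir(directory)
        except (OSError, ValueError):
            created = False
        (allowed if created else forbidden).append(ch)
        try:
            os.remove(path)  # avoid clashes on case-insensitive file systems
        except (OSError, ValueError):
            pass
    return allowed, forbidden


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        ok, bad = probe_characters([chr(i) for i in range(256)], directory)
    print("Allowed:", len(ok), "Forbidden:", len(bad))
```

Checking the directory listing is a deliberate choice: a plain "did open() succeed" test would report the path separators as allowed, which may be why / and \ appear in the Allowed list above.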
Screenshot of how Explorer sorts the files for me:
The highlighted file with the ☺ white smiley face was added manually by me (Alt+1) to show where this Unicode character (U+263A) ends up; see Jimbugs' answer.
The first file has a space as name (0x20), the second is the non-breaking space (0xa0). The files in the bottom half of the third row which look like they have no name use the characters with hex codes 0x81, 0x8D, 0x8F, 0x90, 0x9D (in this order from top to bottom).
From my testing, there are three criteria for sorting characters as described below. Aside from this, shorter strings are sorted above longer strings that start with the same characters.
Note: This testing only looked at the first character sorting and did not look into edge cases described by this answer, which found that, for all characters after the first character, numbers take precedence over symbols (i.e. the order is 1. Symbols 2. Numbers 3. Letters for first character, 1. Numbers 2. Symbols 3. Letters after). This answer also indicated that the Unicode/ASCII layer of sorting might not be entirely consistent. I'll update this answer if I get time to look into these edge cases.
Note: It's important to note that sorting order might be subject to change as described by this answer. It is not clear to me though the extent to which this actually ever changes. I've done this testing and found it to be valid on both Windows 7 and Windows 10.
Symbols
    Latin (ordered by Unicode value (U+xxxx))
    Greek (ordered by Unicode value (U+xxxx))
    Cyrillic (ordered by Unicode value (U+xxxx))
    Hebrew (ordered by Unicode value (U+xxxx))
    Arabic (ordered by Unicode value (U+xxxx))
Numbers
    Latin (ordered by Unicode value (U+xxxx))
    Greek (ordered by Unicode value (U+xxxx))
    Cyrillic (ordered by Unicode value (U+xxxx))
    Hebrew (ordered by Unicode value (U+xxxx))
    Arabic (ordered by Unicode value (U+xxxx))
Letters
    Latin (ordered by Unicode value (U+xxxx))
    Greek (ordered by Unicode value (U+xxxx))
    Cyrillic (ordered by Unicode value (U+xxxx))
    Hebrew (ordered by Unicode value (U+xxxx))
    Arabic (ordered by Unicode value (U+xxxx))
Sorting Rule Sequence vs Observed Order
It's worth noting that there are really two ways of looking at this. Ultimately, what you have are sorting rules that are applied in a certain order, and this in turn produces an observed order. The ordering of older rules becomes nested under the ordering of newer rules. This means that the first rule applied is the last rule observed, while the last rule applied is the first or topmost rule observed.
Sorting Rule Sequence
1.) Sort on Unicode Value (U+xxxx)
2.) Sort on culture/language
3.) Sort on Type (Symbol, Number, Letter)
Observed Order
The highest level of grouping is by type in the following order...
1.) Symbols
2.) Numbers
3.) Letters
Therefore, any symbol from any language comes before any number from any language, while any letter from any language appears after all symbols and numbers.
The second level of grouping is by culture/language. The following order seems to apply for this:
Latin
Greek
Cyrillic
Hebrew
Arabic
The lowest rule observed is Unicode order, so items within a type-language group are ordered by Unicode value (U+xxxx).
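The three-level grouping above can be written as a single tuple sort key. The Python sketch below is only a model of what I observed for the first character, not what Windows does internally. The script is guessed from the character's Unicode name, because the standard library has no script property, and anything that matches none of the five scripts is lumped in with Latin; both of those are assumptions of the sketch, as are the function names.

```python
import unicodedata

# Observed script order, highest first.
SCRIPTS = ["LATIN", "GREEK", "CYRILLIC", "HEBREW", "ARABIC"]


def type_rank(ch):
    """0 for symbols, 1 for numbers, 2 for letters (by Unicode category)."""
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return 2
    if category.startswith("N"):
        return 1
    return 0


def script_rank(ch):
    """Position in SCRIPTS, guessed from the Unicode name; default is Latin."""
    name = unicodedata.name(ch, "")
    for rank, script in enumerate(SCRIPTS):
        if script in name:
            return rank
    return 0


def first_char_key(name):
    """Key for a non-empty name: (type, script, code point) of its first character."""
    ch = name[0]
    return (type_rank(ch), script_rank(ch), ord(ch))


print(sorted(["zeta", "\u03b1lpha", "9lives", "!top"], key=first_char_key))
```

Tuples compare element by element, so the type decides first, the script second, and the code point last, which is the observed order described above.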
Adapted from here: https://superuser.com/a/971721/496260
Although the answer with "!" was chosen as the correct answer, this is NOT true. I'm not sure about Windows Explorer in Windows 10/8.1/8, but I am certain about Windows 7 and XP.
The character that moves your file name to the very top is "'" (Alt+39), and the second best is "-" (the minus sign).
But the sort order in Windows Explorer is more complicated: it also depends on the length of the file name, and numbers are treated in a very special way.
The second character (and the following ones) is treated differently. Here you often do best with a " " (space), followed by the characters mentioned above, but you will have to experiment, because it is not easy to work out the exact algorithm:
Here is an example for the correct sorting order for your understanding:
Another example:
Same file name length:
Finally the very special logic when it comes to numbers:
But nevertheless: Renaming folders or files this way, you can quickly find what you are searching for.
I know it's an old question, but it's easy to check this out. Just create a folder with a bunch of dummy files whose names are each character on the keyboard. Of course, you can't really use \ | / : * ? " < > and leading and trailing blanks are a terrible idea.
If you do this, and it looks like no one did, you find that the Windows sort order for the FIRST character is 1. Special characters 2. Numbers 3. Letters
But for subsequent characters, it seems to be 1. Numbers 2. Special characters 3. Letters
Numbers are kind of weird, thanks to the "Improvements" made after the Y2K non-event. Special characters you would think would sort in ASCII order, but there are exceptions, notably the first two, apostrophe and dash, and the last two, plus and equals. Also, I have heard but not actually seen something about dashes being ignored. That is, in fact, NOT my experience.
So, ShxFee, I assume you meant the sort should be ascending, not descending, and the top-most (first) character in the sort order for the first character of the name is the apostrophe.
As NigelTouch said, special characters do not sort to ASCII, but my notes above specify exactly what does and does not sort in normal ASCII order. But he is certainly wrong about special characters always sorting first. As I noted above, that only appears to be true for the first character of the name.
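The first-character versus later-character rule from this answer can be sketched as a position-dependent sort key. This Python snippet models only that one observation; it does not handle the apostrophe, dash, plus and equals exceptions or the special treatment of multi-digit numbers, and the names in it are invented for the example.

```python
def char_class(ch):
    """Classify a character as 'number', 'letter' or 'symbol'."""
    if ch.isdigit():
        return "number"
    if ch.isalpha():
        return "letter"
    return "symbol"


# Rank tables for the first character and for every later character.
FIRST_POSITION = {"symbol": 0, "number": 1, "letter": 2}
LATER_POSITION = {"number": 0, "symbol": 1, "letter": 2}


def positional_key(name):
    """Build a per-character key whose class ranking depends on the position."""
    key = []
    for index, ch in enumerate(name):
        ranks = FIRST_POSITION if index == 0 else LATER_POSITION
        key.append((ranks[char_class(ch)], ch.lower()))
    return tuple(key)


print(sorted(["aa", "1a", "!a"], key=positional_key))
print(sorted(["aa", "a!", "a1"], key=positional_key))
```

In the first list the symbol wins because it is in the first position; in the second list the digit wins because it is not.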
I had the same problem. I wanted to 'bury' a folder at the bottom of the sort instead of bringing it to the top with the '!' character. Windows recognizes most special characters as just that, 'special', and therefore they ALL are sorted at the top.
However, if you think outside of the English characters, you will have a lot of luck. I used Character Map and the Arial font, and scrolled down past '~' and the others to the Greek alphabet. Capital Xi, Ξ, worked best for me, but I didn't check to see which was the actual 'lowest' in the sort.
Source: Stackoverflow.com