How does Windows with NTFS perform with large volumes of files and directories?
Is there any guidance around limits of files or directories you can place in a single directory before you run into performance problems or other issues?
E.g. is having a folder with 100,000 folders inside of it an OK thing to do?
This question is related to: windows, performance, filesystems, ntfs
For local access, large numbers of directories/files don't seem to be an issue. However, if you're accessing them across a network, there's a noticeable performance hit after a few hundred, especially from Vista machines (XP to Windows Server with NTFS seemed to run much faster in that regard).
I have real experience with about 100,000 files (each several MB) in one NTFS directory, from copying an online library.
It takes about 15 minutes to open the directory in Explorer or 7-Zip.
Writing a site copy with WinHTTrack always gets stuck after some time. It also dealt with a directory containing about 1,000,000 files. I think the worst part is that the MFT can only be traversed sequentially.
Opening the same directory under ext2fsd on ext3 gave almost the same timing. Moving to ReiserFS (not Reiser4) might help.
Trying to avoid this situation altogether is probably best.
For your own programs, using blobs without any filesystem could be beneficial; that's how Facebook stores photos. A minimal sketch of the idea follows.
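To illustrate the blob approach, here is a minimal sketch (my illustration under stated assumptions, not Facebook's actual Haystack design; all names are hypothetical): every blob is appended to one large data file, and only an in-memory (offset, length) index is kept per key, so the filesystem never has to manage millions of directory entries.

    // Sketch: store many small blobs in one big file instead of one file each.
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    class BlobStoreSketch
    {
        private readonly FileStream _data;
        private readonly Dictionary<string, (long Offset, int Length)> _index
            = new Dictionary<string, (long, int)>();

        public BlobStoreSketch(string path) =>
            _data = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite);

        public void Put(string key, byte[] blob)
        {
            _data.Seek(0, SeekOrigin.End);            // append-only writes
            _index[key] = (_data.Position, blob.Length);
            _data.Write(blob, 0, blob.Length);
        }

        public byte[] Get(string key)
        {
            var (offset, length) = _index[key];       // one seek + one read per lookup
            var buffer = new byte[length];
            _data.Seek(offset, SeekOrigin.Begin);
            _data.Read(buffer, 0, length);            // a robust version would loop until
            return buffer;                            // all 'length' bytes are read
        }

        static void Main()
        {
            var store = new BlobStoreSketch(Path.Combine(Path.GetTempPath(), "blobs.dat"));
            store.Put("photo1", Encoding.UTF8.GetBytes("hello"));
            Console.WriteLine(Encoding.UTF8.GetString(store.Get("photo1")));
        }
    }

A real store would also persist the index and handle deletions, but even this much keeps directory sizes constant regardless of how many blobs you hold.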
100,000 should be fine.
I have (anecdotally) seen people have problems with many millions of files, and I have had problems myself with Explorer not having a clue how to count past 60-something thousand files, but NTFS should be good for the volumes you're talking about.
In case you're wondering, the technical (and I hope theoretical) maximum number of files per volume is 4,294,967,295 (2^32 - 1).
When you create a folder with N entries, you create a list of N items at the file-system level. This list is a system-wide shared data structure. If you then modify this list continuously by adding and removing entries, I'd expect at least some lock contention over the shared data. This contention can, in theory, negatively affect performance.
For read-only scenarios I can't imagine any reason for performance degradation in directories with a large number of entries.
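If you want to check the contention hypothesis on your own hardware, here is a minimal sketch (my own, not derived from any NTFS documentation; the thread and operation counts are arbitrary): several threads create and delete files either in one shared directory or in one directory per thread. If directory updates serialize on a shared structure, the shared case should scale worse.

    // Sketch: compare create/delete throughput in a shared vs. per-thread directory.
    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Threading.Tasks;

    class DirContentionSketch
    {
        static double MeasureOpsPerSec(bool sharedDir, int threads, int opsPerThread)
        {
            string root = Path.Combine(Path.GetTempPath(), "contention-" + Guid.NewGuid());
            Directory.CreateDirectory(root);

            var sw = Stopwatch.StartNew();
            Parallel.For(0, threads, t =>
            {
                // Shared case: all threads modify the same directory.
                // Isolated case: each thread gets its own subdirectory.
                string dir = sharedDir ? root : Path.Combine(root, "t" + t);
                Directory.CreateDirectory(dir);
                for (int i = 0; i < opsPerThread; i++)
                {
                    string file = Path.Combine(dir, t + "-" + i + ".tmp");
                    File.WriteAllText(file, "x");   // add a directory entry
                    File.Delete(file);              // remove it again
                }
            });
            sw.Stop();

            Directory.Delete(root, true);
            return threads * opsPerThread / sw.Elapsed.TotalSeconds;
        }

        static void Main()
        {
            Console.WriteLine("shared:   {0:F0} ops/sec", MeasureOpsPerSec(true, 4, 2000));
            Console.WriteLine("isolated: {0:F0} ops/sec", MeasureOpsPerSec(false, 4, 2000));
        }
    }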
I am building a file structure to host up to 2 billion (2^31) files and ran the following tests, which show a sharp drop in navigate + read performance at about 250 files or 120 directories per NTFS directory on a solid-state drive (SSD).
Interestingly, the number of directories and the number of files do NOT significantly interfere with each other.
So the lessons are: above roughly 250 files per directory, file performance starts to degrade, and above roughly 120 subdirectories, directory performance does; the two counts degrade independently.
This is the data (two measurements for each file and directory count; lg(#) is the base-10 logarithm of the count):
(FOPS = file operations per second)
(DOPS = directory operations per second)
#Files lg(#) FOPS FOPS2 DOPS DOPS2
10 1.00 16692 16692 16421 16312
100 2.00 16425 15943 15738 16031
120 2.08 15716 16024 15878 16122
130 2.11 15883 16124 14328 14347
160 2.20 15978 16184 11325 11128
200 2.30 16364 16052 9866 9678
210 2.32 16143 15977 9348 9547
220 2.34 16290 15909 9094 9038
230 2.36 16048 15930 9010 9094
240 2.38 15096 15725 8654 9143
250 2.40 15453 15548 8872 8472
260 2.41 14454 15053 8577 8720
300 2.48 12565 13245 8368 8361
400 2.60 11159 11462 7671 7574
500 2.70 10536 10560 7149 7331
1000 3.00 9092 9509 6569 6693
2000 3.30 8797 8810 6375 6292
10000 4.00 8084 8228 6210 6194
20000 4.30 8049 8343 5536 6100
50000 4.70 7468 7607 5364 5365
And this is the test code (an NUnit test; the using directives were added here for completeness):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using NUnit.Framework;

// NUnit 2 syntax; in NUnit 3, use ExpectedResult instead of Result.
[TestCase(50000, false, Result = 50000)]
[TestCase(50000, true, Result = 50000)]
public static int TestDirPerformance(int numFilesInDir, bool testDirs) {
    var files = new List<string>();
    // Work in a fresh, uniquely named directory under %TEMP%.
    var dir = Path.GetTempPath() + "\\Sub\\" + Guid.NewGuid() + "\\";
    Directory.CreateDirectory(dir);
    Console.WriteLine("prepare...");
    const string FILE_NAME = "\\file.txt";
    // Create numFilesInDir files (or, in directory mode, one directory per
    // entry with a single file inside it).
    for (int i = 0; i < numFilesInDir; i++) {
        string filename = dir + Guid.NewGuid();
        if (testDirs) {
            var dirName = filename + "D";
            Directory.CreateDirectory(dirName);
            using (File.Create(dirName + FILE_NAME)) { }
        } else {
            using (File.Create(filename)) { }
        }
        files.Add(filename);
    }
    //Adding 1000 Directories didn't change File Performance
    /*for (int i = 0; i < 1000; i++) {
        string filename = dir + Guid.NewGuid();
        Directory.CreateDirectory(filename + "D");
    }*/
    Console.WriteLine("measure...");
    var r = new Random();
    var sw = new Stopwatch();
    sw.Start();
    int len = 0;
    int count = 0;
    // Read randomly chosen files for five seconds and count the operations.
    while (sw.ElapsedMilliseconds < 5000) {
        string filename = files[r.Next(files.Count)];
        string text = File.ReadAllText(testDirs ? filename + "D" + FILE_NAME : filename);
        len += text.Length;   // keep the read from being optimized away
        count++;
    }
    Console.WriteLine("{0} File Ops/sec ", count / 5);
    return numFilesInDir;
}
There are also performance problems caused by short-filename (8.3 name) creation slowing things down. Microsoft recommends turning off short-filename creation if you have more than 300,000 files in a folder [1]. The less unique the first six characters of your file names are, the bigger the problem.
[1] "How NTFS Works", http://technet.microsoft.com - search for "300,000"
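If you administer the machine, you can turn off 8.3 short-name generation yourself. As a hedged example (the exact syntax varies by Windows version, so check fsutil's built-in help on your system first):

    fsutil behavior set disable8dot3 1

On Windows 7 / Server 2008 R2 and later there is also a per-volume form, e.g. fsutil 8dot3name set C: 1. Note that this only stops new short names from being created; existing ones remain.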
Source: Stackoverflow.com