[filesystems] How many files can I put in a directory?

Does it matter how many files I keep in a single directory? If so, how many files in a directory is too many, and what are the impacts of having too many files? (This is on a Linux server.)

Background: I have a photo album website, and every image uploaded is renamed to an 8-hex-digit id (say, a58f375c.jpg). This is to avoid filename conflicts (if lots of "IMG0001.JPG" files are uploaded, for example). The original filename and any useful metadata are stored in a database. Right now, I have somewhere around 1500 files in the images directory. This makes listing the files in the directory (through an FTP or SSH client) take a few seconds. But I can't see that it has any effect other than that. In particular, there doesn't seem to be any impact on how quickly an image file is served to the user.

I've thought about reducing the number of images by making 16 subdirectories: 0-9 and a-f. Then I'd move the images into the subdirectories based on what the first hex digit of the filename was. But I'm not sure that there's any reason to do so except for the occasional listing of the directory through FTP/SSH.

Answers:


It really depends on the filesystem used, and also on some flags.

For example, ext3 can hold many thousands of files, but after a couple of thousand it used to become very slow, mostly when listing a directory but also when opening a single file. A few years ago it gained the 'htree' option, which dramatically shortened the time needed to get an inode given a filename.

Personally, I use subdirectories to keep most levels under a thousand or so items. In your case, I'd create 256 directories keyed on the last two hex digits of the ID. Use the last digits rather than the first, so the load is balanced.
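
A minimal PHP sketch of that layout (the function name and base path are placeholders of mine, not part of the answer): it buckets an 8-hex-digit id by its last two hex digits, giving at most 256 directories.

<?php
// Sketch only: bucket an id like "a58f375c" into one of 256 directories
// named after its last two hex digits, e.g. "5c/a58f375c.jpg".
function image_path($id, $baseDir = '/var/www/images') {
    $bucket = substr($id, -2);       // last two hex digits: "00".."ff"
    $dir    = $baseDir . '/' . $bucket;
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);     // create the bucket on first use
    }
    return $dir . '/' . $id . '.jpg';
}

echo image_path('a58f375c') . PHP_EOL;  // /var/www/images/5c/a58f375c.jpg
?>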


For what it's worth, I just created a directory on an ext4 file system with 1,000,000 files in it, then randomly accessed those files through a web server. I didn't notice any penalty for accessing them over (say) only having 10 files there.

This is radically different from my experience doing this on NTFS a few years back.
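
If you want to reproduce a rough local approximation of that experiment, here is a hedged sketch (the directory name and file counts are arbitrary placeholders, and it times local fopen() calls rather than web server requests):

<?php
// Sketch: create many empty files with random 8-hex-digit names in a single
// directory, then time a batch of random opens against them.
$dir   = __DIR__ . '/stress_test';   // placeholder location
$count = 100000;                     // raise towards 1,000,000 if you have the time and inodes

if (!is_dir($dir)) {
    mkdir($dir);
}
$names = array();
for ($i = 0; $i < $count; $i++) {
    $name = bin2hex(random_bytes(4)) . '.jpg';   // 8-hex-digit ids, as in the question
    touch($dir . '/' . $name);
    $names[] = $name;
}

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $fh = fopen($dir . '/' . $names[array_rand($names)], 'rb');  // open a random file
    fclose($fh);
}
printf("1000 random opens took %.3f s\n", microtime(true) - $start);
?>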


I have had over 8 million files in a single ext3 directory. libc readdir() is what find, ls and most of the other methods discussed in this thread use to list large directories.

The reason ls and find are slow in this case is that readdir() only reads 32K of directory entries at a time, so on slow disks it will require many, many reads to list a directory. There is a solution to this speed problem. I wrote a pretty detailed article about it at: http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/

The key takeaway is: use getdents() directly -- http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html -- rather than anything that's based on libc readdir(), so you can specify the buffer size when reading directory entries from disk.


I ran into a similar issue. I was trying to access a directory with over 10,000 files in it. It was taking too long to build the file list and run any type of command on any of the files.

I wrote a little PHP script to do this for myself and tried to figure out a way to keep it from timing out in the browser.

The following is the PHP script I wrote to resolve the issue:

Listing Files in a Directory with too many files for FTP

Hope it helps someone.


It absolutely depends on the filesystem. Many modern filesystems use decent data structures to store the contents of directories, but older filesystems often just added the entries to a list, so retrieving a file was an O(n) operation.

Even if the filesystem does it right, it's still absolutely possible for programs that list directory contents to mess up and do an O(n^2) sort, so to be on the safe side, I'd always limit the number of files per directory to no more than 500.


I'm working on a similar problem right now. We have a hierarchical directory structure and use image ids as filenames. For example, an image with id=1234567 is placed in

..../45/67/1234567_<...>.jpg

using the last 4 digits to determine where the file goes.

With a few thousand images, you could use a one-level hierarchy. Our sysadmin suggested no more than a couple of thousand files in any given directory (ext3) for efficiency / backup / whatever other reasons he had in mind.
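
A hedged sketch of that mapping in PHP (the function name is mine; the scheme is the one described above, splitting the last four digits of the numeric id into two directory levels):

<?php
// Sketch: id 1234567 -> "45/67/1234567_holiday.jpg"
function id_to_path($id, $originalName = '') {
    $padded = str_pad((string) $id, 4, '0', STR_PAD_LEFT);  // ensure at least 4 digits
    $last4  = substr($padded, -4);                          // e.g. "4567"
    $suffix = $originalName !== '' ? '_' . $originalName : '';
    return substr($last4, 0, 2) . '/' . substr($last4, 2) . '/' . $id . $suffix . '.jpg';
}

echo id_to_path(1234567, 'holiday') . PHP_EOL;  // 45/67/1234567_holiday.jpg
?>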


"Depends on filesystem"
Some users mentioned that the performance impact depends on the used filesystem. Of course. Filesystems like EXT3 can be very slow. But even if you use EXT4 or XFS you can not prevent that listing a folder through ls or find or through an external connection like FTP will become slower an slower.

Solution
I prefer the same way as @armandino. For that I use this little function in PHP to convert IDs into a filepath that results 1000 files per directory:

function dynamic_path($int) {
    // 1000 = 1000 files per dir
    // 10000 = 10000 files per dir
    // 2 = 100 dirs per dir
    // 3 = 1000 dirs per dir
    return implode('/', str_split((string) ceil($int / 1000), 2)) . '/';
}

or you could use the second version if you want to use alpha-numeric characters:

function dynamic_path2($str) {
    // 26 alpha + 10 num + 3 special chars (._-) = 39 combinations
    // -1 = 39^2 = 1521 files per dir
    // -2 = 39^3 = 59319 files per dir (if every combination exists)
    $left = substr($str, 0, -1);
    return implode('/', str_split($left ? $left : $str[0], 2)) . '/';
}

results:

<?php
$files = explode(',', '1.jpg,12.jpg,123.jpg,999.jpg,1000.jpg,1234.jpg,1999.jpg,2000.jpg,12345.jpg,123456.jpg,1234567.jpg,12345678.jpg,123456789.jpg');
foreach ($files as $file) {
    echo dynamic_path(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>

1/1.jpg
1/12.jpg
1/123.jpg
1/999.jpg
1/1000.jpg
2/1234.jpg
2/1999.jpg
2/2000.jpg
13/12345.jpg
12/4/123456.jpg
12/35/1234567.jpg
12/34/6/12345678.jpg
12/34/57/123456789.jpg

<?php
$files = array_merge($files, explode(',', 'a.jpg,b.jpg,ab.jpg,abc.jpg,ddd.jpg,af_ff.jpg,abcd.jpg,akkk.jpg,bf.ff.jpg,abc-de.jpg,abcdef.jpg,abcdefg.jpg,abcdefgh.jpg,abcdefghi.jpg'));
foreach ($files as $file) {
    echo dynamic_path2(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>

1/1.jpg
1/12.jpg
12/123.jpg
99/999.jpg
10/0/1000.jpg
12/3/1234.jpg
19/9/1999.jpg
20/0/2000.jpg
12/34/12345.jpg
12/34/5/123456.jpg
12/34/56/1234567.jpg
12/34/56/7/12345678.jpg
12/34/56/78/123456789.jpg
a/a.jpg
b/b.jpg
a/ab.jpg
ab/abc.jpg
dd/ddd.jpg
af/_f/af_ff.jpg
ab/c/abcd.jpg
ak/k/akkk.jpg
bf/.f/bf.ff.jpg
ab/c-/d/abc-de.jpg
ab/cd/e/abcdef.jpg
ab/cd/ef/abcdefg.jpg
ab/cd/ef/g/abcdefgh.jpg
ab/cd/ef/gh/abcdefghi.jpg

As you can see, with the $int version every folder contains up to 1000 files and up to 99 subdirectories, each of which again contains up to 1000 files and 99 subdirectories, and so on.

But do not forget that too many directories cause the same performance problems!

Finally you should think about how to reduce the number of files in total. Depending on your use case you can use CSS sprites to combine multiple tiny images like avatars, icons, smilies, etc., or, if you have many small non-media files, consider combining them, e.g. in JSON format. In my case I had thousands of mini-caches and finally decided to combine them in packs of 10.
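
As a hedged illustration of that last point, here is one way the "combine small files into packs" idea could look in PHP (the pack size, paths and function name are placeholders, not the code actually used; it assumes the cache contents are UTF-8 text):

<?php
// Sketch: merge many small cache files into JSON "packs" of 10, keyed by the
// original filename, cutting the total file count by roughly a factor of 10.
function pack_small_files(array $files, $packDir, $packSize = 10) {
    if (!is_dir($packDir)) {
        mkdir($packDir, 0755, true);
    }
    foreach (array_chunk($files, $packSize) as $i => $chunk) {
        $pack = array();
        foreach ($chunk as $file) {
            $pack[basename($file)] = file_get_contents($file);
        }
        file_put_contents($packDir . '/pack_' . $i . '.json', json_encode($pack));
    }
}

// Example: pack_small_files(glob('/path/to/mini-caches/*'), '/path/to/packs');
?>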


The question comes down to what you're going to do with the files.

Under Windows, any directory with more than 2k files tends to open slowly for me in Explorer. If they're all image files, more than 1k tend to open very slowly in thumbnail view.

At one time, the system-imposed limit was 32,767. It's higher now, but even that is way too many files to handle at one time under most circumstances.


It depends a bit on the specific filesystem in use on the Linux server. Nowadays the default is ext3 with dir_index, which makes searching large directories very fast.

So speed shouldn't be an issue, other than the one you already noted, which is that listings will take longer.

There is a limit to the total number of files in one directory. I seem to remember it definitely working up to 32000 files.


There is no single figure that is "too many", as long as it doesn't exceed the limits of the OS. However, the more files in a directory, regardless of the OS, the longer it takes to access any individual file, and on most OSes the performance is non-linear, so finding one file out of 10,000 takes more than 10 times longer than finding a file among 1,000.

Secondary problems associated with having a lot of files in a directory include wildcard expansion failures. To reduce the risks, you might consider organizing your directories by date of upload, or some other useful piece of metadata.


Keep in mind that on Linux if you have a directory with too many files, the shell may not be able to expand wildcards. I have this issue with a photo album hosted on Linux. It stores all the resized images in a single directory. While the file system can handle many files, the shell can't. Example:

-shell-3.00$ ls A*
-shell: /bin/ls: Argument list too long

or

-shell-3.00$ chmod 644 *jpg
-shell: /bin/chmod: Argument list too long

If the time involved in implementing a directory partitioning scheme is minimal, I am in favor of it. The first time you have to debug a problem that involves manipulating a 10000-file directory via the console, you will understand.

As an example, F-Spot stores photo files as YYYY\MM\DD\filename.ext, which means the largest directory I have had to deal with while manually manipulating my ~20000-photo collection is about 800 files. This also makes the files more easily browsable from a third party application. Never assume that your software is the only thing that will be accessing your software's files.
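
Applied to the uploads in the question, a minimal sketch of the same date-based layout might look like this (the base directory and function name are assumptions of mine):

<?php
// Sketch: store each upload under YYYY/MM/DD/, F-Spot style, so no single
// directory grows without bound and the tree stays easy to browse by hand.
function upload_path($id, $baseDir = '/var/www/images') {
    $dir = $baseDir . '/' . date('Y/m/d');   // e.g. /var/www/images/2011/08/15
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    return $dir . '/' . $id . '.jpg';
}

echo upload_path('a58f375c') . PHP_EOL;
?>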


I recall running a program that was creating a huge number of files as its output. The files were sorted into directories of 30000 files each. I do not recall having any read problems when I had to reuse the produced output. It was on a 32-bit Ubuntu Linux laptop, and even Nautilus displayed the directory contents, albeit after a few seconds.

ext3 filesystem: similar code on a 64-bit system dealt well with 64000 files per directory.


The biggest issue I've run into is on a 32-bit system. Once you pass a certain number, tools like 'ls' stop working.

Trying to do anything with that directory once you pass that barrier becomes a huge problem.


What most of the answers above fail to show is that there is no "One Size Fits All" answer to the original question.

In today's environment we have a large conglomerate of different hardware and software -- some is 32 bit, some is 64 bit, some is cutting edge and some is tried and true - reliable and never changing. Added to that is a variety of older and newer hardware, older and newer OSes, different vendors (Windows, Unixes, Apple, etc.) and a myriad of utilities and servers that go along. As hardware has improved and software is converted to 64 bit compatibility, there has necessarily been considerable delay in getting all the pieces of this very large and complex world to play nicely with the rapid pace of changes.

IMHO there is no one way to fix a problem. The solution is to research the possibilities and then by trial and error find what works best for your particular needs. Each user must determine what works for their system rather than using a cookie cutter approach.

I for example have a media server with a few very large files. The result is only about 400 files filling a 3 TB drive. Only 1% of the inodes are used, but 95% of the total space is used. Someone else, with a lot of smaller files, may run out of inodes before they come near to filling the space. (On ext4 filesystems, as a rule of thumb, 1 inode is used for each file/directory.) While theoretically the total number of files that may be contained within a directory is nearly infinite, practicality dictates that the overall usage determines realistic limits, not just filesystem capabilities.

I hope that all the different answers above have promoted thought and problem solving rather than presenting an insurmountable barrier to progress.


I've been having the same issue, trying to store millions of files on an Ubuntu server on ext4. I ended up running my own benchmarks and found that a flat directory performs far better while being far simpler to use:

benchmark

Wrote an article.


I respect that this doesn't totally answer your question as to how many is too many, but an idea for solving the long-term problem is that, in addition to storing the original file metadata, you also store which folder on disk it is stored in - normalize out that piece of metadata. Once a folder grows beyond some limit you are comfortable with, for performance, aesthetic or whatever other reason, you just create a second folder and start dropping files there...
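
A hedged sketch of that approach (the folder-naming scheme, limit and table layout are invented for illustration): keep track of the current folder, roll over to a new one once it passes your comfort limit, and record the folder in the database next to the rest of the metadata.

<?php
// Sketch: pick (or create) the current upload folder, rolling to a new one
// once it holds $limit files. The returned folder name is what you would
// store in the database alongside the original filename.
function current_upload_folder($baseDir, $limit = 1000) {
    $n = 0;
    do {
        $dir = sprintf('%s/batch_%04d', $baseDir, $n);
        if (!is_dir($dir)) {
            mkdir($dir, 0755, true);
            return $dir;                       // brand-new, empty folder
        }
        $count = count(glob($dir . '/*'));     // how full is this folder already?
        $n++;
    } while ($count >= $limit);
    return $dir;
}

// Example: $folder = current_upload_folder('/var/www/images');
// then store $folder in the images table together with the original filename.
?>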


Not an answer, but just some suggestions.

Select a more suitable filesystem. The issues you describe are exactly the ones that have driven filesystem development over the decades, so more modern filesystems handle them much better. Start with a comparison table of filesystems, chosen according to your ultimate purpose.

It may also be time to shift paradigms, so I personally suggest a distributed-system-aware filesystem, which imposes effectively no limits on size, number of files and so on. Otherwise you will sooner or later be challenged by new, unanticipated problems.

I'm not sure it will work, but if you don't mind some experimentation, give AUFS on top of your current filesystem a try. I believe it has facilities to present multiple folders as a single virtual folder.

To overcome hardware limits you can use RAID-0.


I have a directory with 88,914 files in it. Like yours, it is used for storing thumbnails, on a Linux server.

Listing files via FTP or a PHP function is slow, yes, but there is also a performance hit when displaying a file. E.g. www.website.com/thumbdir/gh3hg4h2b4h234b3h2.jpg has a wait time of 200-400 ms. By comparison, on another site I have, with around 100 files in a directory, the image is displayed after just ~40 ms of waiting.

I've given this answer because most people have just written about how directory search functions will perform, which you won't be using on a thumb folder - you'll just be statically displaying files - but you will be interested in the performance of actually using the files.


ext3 does in fact have directory size limits, and they depend on the block size of the filesystem. There isn't a per-directory "max number" of files, but a per-directory "max number of blocks used to store file entries". Specifically, the size of the directory itself can't grow beyond a b-tree of height 3, and the fanout of the tree depends on the block size. See this link for some details.

https://www.mail-archive.com/[email protected]/msg01944.html

I was bitten by this recently on a filesystem formatted with 2K blocks, which was inexplicably spewing the directory-full kernel message "warning: ext3_dx_add_entry: Directory index full!" when I was copying from another ext3 filesystem. In my case, a directory with a mere 480,000 files could not be copied to the destination.

