java get file size efficiently

Question

While googling  I see that using java io File length   can be slow  FileChannel has a size   method that is available as well   Is there an efficient way in java to get the file size

User · Accepted Answer

Well  I tried to measure it up with the code below   For runs   1 and iterations   1 the URL method is fastest most times followed by channel  I run this with some pause fresh about 10 times  So for one time access  using the URL is the fastest way I can think of   LENGTH sum  10626  per Iteration  10626 0  CHANNEL sum  5535  per Iteration  5535 0  URL sum  660  per Iteration  660 0   For runs   5 and iterations   50 the picture draws different   LENGTH sum  39496  per Iteration  157 984  CHANNEL sum  74261  per Iteration  297 044  URL sum  95534  per Iteration  382 136   File must be caching the calls to the filesystem  while channels and URL have some overhead   Code   import java io    import java net    import java util     public enum FileSizeBench        LENGTH            Override         public long getResult   throws Exception               File me   new File FileSizeBench class getResource                       FileSizeBench class   getFile                 return me length                         CHANNEL            Override         public long getResult   throws Exception               FileInputStream fis   null              try                   File me   new File FileSizeBench class getResource                           FileSizeBench class   getFile                     fis   new FileInputStream me                   return fis getChannel   size                  finally                   fis close                                       URL            Override         public long getResult   throws Exception               InputStream stream   null              try                   URL url   FileSizeBench class                          getResource  FileSizeBench class                    stream   url openStream                    return stream available                  finally                   stream close                                        public abstract long getResult   throws Exception       public static void main String   args  throws Exception           int runs   5          int iterations   50           EnumMap lt FileSizeBench  Long gt  durations   new EnumMap lt FileSizeBench  Long gt  FileSizeBench class            for  int i   0  i  lt  runs  i                  for  FileSizeBench test   values                      if   durations containsKey test                         durations put test  0l                                     long duration   testNow test  iterations                   durations put test  durations get test    duration                      System out println test     took      duration      per iteration        double duration    double iterations                                     for  Map Entry lt FileSizeBench  Long gt  entry   durations entrySet                  System out println                System out println entry getKey       sum      entry getValue        per Iteration        double entry getValue      double  runs   iterations                           private static long testNow FileSizeBench test  int iterations              throws Exception           long result   -1          long before   System nanoTime            for  int i   0  i  lt  iterations  i                  if  result    -1                    result   test getResult                      System out println result                 else if   result   test getResult       result                     throw new Exception  variance detected                                      return  System nanoTime   - before    1000

User · Answer

All the test cases in this post are flawed as they access the same file for each method tested  So disk caching kicks in which tests 2 and 3 benefit from   To prove my point I took test case provided by GHAD and changed the order of enumeration and below are the results    Looking at result I think File length   is the winner really   Order of test is the order of output  You can even see the time taken on my machine varied between executions but File Length   when not first  and incurring first disk access won   --- LENGTH sum  1163351  per Iteration  4653 404 CHANNEL sum  1094598  per Iteration  4378 392 URL sum  739691  per Iteration  2958 764  --- CHANNEL sum  845804  per Iteration  3383 216 URL sum  531334  per Iteration  2125 336 LENGTH sum  318413  per Iteration  1273 652  ---  URL sum  137368  per Iteration  549 472 LENGTH sum  18677  per Iteration  74 708 CHANNEL sum  142125  per Iteration  568 5

User · Answer

Actually  I think the  ls  may be faster   There are definitely some issues in Java dealing with getting File info   Unfortunately there is no equivalent safe method of recursive ls for Windows    cmd exe s DIR  S can get confused and generate errors in infinite loops   On XP  accessing a server on the LAN  it takes me 5 seconds in Windows to get the count of the files in a folder  33 000   and the total size   When I iterate recursively through this in Java  it takes me over 5 minutes   I started measuring the time it takes to do file length    file lastModified    and file toURI   and what I found is that 99  of my time is taken by those 3 calls   The 3 calls I actually need to do     The difference for 1000 files is 15ms local versus 1800ms on server   The server path scanning in Java is ridiculously slow   If the native OS can be fast at scanning that same folder  why can t Java   As a more complete test  I used WineMerge on XP to compare the modified date  and size of the files on the server versus the files locally   This was iterating over the entire directory tree of 33 000 files in each folder   Total time  7 seconds   java  over 5 minutes   So the original statement and question from the OP is true  and valid   Its less noticeable when dealing with a local file system   Doing a local compare of the folder with 33 000 items takes 3 seconds in WinMerge  and takes 32 seconds locally in Java   So again  java versus native is a 10x slowdown in these rudimentary tests   Java 1 6 0 22  latest   Gigabit LAN  and network connections  ping is less than 1ms  both in the same switch   Java is slow

User · Answer

If you want the file size of multiple files in a directory  use Files walkFileTree  You can obtain the size from the BasicFileAttributes that you ll receive    This is much faster then calling  length   on the result of File listFiles   or using Files size   on the result of Files newDirectoryStream    In my test cases it was about 100 times faster

User · Answer

I ran into this same issue   I needed to get the file size and modified date of 90 000 files on a network share   Using Java  and being as minimalistic as possible  it would take a very long time    I needed to get the URL from the file  and the path of the object as well   So its varied somewhat  but more than an hour    I then used a native Win32 executable  and did the same task  just dumping the file path  modified  and size to the console  and executed that from Java   The speed was amazing   The native process  and my string handling to read the data could process over 1000 items a second   So even though people down ranked the above comment  this is a valid solution  and did solve my issue   In my case I knew the folders I needed the sizes of ahead of time  and I could pass that in the command line to my win32 app   I went from hours to process a directory to minutes   The issue did also seem to be Windows specific   OS X did not have the same issue and could access network file info as fast as the OS could do so   Java File handling on Windows is terrible   Local disk access for files is fine though   It was just network shares that caused the terrible performance   Windows could get info on the network share and calculate the total size in under a minute too   --Ben

User · Answer

In response to rgrig s benchmark  the time taken to open close the FileChannel  amp  RandomAccessFile instances also needs to be taken into account  as these classes will open a stream for reading the file    After modifying the benchmark  I got these results for 1 iterations on a 85MB file   file totalTime  48000  48 us  raf totalTime  261000  261 us  channel totalTime  7020000  7 ms    For 10000 iterations on same file   file totalTime  80074000  80 ms  raf totalTime  295417000  295 ms  channel totalTime  368239000  368 ms    If all you need is the file size  file length   is the fastest way to do it  If you plan to use the file for other purposes like reading writing  then RAF seems to be a better bet  Just don t forget to close the file connection  -   import java io File  import java io FileInputStream  import java io RandomAccessFile  import java nio channels FileChannel  import java util HashMap  import java util Map   public class FileSizeBench           public static void main String   args  throws Exception               int iterations   1          String fileEntry   args 0            Map lt String  Long gt  times   new HashMap lt String  Long gt             times put  file   0L           times put  channel   0L           times put  raf   0L            long fileSize          long start          long end          File f1          FileChannel channel          RandomAccessFile raf           for  int i   0  i  lt  iterations  i                             file length               start   System nanoTime                f1   new File fileEntry               fileSize   f1 length                end   System nanoTime                times put  file   times get  file     end - start                   channel size               start   System nanoTime                channel   new FileInputStream fileEntry  getChannel                fileSize   channel size                channel close                end   System nanoTime                times put  channel   times get  channel     end - start                   raf length               start   System nanoTime                raf   new RandomAccessFile fileEntry   r                fileSize   raf length                raf close                end   System nanoTime                times put  raf   times get  raf     end - start                      for  Map Entry lt String  Long gt  entry   times entrySet                  System out println entry getKey       totalTime      entry getValue            getTime entry getValue                                 public static String getTime Long timeTaken                if  timeTaken  lt  1000                return timeTaken     ns             else if  timeTaken  lt   1000 1000                 return timeTaken 1000     us              else               return timeTaken  1000 1000      ms

User · Answer

When I modify your code to use a file accessed by an absolute path instead of a resource  I get a different result  for 1 run  1 iteration  and a 100 000 byte file -- times for a 10 byte file are identical to 100 000 bytes   LENGTH sum  33  per Iteration  33 0  CHANNEL sum  3626  per Iteration  3626 0  URL sum  294  per Iteration  294 0

User · Answer

The benchmark given by GHad measures lots of other stuff  such as reflection  instantiating objects  etc   besides getting the length  If we try to get rid of these things then for one call I get the following times in microseconds       file sum   19 0  per Iteration   19 0     raf sum   16 0  per Iteration   16 0 channel sum  273 0  per Iteration  273 0   For 100 runs and 10000 iterations I get       file sum  1767629 0  per Iteration  1 7676290000000001     raf sum   881284 0  per Iteration  0 8812840000000001 channel sum   414286 0  per Iteration  0 414286   I did run the following modified code giving as an argument the name of a 100MB file   import java io    import java nio channels    import java net    import java util     public class FileSizeBench      private static File file    private static FileChannel channel    private static RandomAccessFile raf     public static void main String   args  throws Exception       int runs   1      int iterations   1       file   new File args 0        channel   new FileInputStream args 0   getChannel        raf   new RandomAccessFile args 0    r         HashMap lt String  Double gt  times   new HashMap lt String  Double gt         times put  file   0 0       times put  channel   0 0       times put  raf   0 0        long start      for  int i   0  i  lt  runs    i          long l   file length           start   System nanoTime          for  int j   0  j  lt  iterations    j          if  l    file length    throw new Exception          times put  file   times get  file     System nanoTime   - start          start   System nanoTime          for  int j   0  j  lt  iterations    j          if  l    channel size    throw new Exception          times put  channel   times get  channel     System nanoTime   - start          start   System nanoTime          for  int j   0  j  lt  iterations    j          if  l    raf length    throw new Exception          times put  raf   times get  raf     System nanoTime   - start             for  Map Entry lt String  Double gt  entry   times entrySet              System out println              entry getKey       sum      1e-3   entry getValue                    per Iteration       1e-3   entry getValue     runs   iterations

User · Answer

From GHad s benchmark  there are a few issue people have mentioned   1 Like BalusC mentioned  stream available   is flowed in this case    Because available   returns an estimate of the number of bytes that can be read  or skipped over  from this input stream without blocking by the next invocation of a method for this input stream   So 1st to remove the URL this approach    2 As StuartH mentioned -  the order the test run also make the cache difference  so take that out by run the test separately     Now start test   When CHANNEL one run alone   CHANNEL sum  59691  per Iteration  238 764   When LENGTH one run alone   LENGTH sum  48268  per Iteration  193 072   So looks like the LENGTH one is the winner here    Override public long getResult   throws Exception       File me   new File FileSizeBench class getResource               FileSizeBench class   getFile         return me length

[java] java get file size efficiently

Examples related to java

Examples related to filesize