How to write super-fast file-streaming code in C#?

Question

I have to split a huge file into many smaller files. Each of the destination files is defined by an offset and a length in bytes. I'm using the following code:

    private void copy(string srcFile, string dstFile, int offset, int length)
    {
        BinaryReader reader = new BinaryReader(File.OpenRead(srcFile));
        reader.BaseStream.Seek(offset, SeekOrigin.Begin);
        byte[] buffer = reader.ReadBytes(length);

        BinaryWriter writer = new BinaryWriter(File.OpenWrite(dstFile));
        writer.Write(buffer);
    }

Considering that I have to call this function about 100,000 times, it is remarkably slow.

Is there a way to make the Writer connected directly to the Reader? (That is, without actually loading the contents into the Buffer in memory.)

User · Answer

The first thing I would recommend is to take measurements. Where are you losing your time? Is it in the read, or the write?

Over 100,000 accesses (sum the times): How much time is spent allocating the buffer array? How much time is spent opening the file for read (and is it the same file every time)? How much time is spent in read and write operations?

If you aren't doing any type of transformation on the file, do you need a BinaryWriter, or can you use a FileStream for writes? Try it: do you get identical output, and does it save time?
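One way to take those measurements is to wrap each phase in its own `Stopwatch` and sum the times over all calls. The sketch below is only an illustration: `CopyTimer`, `TimedCopy` and `Report` are hypothetical names, and it uses `File.Create` (which truncates) instead of the `File.OpenWrite` from the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: accumulates the time spent per phase across many copies.
public static class CopyTimer
{
    public static readonly Stopwatch OpenTime = new Stopwatch();
    public static readonly Stopwatch ReadTime = new Stopwatch();
    public static readonly Stopwatch WriteTime = new Stopwatch();

    public static void TimedCopy(string srcFile, string dstFile, long offset, int length)
    {
        // Phase 1: opening the source file.
        OpenTime.Start();
        FileStream input = File.OpenRead(srcFile);
        OpenTime.Stop();

        byte[] buffer = new byte[length];
        int total = 0;
        using (input)
        {
            // Phase 2: seeking and reading. Read may return fewer bytes than
            // requested, so loop until the buffer is full or the file ends.
            ReadTime.Start();
            input.Seek(offset, SeekOrigin.Begin);
            int n;
            while (total < length && (n = input.Read(buffer, total, length - total)) > 0)
            {
                total += n;
            }
            ReadTime.Stop();
        }

        // Phase 3: creating and writing the destination file.
        WriteTime.Start();
        using (FileStream output = File.Create(dstFile))
        {
            output.Write(buffer, 0, total);
        }
        WriteTime.Stop();
    }

    public static void Report()
    {
        Console.WriteLine("open: {0} ms, read: {1} ms, write: {2} ms",
            OpenTime.ElapsedMilliseconds,
            ReadTime.ElapsedMilliseconds,
            WriteTime.ElapsedMilliseconds);
    }
}
```

Call `CopyTimer.TimedCopy(...)` in place of the original `copy(...)` for a representative batch, then `CopyTimer.Report()` once at the end. Whichever phase dominates tells you which of the suggestions in this thread is worth trying first.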

User · Answer

How large is length? You may do better to re-use a fixed sized (moderately large, but not obscene) buffer, and forget BinaryReader... just use Stream.Read and Stream.Write.

(edit) something like:

    private static void copy(string srcFile, string dstFile, int offset,
        int length, byte[] buffer)
    {
        using (Stream inStream = File.OpenRead(srcFile))
        using (Stream outStream = File.OpenWrite(dstFile))
        {
            inStream.Seek(offset, SeekOrigin.Begin);
            int bufferLength = buffer.Length, bytesRead;
            // Copy whole buffers while more than one buffer's worth remains.
            while (length > bufferLength &&
                (bytesRead = inStream.Read(buffer, 0, bufferLength)) > 0)
            {
                outStream.Write(buffer, 0, bytesRead);
                length -= bytesRead;
            }
            // Copy the tail, never asking for more than is still needed.
            while (length > 0 &&
                (bytesRead = inStream.Read(buffer, 0, length)) > 0)
            {
                outStream.Write(buffer, 0, bytesRead);
                length -= bytesRead;
            }
        }
    }

User · Answer

Have you considered using the CCR? Since you are writing to separate files, you can do everything in parallel (read and write), and the CCR makes it very easy to do this.

    // file_path and split_size are assumed to be static fields defined elsewhere.
    static void Main(string[] args)
    {
        Dispatcher dp = new Dispatcher();
        DispatcherQueue dq = new DispatcherQueue("DQ", dp);

        Port<long> offsetPort = new Port<long>();

        Arbiter.Activate(dq, Arbiter.Receive<long>(true, offsetPort,
            new Handler<long>(Split)));

        FileStream fs = File.Open(file_path, FileMode.Open);
        long size = fs.Length;
        fs.Dispose();

        for (long i = 0; i < size; i += split_size)
        {
            offsetPort.Post(i);
        }
    }

    private static void Split(long offset)
    {
        FileStream reader = new FileStream(file_path, FileMode.Open,
            FileAccess.Read);
        reader.Seek(offset, SeekOrigin.Begin);
        long toRead = 0;
        if (offset + split_size <= reader.Length)
            toRead = split_size;
        else
            toRead = reader.Length - offset;

        byte[] buff = new byte[toRead];
        reader.Read(buff, 0, (int)toRead);
        reader.Dispose();
        File.WriteAllBytes("c:\\out" + offset + ".txt", buff);
    }

This code posts offsets to a CCR port, which causes a Thread to be created to execute the code in the Split method. This causes you to open the file multiple times but gets rid of the need for synchronization. You can make it more memory efficient, but you'll have to sacrifice speed.

User · Answer

The fastest way to do file I/O from C# is to use the Windows ReadFile and WriteFile functions. I have written a C# class that encapsulates this capability, as well as a benchmarking program that looks at different I/O methods, including BinaryReader and BinaryWriter. See my blog post at:

http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp

User · Answer

No one suggests threading? Writing the smaller files looks like a textbook example of where threads are useful. Set up a bunch of threads to create the smaller files; this way, you can create them all in parallel and you don't need to wait for each one to finish. My assumption is that creating the files (disk operation) will take WAY longer than splitting up the data.

And of course you should verify first that a sequential approach is not adequate.
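A rough sketch of the threaded approach, using `Parallel.ForEach` from the Task Parallel Library (.NET 4.0 and later) rather than hand-managed threads. `ParallelSplitter`, `Chunk` and `maxThreads` are hypothetical names introduced for illustration. Each worker opens its own read-only stream on the source, so no locking is needed around seeks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ParallelSplitter
{
    // Hypothetical description of one destination file.
    public sealed class Chunk
    {
        public string TargetFile;
        public long Offset;
        public int Length;
    }

    // Per-worker state: a private input stream and a reusable buffer.
    private sealed class Worker : IDisposable
    {
        public readonly FileStream Input;
        public readonly byte[] Buffer = new byte[64 * 1024];

        public Worker(string srcFile)
        {
            Input = new FileStream(srcFile, FileMode.Open, FileAccess.Read, FileShare.Read);
        }

        public void Dispose() { Input.Dispose(); }
    }

    public static void Split(string srcFile, IEnumerable<Chunk> chunks, int maxThreads)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        Parallel.ForEach(
            chunks,
            options,
            () => new Worker(srcFile),          // created once per worker
            (chunk, loopState, worker) =>
            {
                CopyChunk(worker, chunk);
                return worker;                  // hand the state to the next iteration
            },
            worker => worker.Dispose());        // closed when the worker finishes
    }

    private static void CopyChunk(Worker worker, Chunk chunk)
    {
        using (FileStream output = File.Create(chunk.TargetFile))
        {
            worker.Input.Seek(chunk.Offset, SeekOrigin.Begin);
            int remaining = chunk.Length;
            int bytesRead;
            while (remaining > 0 &&
                   (bytesRead = worker.Input.Read(worker.Buffer, 0,
                        Math.Min(remaining, worker.Buffer.Length))) > 0)
            {
                output.Write(worker.Buffer, 0, bytesRead);
                remaining -= bytesRead;
            }
        }
    }
}
```

Whether this beats a sequential loop depends on the storage: on a single spinning disk, several threads seeking at once can easily be slower, so keep `maxThreads` small and measure.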

User · Answer

You shouldn't re-open the source file each time you do a copy; better to open it once and pass the resulting BinaryReader to the copy function. Also, it might help if you order your seeks, so you don't make big jumps inside the file.

If the lengths aren't too big, you can also try to group several copy calls by grouping offsets that are near to each other and reading the whole block you need for them. For example:

    offset = 1234, length = 34
    offset = 1300, length = 40
    offset = 1350, length = 1000

can be grouped to one read:

    offset = 1234, length = 1116

The block has to run from the first offset (1234) to the end of the last chunk (1350 + 1000 = 2350), which is 1116 bytes including the small gaps between chunks. Then you only have to "seek" in your buffer and can write the three new files from there without having to read again.
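The grouping idea could be sketched roughly as follows. `GroupedSplitter`, `Chunk` and `maxGroupBytes` are hypothetical names, not part of the answer. Chunks are sorted by offset, neighbours are merged while the combined span stays under a size limit, and each group costs one seek and one read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class GroupedSplitter
{
    // Hypothetical description of one destination file.
    public sealed class Chunk
    {
        public string TargetFile;
        public long Offset;
        public int Length;
    }

    public static void Split(Stream input, IEnumerable<Chunk> chunks, int maxGroupBytes)
    {
        List<Chunk> ordered = chunks.OrderBy(c => c.Offset).ToList();
        int i = 0;
        while (i < ordered.Count)
        {
            // Start a group at chunk i and extend it while the whole span
            // (first offset to furthest end) still fits in maxGroupBytes.
            long start = ordered[i].Offset;
            long end = start + ordered[i].Length;
            int j = i + 1;
            while (j < ordered.Count &&
                   Math.Max(end, ordered[j].Offset + ordered[j].Length) - start <= maxGroupBytes)
            {
                end = Math.Max(end, ordered[j].Offset + ordered[j].Length);
                j++;
            }

            // One seek and one (looped) read for the whole group.
            byte[] block = new byte[end - start];
            input.Seek(start, SeekOrigin.Begin);
            int filled = 0, n;
            while (filled < block.Length &&
                   (n = input.Read(block, filled, block.Length - filled)) > 0)
            {
                filled += n;
            }

            // "Seek" inside the in-memory block and write each destination file.
            for (int k = i; k < j; k++)
            {
                int relative = (int)(ordered[k].Offset - start);
                int count = Math.Min(ordered[k].Length, Math.Max(0, filled - relative));
                using (FileStream output = File.Create(ordered[k].TargetFile))
                {
                    output.Write(block, relative, count);
                }
            }
            i = j;
        }
    }
}
```

A chunk larger than `maxGroupBytes` simply forms a group of its own, so the limit bounds the merging rather than the chunk size. If the source ends early, the affected files are written short instead of throwing.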

User · Answer

Using FileStream + StreamWriter, I know it's possible to create massive files in little time (less than 1 min 30 seconds). I generate three files totaling 700+ megabytes from one file using that technique.

Your primary problem with the code you're using is that you are opening a file every time. That is creating file I/O overhead.

If you knew the names of the files you would be generating ahead of time, you could extract the File.OpenWrite into a separate method; it will increase the speed. Without seeing the code that determines how you are splitting the files, I don't think you can get much faster.

User · Answer

For future reference: quite possibly the fastest way to do this would be to use memory mapped files (so primarily copying memory, with the OS handling the file reads/writes via its paging/memory management).

Memory mapped files are supported in managed code in .NET 4.0.

But as noted, you need to profile, and expect to switch to native code for maximum performance.
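A minimal sketch of what the memory-mapped variant might look like with `System.IO.MemoryMappedFiles` (.NET 4.0 and later). `MappedSplitter` and its method names are hypothetical, and only the source file is mapped here; the destination is still written through a normal `FileStream`.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedSplitter
{
    // Maps the whole source file read-only. The caller disposes the result.
    public static MemoryMappedFile OpenSource(string srcFile)
    {
        // mapName = null (unnamed map), capacity = 0 (use the file's size).
        return MemoryMappedFile.CreateFromFile(
            srcFile, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
    }

    // Copies `length` bytes starting at `offset` of the mapped file into targetFile.
    public static void CopySection(MemoryMappedFile source, string targetFile,
                                   long offset, long length)
    {
        using (MemoryMappedViewStream view =
                   source.CreateViewStream(offset, length, MemoryMappedFileAccess.Read))
        using (FileStream output = File.Create(targetFile))
        {
            view.CopyTo(output);
        }
    }
}
```

Open the map once with `OpenSource`, call `CopySection` for every destination file, and dispose the map at the end. `CreateViewStream` throws if `offset + length` runs past the end of the file, so validate the ranges first. Whether this is actually faster than plain stream reads is something only profiling on your data will show.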

User · Answer

I don't believe there's anything within .NET to allow copying a section of a file without buffering it in memory. However, it strikes me that this is inefficient anyway, as it needs to open the input file and seek many times. If you're just splitting up the file, why not open the input file once, and then just write something like:

    public static void CopySection(Stream input, string targetFile, int length)
    {
        byte[] buffer = new byte[8192];

        using (Stream output = File.OpenWrite(targetFile))
        {
            int bytesRead = 1;
            // This will finish silently if we couldn't read "length" bytes.
            // An alternative would be to throw an exception
            while (length > 0 && bytesRead > 0)
            {
                bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
                output.Write(buffer, 0, bytesRead);
                length -= bytesRead;
            }
        }
    }

This has a minor inefficiency in creating a buffer on each invocation - you might want to create the buffer once and pass that into the method as well:

    public static void CopySection(Stream input, string targetFile,
                                   int length, byte[] buffer)
    {
        using (Stream output = File.OpenWrite(targetFile))
        {
            int bytesRead = 1;
            // This will finish silently if we couldn't read "length" bytes.
            // An alternative would be to throw an exception
            while (length > 0 && bytesRead > 0)
            {
                bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
                output.Write(buffer, 0, bytesRead);
                length -= bytesRead;
            }
        }
    }

Note that this also closes the output stream (due to the using statement), which your original code didn't.

The important point is that this will use the operating system file buffering more efficiently, because you reuse the same input stream, instead of reopening the file at the beginning and then seeking.

I think it'll be significantly faster, but obviously you'll need to try it to see...

This assumes contiguous chunks, of course. If you need to skip bits of the file, you can do that from outside the method. Also, if you're writing very small files, you may want to optimise for that situation too - the easiest way to do that would probably be to introduce a BufferedStream wrapping the input stream.
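To make the last two points concrete, here is a hedged sketch of a driver that opens the input once, wraps it in a `BufferedStream`, and skips gaps from outside `CopySection`. `SectionSplitter`, `Part` and `SplitFile` are hypothetical names; `CopySection` is repeated from the answer above only so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SectionSplitter
{
    // Hypothetical description of one destination file.
    public sealed class Part
    {
        public string TargetFile;
        public long Offset;
        public int Length;
    }

    // Assumes `parts` is already ordered by offset.
    public static void SplitFile(string srcFile, IEnumerable<Part> parts)
    {
        byte[] buffer = new byte[8192];   // created once, reused for every part
        using (Stream raw = File.OpenRead(srcFile))
        using (Stream input = new BufferedStream(raw, 64 * 1024))
        {
            foreach (Part part in parts)
            {
                // Skip any gap between parts; contiguous parts need no seek.
                if (input.Position != part.Offset)
                {
                    input.Seek(part.Offset, SeekOrigin.Begin);
                }
                CopySection(input, part.TargetFile, part.Length, buffer);
            }
        }
    }

    // Same logic as the buffer-passing CopySection in the answer.
    public static void CopySection(Stream input, string targetFile,
                                   int length, byte[] buffer)
    {
        using (Stream output = File.OpenWrite(targetFile))
        {
            int bytesRead = 1;
            while (length > 0 && bytesRead > 0)
            {
                bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
                output.Write(buffer, 0, bytesRead);
                length -= bytesRead;
            }
        }
    }
}
```

One caveat: `File.OpenWrite` does not truncate an existing file, so if a destination file may already exist with a longer length, `File.Create` is the safer choice.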
