[vba] What is a superfast way to read large files line-by-line in VBA?

I believe I have come up with a very efficient way to read very, very large files line-by-line. Please tell me if you know of a better/faster way or see room for improvement. I am trying to get better at coding, so any sort of advice you have would be nice. Hopefully this is something that other people might find useful, too.

From my tests, it appears to be roughly 8 times faster than using Line Input.

'This function reads a file into a string.                        '
'I found this in the book Programming Excel with VBA and .NET.    '
Public Function QuickRead(FName As String) As String
    Dim I As Integer
    Dim res As String
    Dim l As Long

    I = FreeFile
    l = FileLen(FName)
    res = Space(l)
    Open FName For Binary Access Read As #I
    Get #I, , res
    Close I
    QuickRead = res
End Function

'This function works like the Line Input statement'
Public Sub QRLineInput( _
    ByRef strFileData As String, _
    ByRef lngFilePosition As Long, _
    ByRef strOutputString As String, _
    ByRef blnEOF As Boolean _
    )
    'When no more line breaks are found, InStr returns 0, Mid$ is asked
    'for a negative length and raises an error - that is our EOF signal.
    On Error GoTo LastLine
    strOutputString = Mid$(strFileData, lngFilePosition, _
        InStr(lngFilePosition, strFileData, vbNewLine) - lngFilePosition)
    lngFilePosition = InStr(lngFilePosition, strFileData, vbNewLine) + Len(vbNewLine)
    Exit Sub
LastLine:
    blnEOF = True
End Sub

Sub Test()
    Dim strFilePathName As String: strFilePathName = "C:\Fld\File.txt"
    Dim strFile As String
    Dim lngPos As Long
    Dim blnEOF As Boolean
    Dim strFileLine As String

    strFile = QuickRead(strFilePathName) & vbNewLine
    lngPos = 1

    Do Until blnEOF
        Call QRLineInput(strFile, lngPos, strFileLine, blnEOF)
    Loop
End Sub
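For reference, here is a minimal sketch of how one might time the two approaches head-to-head; the file path is just a placeholder:

Sub BenchmarkRead()
    Dim strPath As String: strPath = "C:\Fld\File.txt"   'placeholder path
    Dim t As Single, strLine As String, f As Integer
    Dim strFile As String, lngPos As Long, blnEOF As Boolean

    'Time the classic Line Input loop.
    t = Timer
    f = FreeFile
    Open strPath For Input As #f
    Do While Not EOF(f)
        Line Input #f, strLine
    Loop
    Close #f
    Debug.Print "Line Input:"; Timer - t; "seconds"

    'Time the QuickRead + QRLineInput approach above.
    t = Timer
    strFile = QuickRead(strPath) & vbNewLine
    lngPos = 1
    Do Until blnEOF
        QRLineInput strFile, lngPos, strLine, blnEOF
    Loop
    Debug.Print "QuickRead:"; Timer - t; "seconds"
End Sub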

Thanks for the advice!

Tags: vba, file-io

Answers:


My take on it... Obviously, you've got to do something with the data you read in. If that involves writing it to the sheet, a normal For loop will be deadly slow. I came up with the following based on a rehash of some of the items here, plus some help from the Chip Pearson website.

Reading in the text file (assuming you don't know the length of the range it will create, so only the startCell is given):

Public Sub ReadInPlainText(startCell As Range, Optional textfilename As Variant)

   If IsMissing(textfilename) Then textfilename = Application.GetOpenFilename("All Files (*.*), *.*", , "Select Text File to Read")
   If VarType(textfilename) = vbBoolean Then Exit Sub   'user cancelled the dialog

   Dim filelength As Long
   Dim filenumber As Integer
   filenumber = FreeFile
   filelength = FileLen(textfilename)
   Dim text As String
   Dim textlines As Variant

   Open textfilename For Binary Access Read As filenumber

   text = Space(filelength)
   Get #filenumber, , text
   Close filenumber   'done with the file; everything is in memory now

   'split the file on vbCrLf line breaks
   textlines = Split(text, vbCrLf)

   'output to range
   Dim outputRange As Range
   Set outputRange = startCell
   'UBound equals the line count when the file ends with a trailing vbCrLf
   '(Split leaves an empty final element); use UBound(textlines) + 1 otherwise
   Set outputRange = outputRange.Resize(UBound(textlines), 1)
   outputRange.Value = Application.Transpose(textlines)

End Sub
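A hypothetical call might look like this (the sheet and path are only illustrative):

Sub TestReadInPlainText()
    'Read C:\data.txt into column A of the active sheet, starting at A1.
    ReadInPlainText ActiveSheet.Range("A1"), "C:\data.txt"
End Sub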

Conversely, if you need to write out a range to a text file, this does it quickly in one Print statement (note: the file is opened in text mode here, not binary, unlike the read routine above).

Public Sub WriteRangeAsPlainText(ExportRange As Range, Optional textfilename As Variant)
   If IsMissing(textfilename) Then textfilename = Application.GetSaveAsFilename(FileFilter:="Text Files (*.txt), *.txt")
   If VarType(textfilename) = vbBoolean Then Exit Sub   'user cancelled the dialog

   Dim filenumber As Integer
   filenumber = FreeFile
   Open textfilename For Output As filenumber

   Dim textlines() As Variant, outputvar As Variant

   textlines = Application.Transpose(ExportRange.Value)
   outputvar = Join(textlines, vbCrLf)
   Print #filenumber, outputvar
   Close filenumber
End Sub
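And a hypothetical call for the writer (again, the range and path are illustrative):

Sub TestWriteRangeAsPlainText()
    'Write A1:A100 of the active sheet out to a text file.
    WriteRangeAsPlainText ActiveSheet.Range("A1:A100"), "C:\out.txt"
End Sub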

With that code you load the file into memory (as one big string) and then read that string line by line.

By using Mid$() and InStr() you effectively scan the "file" twice, but since it's in memory, that is not a problem. A VB string can hold roughly 2 billion characters, so the string itself is rarely the limit, but if the text files are hundreds of megabytes in size you are likely to see a performance drop due to virtual memory usage.
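If a file really is too large to hold comfortably in a single string, a middle ground is to read it in fixed-size chunks and carry any partial trailing line over into the next chunk. A rough sketch of that idea (the chunk size and procedure name are my own choices, not from the code above):

Public Sub ReadInChunks(FName As String)
    Const CHUNK_SIZE As Long = 1048576   '1 MB per read; tune as needed
    Dim f As Integer
    Dim buf As String, carry As String
    Dim lines() As String
    Dim i As Long, bytesLeft As Long

    f = FreeFile
    Open FName For Binary Access Read As #f
    bytesLeft = LOF(f)
    Do While bytesLeft > 0
        'Read the next chunk (the last one may be shorter).
        buf = Space$(IIf(bytesLeft < CHUNK_SIZE, bytesLeft, CHUNK_SIZE))
        Get #f, , buf
        bytesLeft = bytesLeft - Len(buf)
        'Prepend whatever partial line was left over from the previous chunk.
        buf = carry & buf
        lines = Split(buf, vbCrLf)
        'The final element may be an incomplete line; keep it for the next pass.
        carry = lines(UBound(lines))
        For i = 0 To UBound(lines) - 1
            'process lines(i) here
        Next
    Loop
    Close #f
    If Len(carry) > 0 Then
        'process the final line (the file had no trailing CRLF)
    End If
End Sub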


You can modify the original QuickRead above to read the full file in one go and then work through it line by line, as shown below:

Option Explicit

Public Function QuickRead(FName As String) As Variant
    Dim i As Integer
    Dim res As String
    Dim l As Long

    i = FreeFile
    l = FileLen(FName)
    res = Space(l)
    Open FName For Binary Access Read As #i
    Get #i, , res
    Close i
    'split the file with vbcrlf
    QuickRead = Split(res, vbCrLf)
End Function

Sub Test()
    'you can replace "C:\writename.txt" with any file name you desire
    Dim strFilePathName As String: strFilePathName = "C:\writename.txt"
    Dim v As Variant
    Dim i As Long
    v = QuickRead(strFilePathName)
    For i = 0 To UBound(v)
        MsgBox v(i)
    Next
End Sub

Be careful when using Application.Transpose with a huge number of values. If you transpose values into a column, Excel assumes you transposed them from a row. Since the maximum column limit is smaller than the maximum row limit, it will only keep the first (max column limit) values, and anything after that comes back as #N/A.
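One way to sidestep Transpose and its limits altogether is to copy the split lines into an (N x 1) two-dimensional array and write that to the range in one assignment. A sketch, assuming textlines came from Split as above (the procedure name is mine):

Public Sub WriteLinesToColumn(textlines As Variant, startCell As Range)
    'Build an (N x 1) array so no Application.Transpose call is needed.
    Dim out() As Variant
    Dim i As Long, n As Long
    n = UBound(textlines) - LBound(textlines) + 1
    ReDim out(1 To n, 1 To 1)
    For i = 1 To n
        out(i, 1) = textlines(LBound(textlines) + i - 1)
    Next
    startCell.Resize(n, 1).Value = out
End Sub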


Line Input works fine for small files. However, once file sizes reach around 90k, Line Input jumps all over the place and reads data in the wrong order from the source file. I tested it with different file sizes:

49k = ok
60k = ok
78k = ok
85k = ok
93k = error
101k = error
127k = error
156k = error

Lesson learned - use Scripting.FileSystemObject


My two cents…

Not long ago I needed to read large files using VBA and noticed this question. I tested three approaches to reading data from a file in order to compare their speed and reliability for a wide range of file sizes and line lengths. The approaches are:

  1. Line Input VBA statement
  2. Using the File System Object (FSO)
  3. Using the Get VBA statement to read the whole file and then parsing the string in memory, as described in posts here

Each test case consists of three steps:

  1. Test case setup: write a text file containing a given number of lines of the same given length, filled with a known character pattern (a sketch of this step follows below).
  2. Integrity test: read each file line and verify its length and contents.
  3. File read speed test: read each line of the file, repeated 10 times.

As you can see, Step #3 measures the true file read speed (as asked in the question), while Step #2 verifies the file read integrity and thereby simulates real conditions, where string parsing is needed.
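For illustration, the setup step (#1) might look roughly like this; the names and the character pattern here are mine, not the exact harness behind these results:

Public Sub WriteTestFile(FName As String, lineCount As Long, lineLen As Long)
    Dim f As Integer
    Dim i As Long
    Dim pattern As String
    'Build one line of the requested length from a known repeating
    'pattern so the integrity test can verify length and contents.
    Do While Len(pattern) < lineLen
        pattern = pattern & "0123456789"
    Loop
    pattern = Left$(pattern, lineLen)

    f = FreeFile
    Open FName For Output As #f
    For i = 1 To lineCount
        Print #f, pattern   'Print appends vbCrLf after each line
    Next
    Close #f
End Sub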

The following chart shows the test results for the file read speed test. The file size is 64 MB for all tests, and the tests differ in line length, which varies from 2 bytes (not including CRLF) to 8 MB.

(The results chart is no longer available.)

CONCLUSION:

  1. All three methods are reliable for large files with normal and abnormal line lengths (compare with Graeme Howard's answer)
  2. All three methods produce almost equivalent file reading speeds for normal line lengths
  3. The "superfast way" (Method #3) works fine for extremely long lines, while the other two do not
  4. All of this applies across different Office versions and different PCs, for both VBA and VB6

I just wanted to share some of my results...

I have text files, which apparently came from a Linux system, so I only have a vbLF/Chr(10) at the end of each line and not vbCR/Chr(13).

Note 1:

  • This meant that the Line Input method would read in the entire file, instead of just one line at a time.

From my testing of small (152KB) and large (2778KB) files, both on and off the network, I found the following:

Open FileName For Input: Line Input was the slowest (See Note 1 above)

Open FileName For Binary Access Read: Input was the fastest for reading the whole file

FSO.OpenTextFile: ReadLine was fast, but a bit slower than Binary Input

Note 2:

  • If I just needed to check the file header (first 1-2 lines) to see whether I had the proper file/format, then FSO.OpenTextFile was the fastest, followed very closely by Binary Input (a sketch of such a header check follows this list).

  • The drawback with the Binary Input is that you have to know how many characters you want to read.

  • On normal files, Line Input would also be a good option, but I couldn't test it due to Note 1.
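A sketch of such a header check (the path and the expected header text are placeholders):

Sub CheckFileHeader()
    'Read only the first line via FSO to validate the file format.
    Dim fso As Object, ts As Object
    Dim firstLine As String
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile("C:\data.txt", 1)   '1 = ForReading
    If Not ts.AtEndOfStream Then firstLine = ts.ReadLine
    ts.Close
    If Left$(firstLine, 7) = "HEADER," Then
        Debug.Print "Looks like the right file format."
    Else
        Debug.Print "Unexpected header: "; firstLine
    End If
End Sub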

 

Note 3:

  • Obviously, the files on the network showed the largest difference in read speed. They also showed the greatest benefit from reading the file a second time (although there are certainly memory buffers that come into play here).

You can use Scripting.FileSystemObject to do that. From the reference:

The ReadLine method allows a script to read individual lines in a text file. To use this method, open the text file, and then set up a Do Loop that continues until the AtEndOfStream property is True. (This simply means that you have reached the end of the file.) Within the Do Loop, call the ReadLine method, store the contents of the first line in a variable, and then perform some action. When the script loops around, it will automatically drop down a line and read the second line of the file into the variable. This will continue until each line has been read (or until the script specifically exits the loop).

And a quick example:

Dim objFSO As Object, objFile As Object, strLine As String
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\FSO\ServerList.txt", 1)   '1 = ForReading
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    MsgBox strLine
Loop
objFile.Close

I would think that in a large-file scenario, using a stream would be far more efficient, because memory consumption would be very small.

But your algorithm could alternate between using a stream and loading the entire file into memory, depending on the file size. I wouldn't be surprised if each approach were better only under certain conditions.
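A minimal sketch of that hybrid idea, reusing the String-returning QuickRead from the question and an arbitrary size threshold:

Sub ProcessFile(FName As String)
    Const THRESHOLD As Long = 50000000   'about 50 MB; arbitrary cut-off
    Dim strLine As String
    If FileLen(FName) <= THRESHOLD Then
        'Small file: load everything at once and split in memory.
        Dim v As Variant, i As Long
        v = Split(QuickRead(FName), vbCrLf)
        For i = LBound(v) To UBound(v)
            strLine = v(i)
            'process strLine here
        Next
    Else
        'Large file: stream it line by line to keep memory use small.
        Dim fso As Object, ts As Object
        Set fso = CreateObject("Scripting.FileSystemObject")
        Set ts = fso.OpenTextFile(FName, 1)   '1 = ForReading
        Do Until ts.AtEndOfStream
            strLine = ts.ReadLine
            'process strLine here
        Loop
        ts.Close
    End If
End Sub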