[windows] How to split large text file in windows?

I have a log file with size of 2.5 GB. Is there any way to split this file into smaller files using windows command prompt?

This question is related to windows text cmd split size

The answer is


Below code split file every 500

@echo off
setlocal ENABLEDELAYEDEXPANSION
REM Edit this value to change the name of the file that needs splitting. Include the extension.
SET BFN=upload.txt
REM Edit this value to change the number of lines per file.
SET LPF=15000
REM Edit this value to change the name of each short file. It will be followed by a number indicating where it is in the list.
SET SFN=SplitFile

REM Do not change beyond this line.

SET SFX=%BFN:~-3%

SET /A LineNum=0
SET /A FileNum=1

For /F "delims==" %%l in (%BFN%) Do (
SET /A LineNum+=1

echo %%l >> %SFN%!FileNum!.%SFX%

if !LineNum! EQU !LPF! (
SET /A LineNum=0
SET /A FileNum+=1
)

)
endlocal
Pause

See below: https://forums.techguy.org/threads/solved-split-a-100000-line-csv-into-5000-line-csv-files-with-dos-batch.1023949/


you can split using a third party software http://www.hjsplit.org/, for example give yours input that could be upto 9GB and then split, in my case I split 10 MB each enter image description here


Of course there is! Win CMD can do a lot more than just split text files :)

Split a text file into separate files of 'max' lines each:

Split text file (max lines each):
: Initialize
set input=file.txt
set max=10000

set /a line=1 >nul
set /a file=1 >nul
set out=!file!_%input%
set /a max+=1 >nul

echo Number of lines in %input%:
find /c /v "" < %input%

: Split file
for /f "tokens=* delims=[" %i in ('type "%input%" ^| find /v /n ""') do (

if !line!==%max% (
set /a line=1 >nul
set /a file+=1 >nul
set out=!file!_%input%
echo Writing file: !out!
)

REM Write next file
set a=%i
set a=!a:*]=]!
echo:!a:~1!>>out!
set /a line+=1 >nul
)

If above code hangs or crashes, this example code splits files faster (by writing data to intermediate files instead of keeping everything in memory):

eg. To split a file with 7,600 lines into smaller files of maximum 3000 lines.

  1. Generate regexp string/pattern files with set command to be fed to /g flag of findstr

list1.txt

\[[0-9]\]
\[[0-9][0-9]\]
\[[0-9][0-9][0-9]\]
\[[0-2][0-9][0-9][0-9]\]

list2.txt

\[[3-5][0-9][0-9][0-9]\]

list3.txt

\[[6-9][0-9][0-9][0-9]\]

  1. Split the file into smaller files:
type "%input%" | find /v /n "" | findstr /b /r /g:list1.txt > file1.txt
type "%input%" | find /v /n "" | findstr /b /r /g:list2.txt > file2.txt
type "%input%" | find /v /n "" | findstr /b /r /g:list3.txt > file3.txt
  1. remove prefixed line numbers for each file split:
    eg. for the 1st file:
for /f "tokens=* delims=[" %i in ('type "%cd%\file1.txt"') do (
set a=%i
set a=!a:*]=]!
echo:!a:~1!>>file_1.txt)

Notes:
Works with leading whitespace, blank lines & whitespace lines.

Tested on Win 10 x64 CMD, on 4.4GB text file, 5651982 lines.


Set Arg = WScript.Arguments
set WshShell = createObject("Wscript.Shell")
Set Inp = WScript.Stdin
Set Outp = Wscript.Stdout
    Set rs = CreateObject("ADODB.Recordset")
    With rs
        .Fields.Append "LineNumber", 4 

        .Fields.Append "Txt", 201, 5000 
        .Open
        LineCount = 0
        Do Until Inp.AtEndOfStream
            LineCount = LineCount + 1
            .AddNew
            .Fields("LineNumber").value = LineCount
            .Fields("Txt").value = Inp.readline
            .UpDate
        Loop

        .Sort = "LineNumber ASC"

        If LCase(Arg(1)) = "t" then
            If LCase(Arg(2)) = "i" then
                .filter = "LineNumber < " & LCase(Arg(3)) + 1
            ElseIf LCase(Arg(2)) = "x" then
                .filter = "LineNumber > " & LCase(Arg(3))
            End If
        ElseIf LCase(Arg(1)) = "b" then
            If LCase(Arg(2)) = "i" then
                .filter = "LineNumber > " & LineCount - LCase(Arg(3))
            ElseIf LCase(Arg(2)) = "x" then
                .filter = "LineNumber < " & LineCount - LCase(Arg(3)) + 1
            End If
        End If

        Do While not .EOF
            Outp.writeline .Fields("Txt").Value

            .MoveNext
        Loop
    End With

Cut

filter cut {t|b} {i|x} NumOfLines

Cuts the number of lines from the top or bottom of file.

t - top of the file
b - bottom of the file
i - include n lines
x - exclude n lines

Example

cscript /nologo filter.vbs cut t i 5 < "%systemroot%\win.ini"

Another way This outputs lines 5001+, adapt for your use. This uses almost no memory.

Do Until Inp.AtEndOfStream
         Count = Count + 1
         If count > 5000 then
            OutP.WriteLine Inp.Readline
         End If
Loop

You can use the command split for this task. For example this command entered into the command prompt

split YourLogFile.txt -b 500m

creates several files with a size of 500 MByte each. This will take several minutes for a file of your size. You can rename the output files (by default called "xaa", "xab",... and so on) to *.txt to open it in the editor of your choice.

Make sure to check the help file for the command. You can also split the log file by number of lines or change the name of your output files.

(tested on Windows 7 64 bit)


Examples related to windows

"Permission Denied" trying to run Python on Windows 10 A fatal error occurred while creating a TLS client credential. The internal error state is 10013 How to install OpenJDK 11 on Windows? I can't install pyaudio on Windows? How to solve "error: Microsoft Visual C++ 14.0 is required."? git clone: Authentication failed for <URL> How to avoid the "Windows Defender SmartScreen prevented an unrecognized app from starting warning" XCOPY: Overwrite all without prompt in BATCH Laravel 5 show ErrorException file_put_contents failed to open stream: No such file or directory how to open Jupyter notebook in chrome on windows Tensorflow import error: No module named 'tensorflow'

Examples related to text

Difference between opening a file in binary vs text How do I center text vertically and horizontally in Flutter? How to `wget` a list of URLs in a text file? Convert txt to csv python script Reading local text file into a JavaScript array Python: How to increase/reduce the fontsize of x and y tick labels? How can I insert a line break into a <Text> component in React Native? How to split large text file in windows? Copy text from nano editor to shell Atom menu is missing. How do I re-enable

Examples related to cmd

'ls' is not recognized as an internal or external command, operable program or batch file '' is not recognized as an internal or external command, operable program or batch file XCOPY: Overwrite all without prompt in BATCH VSCode Change Default Terminal How to install pandas from pip on windows cmd? 'ls' in CMD on Windows is not recognized Command to run a .bat file VMware Workstation and Device/Credential Guard are not compatible How do I kill the process currently using a port on localhost in Windows? how to run python files in windows command prompt?

Examples related to split

Parameter "stratify" from method "train_test_split" (scikit Learn) Pandas split DataFrame by column value How to split large text file in windows? Attribute Error: 'list' object has no attribute 'split' Split function in oracle to comma separated values with automatic sequence How would I get everything before a : in a string Python Split String by delimiter position using oracle SQL JavaScript split String with white space Split a String into an array in Swift? Split pandas dataframe in two if it has more than 10 rows

Examples related to size

PySpark 2.0 The size or shape of a DataFrame How to set label size in Bootstrap How to create a DataFrame of random integers with Pandas? How to split large text file in windows? How can I get the size of an std::vector as an int? How to load specific image from assets with Swift How to find integer array size in java Fit website background image to screen size How to set text size in a button in html How to change font size in html?