[windows] Batch file to split .csv file

I have a very large .csv file (>500mb) and I wish to break this up into into smaller .csv files in command prompt. (Basically trying to find a linux "split" function in Windows".

This has to be a batch script as my machine only has windows installed and requesting softwares is a pain. I came across a number of sample codes (http://forums.techguy.org/software-development/1023949-split-100000-line-csv-into.html), however, it does not work when I execute the batch. All I get is one output file that is only 125kb when I requested it to parse every 20 000 lines.

Has anyone ever come across a similar problem and how did you resolve the issue?

This question is related to windows batch-file csv command-prompt

The answer is


Try this out:

@echo off
setLocal EnableDelayedExpansion

set limit=20000
set file=export.csv
set lineCounter=1
set filenameCounter=1

set name=
set extension=
for %%a in (%file%) do (
    set "name=%%~na"
    set "extension=%%~xa"
)

for /f "tokens=*" %%a in (%file%) do (
    set splitFile=!name!-part!filenameCounter!!extension!
    if !lineCounter! gtr !limit! (
        set /a filenameCounter=!filenameCounter! + 1
        set lineCounter=1
        echo Created !splitFile!.
    )
    echo %%a>> !splitFile!

    set /a lineCounter=!lineCounter! + 1
)

As shown in the code above, it will split the original csv file into multiple csv file with a limit of 20 000 lines. All you have to do is to change the !file! and !limit! variable accordingly. Hope it helps.


I found this question while looking for a similar solution. I modified the answer that @Dale gave to suit my purposes. I wanted something that was a little more flexible and had some error trapping. Just thought I might put it here for anyone looking for the same thing.

@echo off
setLocal EnableDelayedExpansion
GOTO checkvars

:checkvars
    IF "%1"=="" GOTO syntaxerror
    IF NOT "%1"=="-f"  GOTO syntaxerror
    IF %2=="" GOTO syntaxerror
    IF NOT EXIST %2 GOTO nofile
    IF "%3"=="" GOTO syntaxerror
    IF NOT "%3"=="-n" GOTO syntaxerror
    IF "%4"==""  GOTO syntaxerror
    set param=%4
    echo %param%| findstr /xr "[1-9][0-9]* 0" >nul && (
        goto proceed
    ) || (
        echo %param% is NOT a valid number
        goto syntaxerror
    )

:proceed
    set limit=%4
    set file=%2
    set lineCounter=1+%limit%
    set filenameCounter=0

    set name=
    set extension=

    for %%a in (%file%) do (
        set "name=%%~na"
        set "extension=%%~xa"
    )

    for /f "usebackq tokens=*" %%a in (%file%) do (
        if !lineCounter! gtr !limit! (
            set splitFile=!name!_part!filenameCounter!!extension!
            set /a filenameCounter=!filenameCounter! + 1
            set lineCounter=1
            echo Created !splitFile!.
        )
        cls
        echo Adding Line !splitFile! - !lineCounter!
        echo %%a>> !splitFile!
        set /a lineCounter=!lineCounter! + 1
    )
    echo Done!
    goto end
:syntaxerror
    Echo Syntax: %0 -f Filename -n "Number Of Rows Per File"
    goto end
:nofile
    echo %2 does not exist
    goto end
:end

This will give you lines 1 to 20000 in newfile1.csv
and lines 20001 to the end in file newfile2.csv

It overcomes the 8K character limit per line too.

This uses a helper batch file called findrepl.bat from - https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat

Place findrepl.bat in the same folder as the batch file or on the path.

It's more robust than a plain batch file, and quicker too.

findrepl /o:1:20000 <file.csv >newfile1.csv
findrepl /o:20001   <file.csv >newfile2.csv


If splitting very large files, the solution I found is an adaptation from this, with PowerShell "embedded" in a batch file. This works fast, as opposed to many other things I tried (I wouldn't know about other options posted here).

The way to use mysplit.bat below is

mysplit.bat <mysize> 'myfile'

Note: The script was intended to use the first argument as the split size. It is currently hardcoded at 100Mb. It should not be difficult to fix this.

Note 2: The filname should be enclosed in single quotes. Other alternatives for quoting apparently do not work.

Note 3: It splits the file at given number of bytes, not at given number of lines. For me this was good enough. Some lines of code could be probably added to complete each chunk read, up to the next CR/LF. This will split in full lines (not with a constant number of them), with no sacrifice in processing time.

Script mysplit.bat:

@REM Using https://stackoverflow.com/questions/19335004/how-to-run-a-powershell-script-from-a-batch-file
@REM and https://stackoverflow.com/questions/1001776/how-can-i-split-a-text-file-using-powershell
@PowerShell  ^
    $upperBound = 100MB;  ^
    $rootName = %2;  ^
    $from = $rootName;  ^
    $fromFile = [io.file]::OpenRead($from);  ^
    $buff = new-object byte[] $upperBound;  ^
    $count = $idx = 0;  ^
    try {  ^
        do {  ^
            'Reading ' + $upperBound;  ^
            $count = $fromFile.Read($buff, 0, $buff.Length);  ^
            if ($count -gt 0) {  ^
                $to = '{0}.{1}' -f ($rootName, $idx);  ^
                $toFile = [io.file]::OpenWrite($to);  ^
                try {  ^
                    'Writing ' + $count + ' to ' + $to;  ^
                    $tofile.Write($buff, 0, $count);  ^
                } finally {  ^
                    $tofile.Close();  ^
                }  ^
            }  ^
            $idx ++;  ^
        } while ($count -gt 0);  ^
    }  ^
    finally {  ^
        $fromFile.Close();  ^
    }  ^
%End PowerShell%

Use the cgwin command SPLIT. Samples

To split a file every 500 lines counts:

split -l 500 [filename.ext]

by default, it adds xa,xb,xc... to filename after extension

To generate files with numbers and ending in correct extension, use following

split -l 1000 sourcefilename.ext destinationfilename -d --additional-suffix=.ext

the position of -d or -l does not matter,

  • "-d" is same as --numeric-suffixes
  • "-l" is same as --lines

For more: split --help


Examples related to windows

"Permission Denied" trying to run Python on Windows 10 A fatal error occurred while creating a TLS client credential. The internal error state is 10013 How to install OpenJDK 11 on Windows? I can't install pyaudio on Windows? How to solve "error: Microsoft Visual C++ 14.0 is required."? git clone: Authentication failed for <URL> How to avoid the "Windows Defender SmartScreen prevented an unrecognized app from starting warning" XCOPY: Overwrite all without prompt in BATCH Laravel 5 show ErrorException file_put_contents failed to open stream: No such file or directory how to open Jupyter notebook in chrome on windows Tensorflow import error: No module named 'tensorflow'

Examples related to batch-file

'ls' is not recognized as an internal or external command, operable program or batch file '' is not recognized as an internal or external command, operable program or batch file XCOPY: Overwrite all without prompt in BATCH CanĀ“t run .bat file under windows 10 Execute a batch file on a remote PC using a batch file on local PC Windows batch - concatenate multiple text files into one How do I create a shortcut via command-line in Windows? Getting Error:JRE_HOME variable is not defined correctly when trying to run startup.bat of Apache-Tomcat Curl not recognized as an internal or external command, operable program or batch file Best way to script remote SSH commands in Batch (Windows)

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to command-prompt

CMD command to check connected USB devices How do I kill the process currently using a port on localhost in Windows? PowerShell The term is not recognized as cmdlet function script file or operable program open program minimized via command prompt How to connect to SQL Server from command prompt with Windows authentication How to see the proxy settings on windows? How do I type a TAB character in PowerShell? Batch file to split .csv file Aliases in Windows command prompt Change all files and folders permissions of a directory to 644/755