[linux] Split files using tar, gz, zip, or bzip2

I need to compress a large file of about 17-20 GB. I need to split it into several files of around 1GB per file.

I searched for a solution via Google and found ways using split and cat commands. But they did not work for large files at all. Also, they won't work in Windows; I need to extract it on a Windows machine.

This question is related to linux bash file-io compression

The answer is


Tested code, initially creates a single archive file, then splits it:

 gzip -c file.orig > file.gz
 CHUNKSIZE=1073741824
 PARTCNT=$[$(stat -c%s file.gz) / $CHUNKSIZE]

 # the remainder is taken care of, for example for
 # 1 GiB + 1 bytes PARTCNT is 1 and seq 0 $PARTCNT covers
 # all of file
 for n in `seq 0 $PARTCNT`
 do
       dd if=file.gz of=part.$n bs=$CHUNKSIZE skip=$n count=1
 done

This variant omits creating a single archive file and goes straight to creating parts:

gzip -c file.orig |
    ( CHUNKSIZE=1073741824;
        i=0;
        while true; do
            i=$[i+1];
            head -c "$CHUNKSIZE" > "part.$i";
            [ "$CHUNKSIZE" -eq $(stat -c%s "part.$i") ] || break;
        done; )

In this variant, if the archive's file size is divisible by $CHUNKSIZE, then the last partial file will have file size 0 bytes.


use tar to split into multiple archives

there are plenty of programs that will work with tar files on windows, including cygwin.


If you are splitting from Linux, you can still reassemble in Windows.

copy /b file1 + file2 + file3 + file4 filetogether

You can use the split command with the -b option:

split -b 1024m file.tar.gz

It can be reassembled on a Windows machine using @Joshua's answer.

copy /b file1 + file2 + file3 + file4 filetogether

Edit: As @Charlie stated in the comment below, you might want to set a prefix explicitly because it will use x otherwise, which can be confusing.

split -b 1024m "file.tar.gz" "file.tar.gz.part-"

// Creates files: file.tar.gz.part-aa, file.tar.gz.part-ab, file.tar.gz.part-ac, ...

Edit: Editing the post because question is closed and the most effective solution is very close to the content of this answer:

# create archives
$ tar cz my_large_file_1 my_large_file_2 | split -b 1024MiB - myfiles_split.tgz_
# uncompress
$ cat myfiles_split.tgz_* | tar xz

This solution avoids the need to use an intermediate large file when (de)compressing. Use the tar -C option to use a different directory for the resulting files. btw if the archive consists from only a single file, tar could be avoided and only gzip used:

# create archives
$ gzip -c my_large_file | split -b 1024MiB - myfile_split.gz_
# uncompress
$ cat myfile_split.gz_* | gunzip -c > my_large_file

For windows you can download ported versions of the same commands or use cygwin.


Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to bash

Comparing a variable with a string python not working when redirecting from bash script Zipping a file in bash fails How do I prevent Conda from activating the base environment by default? Get first line of a shell command's output Fixing a systemd service 203/EXEC failure (no such file or directory) /bin/sh: apt-get: not found VSCode Change Default Terminal Run bash command on jenkins pipeline How to check if the docker engine and a docker container are running? How to switch Python versions in Terminal?

Examples related to file-io

Python, Pandas : write content of DataFrame into text File Saving response from Requests to file How to while loop until the end of a file in Python without checking for empty line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder How do I add a resources folder to my Java project in Eclipse Read and write a String from text file Python Pandas: How to read only first n rows of CSV files in? Open files in 'rt' and 'wt' modes How to write to a file without overwriting current contents? Write objects into file with Node.js

Examples related to compression

Image steganography that could survive jpeg compression How to enable GZIP compression in IIS 7.5 Create a .tar.bz2 file Linux How are zlib, gzip and zip related? What do they have in common and how are they different? Create a tar.xz in one command Creating a ZIP archive in memory using System.IO.Compression How to compress an image via Javascript in the browser? How to reduce the image size without losing quality in PHP How to send a compressed archive that contains executables so that Google's attachment filter won't reject it How to reduce the image file size using PIL