I need to compress a large file of about 17-20 GB and split it into several files of around 1 GB each.
I searched for a solution via Google and found ways using the split and cat commands, but they did not work for large files at all. Also, they won't work on Windows; I need to extract the file on a Windows machine.
Tags: linux, bash, file-io, compression
Tested code that first creates a single archive file and then splits it:
gzip -c file.orig > file.gz
CHUNKSIZE=1073741824
# integer division rounds down, but the remainder is still covered:
# e.g. for 1 GiB + 1 byte, PARTCNT is 1 and "seq 0 $PARTCNT"
# spans the whole file
PARTCNT=$(( $(stat -c%s file.gz) / CHUNKSIZE ))
for n in $(seq 0 $PARTCNT)
do
    dd if=file.gz of=part.$n bs=$CHUNKSIZE skip=$n count=1
done
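To reassemble and decompress on the Linux side, concatenate the parts in numeric order; a minimal sketch reusing $PARTCNT from above (the output name file.restored is just an example, and note that a plain part.* glob would sort part.10 before part.2):

# concatenate parts in numeric order and decompress
for n in $(seq 0 $PARTCNT); do cat "part.$n"; done | gunzip -c > file.restored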
This variant omits creating a single archive file and goes straight to creating parts:
gzip -c file.orig |
( CHUNKSIZE=1073741824
  i=0
  while true; do
      i=$((i+1))
      # read at most one chunk from the pipe into the next part
      head -c "$CHUNKSIZE" > "part.$i"
      # a part shorter than $CHUNKSIZE means the stream is exhausted
      [ "$CHUNKSIZE" -eq "$(stat -c%s "part.$i")" ] || break
  done )
In this variant, if the archive's size is exactly divisible by $CHUNKSIZE, then the last part file will be 0 bytes.
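If that empty trailing part is unwanted, it can be deleted afterwards; a minimal sketch (assumes a find that supports -delete, as GNU and BSD find do):

find . -maxdepth 1 -name 'part.*' -size 0 -delete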
Use tar to split into multiple archives, for example with the multi-volume mode sketched below. There are plenty of programs that will work with tar files on Windows, including Cygwin.
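A sketch of what this could look like with GNU tar's multi-volume mode; -M and -L are real GNU tar options, but the volume file names are only examples, and tar prompts for further volumes when the named ones run out:

# -L is the volume size in units of 1024 bytes, so 1048576 = 1 GiB.
# Note: multi-volume archives cannot be combined with -z, so
# compress the input beforehand or compress each volume afterwards.
tar -c -M -L 1048576 -f vol1.tar -f vol2.tar -f vol3.tar file.orig
# extraction works the same way, listing the volumes in order
tar -x -M -f vol1.tar -f vol2.tar -f vol3.tar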
If you are splitting from Linux, you can still reassemble on Windows:
copy /b file1 + file2 + file3 + file4 filetogether
You can use the split command with the -b option:
split -b 1024m file.tar.gz
It can be reassembled on a Windows machine using @Joshua's answer:
copy /b file1 + file2 + file3 + file4 filetogether
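To confirm the parts were joined correctly, comparing checksums on both sides is a quick sanity check (a suggestion beyond the original answers; sha256sum and certutil are standard on Linux and Windows respectively):

# on Linux, before splitting
sha256sum file.tar.gz
# on Windows, after reassembling (cmd.exe)
certutil -hashfile filetogether SHA256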
Edit: As @Charlie stated in the comment below, you might want to set a prefix explicitly, because otherwise split will use x as the prefix (producing xaa, xab, ...), which can be confusing.
split -b 1024m "file.tar.gz" "file.tar.gz.part-"
# creates files: file.tar.gz.part-aa, file.tar.gz.part-ab, file.tar.gz.part-ac, ...
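Since the alphabetic suffixes sort correctly, the parts can also be reassembled on Linux with a simple glob (a sketch):

cat file.tar.gz.part-* > file.tar.gz
tar xzf file.tar.gz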
Edit: Updating the post because the question is closed and the most effective solution is very close to the content of this answer:
# create archives
$ tar cz my_large_file_1 my_large_file_2 | split -b 1024MiB - myfiles_split.tgz_
# uncompress
$ cat myfiles_split.tgz_* | tar xz
This solution avoids the need for an intermediate large file when (de)compressing. Use the tar -C option to place the resulting files in a different directory. By the way, if the archive consists of only a single file, tar can be avoided and gzip used alone:
# create archives
$ gzip -c my_large_file | split -b 1024MiB - myfile_split.gz_
# uncompress
$ cat myfile_split.gz_* | gunzip -c > my_large_file
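For instance, the -C option mentioned above extracts the tar variant into a different directory (a sketch; the destination path is only an example and must already exist):

$ cat myfiles_split.tgz_* | tar xz -C /path/to/dest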
For Windows, you can download ported versions of the same commands or use Cygwin.