Utilizing multi core for tar+gzip/bzip compression/decompression

Question

I normally compress using tar zcvf and decompress using tar zxvf (using gzip due to habit).

I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression.

Is there any way I can utilize the unused cores to make it faster?

User · Answer

Common approach

There is an option for the tar program:

    -I, --use-compress-program PROG
          filter through PROG (must accept -d)

You can use a multithreaded version of the archiver or compressor utility. The most popular multithreaded archivers are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:

    $ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
    $ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive

The archiver must accept -d. If your replacement utility does not have this parameter, or you need to specify additional parameters, use pipes instead (add parameters if necessary):

    $ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
    $ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz

The input and output of the single-threaded and multithreaded tools are compatible. You can compress with the multithreaded version and decompress with the single-threaded version, and vice versa.

p7zip

For compression with p7zip you need a small shell script like the following:

    #!/bin/sh
    case $1 in
      -d) 7za -txz -si -so e;;
       *) 7za -txz -si -so a .;;
    esac 2>/dev/null

Save it as 7zhelper.sh. Here is an example of usage:

    $ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
    $ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z

xz

Regarding multithreaded XZ support: if you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0").

This is a fragment of the man page for the 5.1.0alpha version:

    Multithreaded compression and decompression are not implemented yet, so this
    option has no effect for now.

However, this will not work for decompression of files that were not also compressed with threading enabled. From the man page for version 5.2.2:

    Threaded decompression hasn't been implemented yet. It will only work
    on files that contain multiple blocks with size information in
    block headers. All files compressed in multi-threaded mode meet this
    condition, but files compressed in single-threaded mode don't even if
    --block-size=size is used.

Recompiling with replacement

If you build tar from sources, you can recompile it with these parameters:

    --with-gzip=pigz
    --with-bzip2=lbzip2
    --with-lzip=plzip

After recompiling tar with these options you can check the output of tar's help:

    $ tar --help | grep "lbzip2\|plzip\|pigz"
      -j, --bzip2                filter the archive through lbzip2
          --lzip                 filter the archive through plzip
      -z, --gzip, --gunzip, --ungzip   filter the archive through pigz
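The xz section above names the XZ_DEFAULTS variable but does not show it in a full tar command. The following is a minimal sketch, assuming a tar with xz support (-J) and XZ Utils 5.2.0 or later. The scratch directory and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: let tar's built-in xz support (-J) compress on all cores by passing
# -T 0 to xz through the XZ_DEFAULTS environment variable.
# All paths below are throwaway demo names, not part of the original answer.
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/demo_src" "$workdir/out"
seq 1 20000 > "$workdir/demo_src/numbers.txt"

# -T 0 tells xz to pick the thread count from the number of available cores.
XZ_DEFAULTS="-T 0" tar -cJf "$workdir/demo.tar.xz" -C "$workdir" demo_src

# Extraction needs no special settings.
tar -xJf "$workdir/demo.tar.xz" -C "$workdir/out"

# Compare the extracted file with the original.
if cmp -s "$workdir/demo_src/numbers.txt" "$workdir/out/demo_src/numbers.txt"; then
    xz_roundtrip=ok
else
    xz_roundtrip=failed
fi
echo "xz round trip: $xz_roundtrip"

rm -rf "$workdir"
```

xz only spreads work across threads when the input is large enough to be split into several blocks, so a small demo like this exercises the syntax rather than the speed-up.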

User · Answer

If you want to have more flexibility with filenames and compression options, you can use:

    find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec \
    tar -P --transform='s@/my/path/@@g' -cf - {} + | \
    pigz -9 -p 4 > myarchive.tar.gz

Step 1: find

    find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec

This command will look for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log. Add as many -o -name "pattern" as you want inside the parentheses. The parentheses group the -name tests so that -type f and -exec apply to every pattern, not only the last one. -exec will execute the next command using the results of find: tar.

Step 2: tar

    tar -P --transform='s@/my/path/@@g' -cf - {} +

--transform is a simple string replacement parameter. It will strip the path of the files from the archive so the tarball's root becomes the current directory when extracting. Note that you can't use the -C option to change directory, as you would lose the benefits of find: all files of the directory would be included.

-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.

-cf - tells tar to write the archive to standard output, so the tarball name can be specified later.

{} + passes every file that find found previously.

Step 3: pigz

    pigz -9 -p 4

Use as many parameters as you want. In this case -9 is the compression level and -p 4 is the number of cores dedicated to compression. If you run this on a heavily loaded webserver, you probably don't want to use all available cores.

Step 4: archive name

    > myarchive.tar.gz

Finally, redirect the output to the archive file.
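With a very large number of matches, find's -exec ... {} + may start tar more than once. As a variant sketch, assuming GNU find and GNU tar, the file list can be passed on standard input instead so that tar runs exactly once. The demo directories, the file names, and the gzip fallback are illustrative assumptions, not part of the answer.

```shell
#!/bin/sh
# Sketch: feed find's results to a single tar process via -print0 / --null -T -.
# Assumes GNU find and GNU tar; all paths are throwaway demo names.
set -eu

src=$(mktemp -d)
out=$(mktemp -d)
touch "$src/a.sql" "$src/b.log" "$src/c.txt"

# Use pigz when it is installed; otherwise fall back to single-threaded gzip.
if command -v pigz >/dev/null 2>&1; then
    compressor="pigz -9 -p 4"
else
    compressor="gzip -9"
fi

# -print0 and --null keep file names with spaces or newlines intact.
# --transform strips the source directory prefix, as in the answer above.
find "$src" -type f \( -name '*.sql' -o -name '*.log' \) -print0 |
    tar -P --null -T - --transform="s@^$src/@@" -cf - |
    $compressor > "$out/myarchive.tar.gz"

# List the archive members to check what was stored.
listing=$(gzip -dc "$out/myarchive.tar.gz" | tar -tf - | sort)
echo "$listing"

rm -rf "$src" "$out"
```

Only the .sql and .log files should appear in the listing, stored without the source directory prefix.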

User · Answer

You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores:

    tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/

User · Answer

A relatively newer (de)compression tool you might want to consider is zstandard. It does an excellent job of utilizing spare cores, and it has made some great trade-offs when it comes to compression ratio vs. (de)compression time. It is also highly tweak-able depending on your compression ratio needs.
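The answer gives no command, so here is a minimal sketch, assuming the zstd command-line tool is installed. -T0 asks zstd to use all detected cores and -19 selects one of its higher compression levels. The paths are throwaway demo names.

```shell
#!/bin/sh
# Sketch: pipe a tar stream through multithreaded zstd and back.
# Assumes the zstd CLI is available; the script skips the round trip if not.
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/demo_src" "$workdir/out"
seq 1 20000 > "$workdir/demo_src/numbers.txt"

if command -v zstd >/dev/null 2>&1; then
    # Compress: -T0 uses all cores, -19 trades speed for ratio, -q is quiet.
    tar -cf - -C "$workdir" demo_src | zstd -q -T0 -19 > "$workdir/demo.tar.zst"

    # Decompress to stdout and unpack.
    zstd -dc "$workdir/demo.tar.zst" | tar -xf - -C "$workdir/out"

    if cmp -s "$workdir/demo_src/numbers.txt" "$workdir/out/demo_src/numbers.txt"; then
        zstd_result=roundtrip-ok
    else
        zstd_result=roundtrip-failed
    fi
else
    zstd_result=zstd-missing
fi
echo "zstd: $zstd_result"

rm -rf "$workdir"
```

Lower levels such as -3 are much faster, so the level is the main knob to adjust for a given ratio target.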

User · Answer

You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.

For example use:

    tar -c --use-compress-program=pigz -f tar.file dir_to_zip
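The same flag can also be used for extraction. A minimal sketch, assuming GNU tar (which runs the named program with -d when reading an archive), with gzip as an assumed fallback when pigz is absent and throwaway demo paths:

```shell
#!/bin/sh
# Sketch: create and extract an archive with --use-compress-program.
# Assumes GNU tar; falls back to gzip if pigz is not installed.
set -eu

if command -v pigz >/dev/null 2>&1; then
    prog=pigz
else
    prog=gzip
fi

workdir=$(mktemp -d)
mkdir "$workdir/dir_to_zip" "$workdir/out"
echo "hello" > "$workdir/dir_to_zip/hello.txt"

# Create: tar pipes its output through $prog.
tar -c --use-compress-program="$prog" -f "$workdir/demo.tar.gz" -C "$workdir" dir_to_zip

# Extract: tar pipes the archive through "$prog -d".
tar -x --use-compress-program="$prog" -f "$workdir/demo.tar.gz" -C "$workdir/out"

extracted=$(cat "$workdir/out/dir_to_zip/hello.txt")
echo "$extracted"

rm -rf "$workdir"
```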

User · Answer

You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:

    tar cf - paths-to-archive | pigz > archive.tar.gz

By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.

    tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz
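As a sketch of choosing the thread count explicitly, the following asks the system for its online core count and passes it to -p. It assumes getconf supports _NPROCESSORS_ONLN (available on Linux and macOS, but not guaranteed everywhere), falls back to gzip when pigz is missing, and uses throwaway demo paths. The archive is then read back with plain gzip, since pigz writes ordinary gzip output.

```shell
#!/bin/sh
# Sketch: pass an explicit thread count to pigz, then read the result with gzip.
# Assumes getconf _NPROCESSORS_ONLN works here; defaults to 1 thread if not.
set -eu

cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

workdir=$(mktemp -d)
mkdir "$workdir/paths-to-archive"
seq 1 20000 > "$workdir/paths-to-archive/numbers.txt"

if command -v pigz >/dev/null 2>&1; then
    tar -cf - -C "$workdir" paths-to-archive | pigz -9 -p "$cores" > "$workdir/archive.tar.gz"
else
    # Fallback so the sketch still runs without pigz (single-threaded).
    tar -cf - -C "$workdir" paths-to-archive | gzip -9 > "$workdir/archive.tar.gz"
fi

# Decompress with plain gzip and list the archive members.
members=$(gzip -dc "$workdir/archive.tar.gz" | tar -tf -)
echo "$members"

rm -rf "$workdir"
```

On a shared machine, passing a smaller number than the core count leaves headroom for other processes.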
