How are zlib, gzip and zip related? What do they have in common and how are they different?

Question

The compression algorithm used in zlib is essentially the same as that in gzip and zip. What are gzip and zip? How are they different and how are they the same?

User · Answer

The most important difference is that gzip can only compress a single file, while zip compresses multiple files one by one and then archives them into a single file. Thus, gzip is used together with tar most of the time (there are other possibilities, though). This has some advantages and disadvantages.

If you have a big archive and you only need one single file out of it, you have to decompress the whole gzip file to get to that file. This is not required if you have a zip file.
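
As a rough illustration of that random access, here is a minimal Python sketch using the standard-library zipfile module. The member names and contents are invented for the example; the point is only that a single entry can be read by name without touching the others.

```python
import io
import zipfile

# Build a small zip archive in memory (names and contents are made up).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "alpha " * 1000)
    zf.writestr("b.txt", "bravo " * 1000)
    zf.writestr("c.txt", "charlie " * 1000)

# Reopen the archive and pull out one member by name. zipfile locates the
# entry through the archive's directory and decompresses only that entry.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    data = zf.read("b.txt")

print(names)
print(len(data))
```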

On the other hand, if you compress 10 similar or even identical files, the zip archive will be much bigger because each file is compressed individually, whereas with gzip in combination with tar a single file is compressed, which is much more effective if the files are similar (or identical).
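
The size difference can be checked with a small experiment. The sketch below is only an illustration under chosen assumptions: ten identical 4 KiB files of random (incompressible) bytes with hypothetical names, built in memory with Python's zipfile and tarfile modules, so that any saving has to come from redundancy between files rather than within one file.

```python
import io
import os
import tarfile
import zipfile

# Ten identical "files" of incompressible data (hypothetical names).
payload = os.urandom(4096)
files = {f"file{i}.bin": payload for i in range(10)}

# zip: each member is compressed on its own.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# tar.gz: members are concatenated into one tar stream, then gzip-compressed.
tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

zip_size = len(zip_buf.getvalue())
tgz_size = len(tgz_buf.getvalue())
print(zip_size, tgz_size)
```

Note that Deflate only looks back 32 KiB, so this cross-file saving shows up for small files that sit close together in the tar stream; identical files much larger than that window would not benefit in the same way.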

User · Answer

ZIP is a file format used for storing an arbitrary number of files and folders together with lossless compression. It makes no strict assumptions about the compression methods used, but is most frequently used with DEFLATE.

Gzip is both a compression algorithm based on DEFLATE but less encumbered with potential patents et al., and a file format for storing a single compressed file. It supports compressing an arbitrary number of files and folders when combined with tar. The resulting file has an extension of .tgz or .tar.gz and is commonly called a tarball.

zlib is a library of functions encapsulating DEFLATE in its most common LZ77 incarnation.

User · Answer

Short form:

.zip is an archive format using, usually, the Deflate compression method. The .gz gzip format is for single files, also using the Deflate compression method. Often gzip is used in combination with tar to make a compressed archive format, .tar.gz. The zlib library provides Deflate compression and decompression code for use by zip, gzip, png (which uses the zlib wrapper on deflate data), and many other applications.

Long form:

The ZIP format was developed by Phil Katz as an open format with an open specification, where his implementation, PKZIP, was shareware. It is an archive format that stores files and their directory structure, where each file is individually compressed. The file type is .zip. The files, as well as the directory structure, can optionally be encrypted.

The ZIP format supports several compression methods:

    0 - The file is stored (no compression)
    1 - The file is Shrunk
    2 - The file is Reduced with compression factor 1
    3 - The file is Reduced with compression factor 2
    4 - The file is Reduced with compression factor 3
    5 - The file is Reduced with compression factor 4
    6 - The file is Imploded
    7 - Reserved for Tokenizing compression algorithm
    8 - The file is Deflated
    9 - Enhanced Deflating using Deflate64(tm)
    10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
    11 - Reserved by PKWARE
    12 - File is compressed using BZIP2 algorithm
    13 - Reserved by PKWARE
    14 - LZMA
    15 - Reserved by PKWARE
    16 - IBM z/OS CMPSC Compression
    17 - Reserved by PKWARE
    18 - File is compressed using IBM TERSE (new)
    19 - IBM LZ77 z Architecture
    20 - deprecated (use method 93 for zstd)
    93 - Zstandard (zstd) Compression
    94 - MP3 Compression
    95 - XZ Compression
    96 - JPEG variant
    97 - WavPack compressed data
    98 - PPMd version I, Rev 1
    99 - AE-x encryption marker (see APPENDIX E)

Methods 1 to 7 are historical and are not in use. Methods 9 through 98 are relatively recent additions and are in varying, small amounts of use. The only method in truly widespread use in the ZIP format is method 8, Deflate, and to some smaller extent method 0, which is no compression at all. Virtually every .zip file that you will come across in the wild will use exclusively methods 8 and 0, likely just method 8. (Method 8 also has a means to effectively store the data with no compression and relatively little expansion, and Method 0 cannot be streamed whereas Method 8 can be.)

The ISO/IEC 21320-1:2015 standard for file containers is a restricted zip format, such as used in Java archive files (.jar), Office Open XML files (Microsoft Office .docx, .xlsx, .pptx), OpenDocument Format files (.odt, .ods, .odp), and EPUB files (.epub). That standard limits the compression methods to 0 and 8, as well as other constraints such as no encryption or signatures.

Around 1990, the Info-ZIP group wrote portable, free, open-source implementations of zip and unzip utilities, supporting compression with the Deflate format, and decompression of that and the earlier formats. This greatly expanded the use of the .zip format.

In the early '90s, the gzip format was developed as a replacement for the Unix compress utility, derived from the Deflate code in the Info-ZIP utilities. Unix compress was designed to compress a single file or stream, appending a .Z to the file name. compress uses the LZW compression algorithm, which at the time was under patent and its free use was in dispute by the patent holders. Though some specific implementations of Deflate were patented by Phil Katz, the format was not, and so it was possible to write a Deflate implementation that did not infringe on any patents. That implementation has not been so challenged in the last 20+ years. The Unix gzip utility was intended as a drop-in replacement for compress, and in fact is able to decompress compress-compressed data (assuming that you were able to parse that sentence). gzip appends a .gz to the file name. gzip uses the Deflate compressed data format, which compresses quite a bit better than Unix compress, has very fast decompression, and adds a CRC-32 as an integrity check for the data. The header format also permits the storage of more information than the compress format allowed, such as the original file name and the file modification time.

Though compress only compresses a single file, it was common to use the tar utility to create an archive of files, their attributes, and their directory structure into a single .tar file, and to then compress it with compress to make a .tar.Z file. In fact, the tar utility had and still has an option to do the compression at the same time, instead of having to pipe the output of tar to compress. This all carried forward to the gzip format, and tar has an option to compress directly to the .tar.gz format. The .tar.gz format compresses better than the .zip approach, since the compression of a .tar can take advantage of redundancy across files, especially many small files. .tar.gz is the most common archive format in use on Unix due to its very high portability, but there are more effective compression methods in use as well, so you will often see .tar.bz2 and .tar.xz archives.

Unlike .tar, .zip has a central directory at the end, which provides a list of the contents. That and the separate compression provide random access to the individual entries in a .zip file. A .tar file would have to be decompressed and scanned from start to end in order to build a directory, which is how a .tar file is listed.

Shortly after the introduction of gzip, around the mid-1990s, the same patent dispute called into question the free use of the .gif image format, very widely used on bulletin boards and the World Wide Web (a new thing at the time). So a small group created the PNG losslessly compressed image format, with file type .png, to replace .gif. That format also uses the Deflate format for compression, which is applied after filters on the image data expose more of the redundancy. In order to promote widespread usage of the PNG format, two free code libraries were created: libpng and zlib. libpng handled all of the features of the PNG format, and zlib provided the compression and decompression code for use by libpng, as well as for other applications. zlib was adapted from the gzip code.

All of the mentioned patents have since expired.

The zlib library supports Deflate compression and decompression, and three kinds of wrapping around the deflate streams. Those are: no wrapping at all ("raw" deflate); zlib wrapping, which is used in the PNG format data blocks; and gzip wrapping, to provide gzip routines for the programmer. The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact (six bytes vs. a minimum of 18 bytes for gzip), and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses. Raw deflate is used by programs that read and write the .zip format, which is another format that wraps around deflate compressed data.

zlib is now in wide use for data transmission and storage. For example, most HTTP transactions by servers and browsers compress and decompress the data using zlib. Specifically, the HTTP header Content-Encoding: deflate means the deflate compression method wrapped inside the zlib data format.

Different implementations of deflate can result in different compressed output for the same input data, as evidenced by the existence of selectable compression levels that allow trading off compression effectiveness for CPU time. zlib and PKZIP are not the only implementations of deflate compression and decompression. Both the 7-Zip archiving utility and Google's zopfli library have the ability to use much more CPU time than zlib in order to squeeze out the last few bits possible when using the deflate format, reducing compressed sizes by a few percent as compared to zlib's highest compression level. The pigz utility, a parallel implementation of gzip, includes the option to use zlib (compression levels 1-9) or zopfli (compression level 11), and somewhat mitigates the time impact of using zopfli by splitting the compression of large files over multiple processors and cores.
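
The three kinds of wrapping can be tried out from Python's standard zlib module, where the wbits argument selects the wrapper: a negative value for raw deflate, a plain window size for the zlib wrapper, and the window size plus 16 for the gzip wrapper. This is a small sketch rather than a reference implementation; the deflate helper and the sample data are made up for the example.

```python
import gzip
import zlib

data = b"hello, deflate! " * 64  # arbitrary sample input

def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress payload with the wrapper selected by wbits (hypothetical helper)."""
    c = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=wbits)
    return c.compress(payload) + c.flush()

raw = deflate(data, -15)  # raw deflate: no header, no trailer
zl = deflate(data, 15)    # zlib wrapper: 2-byte header + Adler-32 trailer
gz = deflate(data, 31)    # gzip wrapper (16 + 15): minimal header + CRC-32 and length

# Wrapper overhead relative to the raw deflate stream.
print(len(zl) - len(raw), len(gz) - len(raw))

# Each stream round-trips with the matching decoder settings.
assert zlib.decompress(raw, wbits=-15) == data
assert zlib.decompress(zl) == data
assert gzip.decompress(gz) == data
```

With the same compression settings the deflate payload is the same in all three cases, so the printed differences should correspond to the six-byte and 18-byte wrapper sizes mentioned above.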
