What is a Tarball? How do I extract it?


Author: W. Wade Hampton
Email: whampton@staffnet.com

System Architecture: All/General
RedHat Release: All/General
FAQ Category: Common Problems with Linux/Unix Commands
Modification Date: Dec 22, 1998

Question:

RedHat and Caldera use RPM, but I have seen files named .tgz or .tar.gz on
the net (tarballs).  How do I use them?  What are they?

Answer:

A tarball is a (usually compressed) archive file that contains
one or more other files.  It is created with the UNIX/GNU tar
program and, optionally, a compression program such as gzip.

Tar with compression is similar to PKZIP (WinZip, ZIP, or
similar).  TAR stands for Tape ARchive; it groups one or more
files into a single archive written to a file or to media such
as tape.  Files are archived with their owner, permissions, path,
etc.  For more info, see the man page for tar.  Also note that
CPIO (used internally by RPM) is another widely used archive
format (not addressed herein).

Unlike RPM packages, tar files do not contain pre-install or
post-install scripts, dependency information, or other metadata
such as a package description (for example, use rpm --querytags
to see a list of what RPM can provide).

Linux/UNIX use several compression formats:
   GZIP    -- GNU ZIP (a fast compression similar to PKZIP)
   ZIP     -- PKZIP compatible compression (long file names)
   BZ/BZ2  -- BZIP and BZIP2 -- new, slower but generate smaller files
   Z       -- UNIX compress (not as good nor as fast as GZIP)
   LZ      -- LZ compression (not widely used)

Standard suffixes (usually mapped to MIME types):
   .gz   -- file is compressed with GZIP (gzip or gzip.exe)
   .tgz  -- tar file compressed with GZIP
   .Z    -- file is compressed with older UNIX compress
   .bz   -- file is compressed with bzip (since replaced by bzip2)
   .bz2  -- file is compressed with bzip2 (new, smaller output
            than gzip/pkzip)
   .zip  -- file is compressed with zip (pkzip or compat., zip)
            typically ZIP is not used with tar, but an archive
            may contain a tar file
   .tar.gz or .tar.Z or .tar.bz or .tar.bz2  -- file is a tar file
            with compression (see suffixes above, e.g., gz)
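
The suffix table above can be turned into a small lookup.  What
follows is only a sketch: tarball_kind is a made-up helper name (it
is not a standard command), and it trusts the file name alone, so a
misnamed file will fool it.

```shell
# Hypothetical helper: guess how a file was packed from its suffix.
# The tar+compression patterns come first so that "x.tar.gz" is not
# reported as plain gzip.
tarball_kind() {
    case "$1" in
        *.tar.gz|*.tgz) echo "tar+gzip" ;;
        *.tar.bz2)      echo "tar+bzip2" ;;
        *.tar.bz)       echo "tar+bzip" ;;
        *.tar.Z)        echo "tar+compress" ;;
        *.tar)          echo "tar" ;;
        *.gz)           echo "gzip" ;;
        *.bz2)          echo "bzip2" ;;
        *.bz)           echo "bzip" ;;
        *.Z)            echo "compress" ;;
        *.zip)          echo "zip" ;;
        *)              echo "unknown" ;;
    esac
}

tarball_kind foo-1.0.tar.gz    # prints: tar+gzip
tarball_kind notes.txt         # prints: unknown
```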

File extraction:
   file.tar:             -- tar file w/o compression
      tar tvf file.tar   -- list/test the contents [ALWAYS DO FIRST]
      tar xvf file.tar   -- extract the files

   file.tar.gz or file.tgz or file.tar.Z:
      tar tvzf file.tgz  -- list/test
      tar xvzf file.tgz  -- extract
      gunzip -c file.tgz | tar xvf -    -- use with older
                                           non-gzip-aware tar
                                           (e.g., Solaris)
   file.tar.bz:
      bzip -cd file.tar.bz | tar xvf -

   file.tar.bz2:
      bzip2 -cd file.tar.bz2 | tar xvf -
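
To try the list-first, extract-second routine without risking real
files, something like the following can be run in a scratch
directory.  It is a sketch only: demo-1.0, unpack and the work
variable are invented for the example, and it assumes mktemp and
gzip are installed.

```shell
# Scratch area; every name below is hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a small sample tarball to practise on.
mkdir demo-1.0
echo "hello" > demo-1.0/README
tar cf - demo-1.0 | gzip > demo-1.0.tgz

# 1. List the contents first and look at where the paths will land.
gzip -cd demo-1.0.tgz | tar tf -

# 2. Extract into a separate directory, not on top of the original.
mkdir unpack
gzip -cd demo-1.0.tgz | ( cd unpack && tar xf - )

cat unpack/demo-1.0/README    # prints: hello
```

The pipe form is used here because it also works with a tar that
does not understand the z flag.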

File creation:
   tar cvzf file.tgz [list of files to archive]
   tar cvf - [file list] | gzip >file.tgz
   tar cvf - [file list] | bzip2 >file.tar.bz2
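
After creating a tarball it is worth checking it before the
original files are touched.  A possible check, as a sketch: proj,
a.txt, b.txt and the pack/entries variables are made up for the
example, and mktemp and gzip are assumed to be available.

```shell
pack=$(mktemp -d)     # hypothetical scratch directory
cd "$pack" || exit 1
mkdir proj
echo "one" > proj/a.txt
echo "two" > proj/b.txt

# Portable form: tar writes to stdout, gzip compresses the stream.
tar cf - proj | gzip > proj.tgz

# Test the gzip layer, then count the entries tar can read back
# (the directory itself counts as one entry).
gzip -t proj.tgz && echo "gzip layer OK"
entries=$(gzip -cd proj.tgz | tar tf - | wc -l | tr -d ' ')
echo "entries: $entries"
```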

Use of TAR to copy a directory tree (with permissions, recursive, etc.):

[This should be a program like dircopy or xcopy]
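   cd source_directory
   tar cf - . | ( cd dest_dir && tar xvpf - )
        -- "." also picks up dot files that "*" would miss;
           "&&" skips the extract if the cd fails;
           "p" keeps the archived permissions;
           dest_dir must be an absolute path or relative to
           source_directory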
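
A self-contained way to try the tree copy is sketched below.  The
src and dst directories and the files in them are invented for the
example.  It archives "." rather than "*" so that dot files are
copied too, and uses "&&" so nothing is extracted if a cd fails.

```shell
src=$(mktemp -d)      # hypothetical source tree
dst=$(mktemp -d)      # hypothetical destination
echo "visible" > "$src/file.txt"
echo "hidden"  > "$src/.profile"
mkdir "$src/sub"
echo "nested"  > "$src/sub/deep.txt"

# Pack in one subshell, unpack in another; p keeps permissions.
( cd "$src" && tar cf - . ) | ( cd "$dst" && tar xpf - )

# Show everything that arrived, dot files included.
ls -A "$dst"
```

Running each cd in its own subshell leaves the calling shell in its
original directory.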


References:

man tar, man gzip, man gunzip, man cpio, man bzip, man bzip2, 
man zip, man unzip, man rpm