[linux] Find files and tar them (with spaces)

Alright, so simple problem here. I'm working on a simple backup script. It works fine except when the file names contain spaces. This is how I'm finding files and adding them to a tar archive:

find . -type f | xargs tar -czvf backup.tar.gz 

The problem is when a file has a space in its name, because tar then ends up seeing each space-separated piece as a separate file name. Basically, is there a way I can add quotes around the results from find? Or a different way to fix this?
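A quick way to reproduce it (a minimal sketch; the file name is just an example):

touch "my file.txt"
# xargs splits its input on whitespace, so tar receives "./my" and "file.txt"
# as two separate arguments and fails to find either file
find . -type f | xargs tar -czvf backup.tar.gz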

This question is related to linux find backup tar



Another solution as seen here:

find var/log/ -iname "anaconda.*" -exec tar -cvzf file.tar.gz {} +

Big warning on several of the solutions (and your own test):

When you do: anything | xargs something

xargs will try to fit "as many arguments as possible" after "something", but if they do not all fit on one command line you end up with multiple invocations of "something".

So your attempt, find ... | xargs tar czvf file.tgz, may end up overwriting "file.tgz" on each invocation of "tar" by xargs, and you end up with only the files from the last invocation! (The chosen solution uses GNU tar's special -T parameter to avoid the problem, but not everyone has GNU tar available.)

You could do instead:

find . -type f -print0 | xargs -0 tar -rvf backup.tar
gzip backup.tar

Proof of the problem on cygwin:

$ mkdir test
$ cd test
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs touch 
    # create the files
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar czvf archive.tgz
    # will invoke tar several times, as it can't fit 10000 long filenames on one command line
$ tar tzvf archive.tgz | wc -l
60
    # on my own machine, I end up with only the last 60 filenames,
    # as the last invocation of tar by xargs overwrote the previous one(s)

# proper way to invoke tar: with -r (which appends to an existing tar file, whereas -c would overwrite it)
# caveat: you can't have it compressed (you can't add to a compressed archive)
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar rvf archive.tar #-r, and without z
$ gzip archive.tar
$ tar tzvf archive.tar.gz | wc -l
10000 
  # we have all our files, despite xargs making several invocations of the tar command

 

Note: that behavior of xargs is a well-known difficulty, and it is also why, when someone wants to do:

find .... | xargs grep "regex"

they instead have to write it:

find ..... | xargs grep "regex" /dev/null

That way, even if the last invocation of grep by xargs gets only 1 filename, grep sees at least 2 filenames (each time it has /dev/null, where it won't find anything, plus the filename(s) appended by xargs after it) and thus will always display the file names when something matches "regex". Otherwise you may end up with the last results showing matches without a filename in front.
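A quick illustration of the difference (a sketch; the file name is just an example):

$ printf 'needle\n' > lone_file.txt
$ grep "needle" lone_file.txt               # one filename: no prefix in the output
needle
$ grep "needle" /dev/null lone_file.txt     # two filenames: matches get a prefix
lone_file.txt:needle

(With GNU grep you can also pass -H to force the filename prefix.)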


Would add a comment to @Steve Kehlet's post but I need 50 rep (RIP).

For anyone who has found this post after much googling: I found a way to not only find specific files given a time range, but also NOT include the relative paths OR whitespace that would cause tarring errors. (THANK YOU SO MUCH STEVE.)

find . -name "*.pdf" -type f -mtime 0 -printf "%f\0" | tar -czvf /dir/zip.tar.gz --null -T -
  1. . relative directory

  2. -name "*.pdf" look for pdfs (or any file type)

  3. -type f type to look for is a file

  4. -mtime 0 look for files modified in the last 24 hours

  5. -printf "%f\0" Regular -print0 OR -printf "%f" did NOT work for me. From man pages:

This quoting is performed in the same way as for GNU ls. This is not the same quoting mechanism as the one used for -ls and -fls. If you are able to decide what format to use for the output of find then it is normally better to use '\0' as a terminator than to use newline, as file names can contain white space and newline characters.

  6. -czvf create the archive, filter the archive through gzip, verbosely list files processed, archive name
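One note, as a quick sanity check (a sketch, reusing the example archive path from above): because %f prints only the base file name, the archive entries carry no leading ./ path, and the PDFs need to be in the directory where tar runs, otherwise tar cannot stat the bare names it reads from the list.

tar -tzf /dir/zip.tar.gz
  # entries are bare names like report.pdf, with no ./ prefix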

Edit 2019-08-14: I would like to add that I was also able to do essentially the same thing as the command in my comment, just using tar itself:

tar -czvf /archiveDir/test.tar.gz --newer-mtime=0 --ignore-failed-read *.pdf

Needed --ignore-failed-read in case there were no new PDFs for today.


If you have multiple files or directories and you want to compress them into independent *.gz files, you can do this. The -type f and -mtime filters are optional.

find -name "httpd-log*.txt" -type f -mtime +1 -exec tar -vzcf {}.gz {} \;

This will compress

httpd-log01.txt
httpd-log02.txt

to

httpd-log01.txt.gz
httpd-log02.txt.gz

The best solution seems to be to create a file list and then archive the files, because that way you can build the list from other sources and do something else with it.

For example, this allows using the list to calculate the size of the files being archived:

#!/bin/sh

backupFileName="backup-big-$(date +"%Y%m%d-%H%M")"
backupRoot="/var/www"
backupOutPath=""

archivePath="$backupOutPath$backupFileName.tar.gz"
listOfFilesPath="$backupOutPath$backupFileName.filelist"

#
# Make a list of files/directories to archive
#
: > "$listOfFilesPath"    # truncate the list so it doesn't start with a blank line
echo "${backupRoot}/uploads" >> "$listOfFilesPath"
echo "${backupRoot}/extra/user/data" >> "$listOfFilesPath"
find "${backupRoot}/drupal_root/sites/" -name "files" -type d >> "$listOfFilesPath"

#
# Size calculation
#
sizeForProgress=`
cat "$listOfFilesPath" | while read nextFile; do
    if [ ! -z "$nextFile" ]; then
        du -sb "$nextFile"
    fi
done | awk '{size+=$1} END {print size}'
`

#
# Archive with progress
#
## simple, with a dump of all files currently being archived
#tar -czvf "$archivePath" -T "$listOfFilesPath"
## progress bar
sizeForShow=$(($sizeForProgress/1024/1024))
echo -e "\nRunning backup [source files are $sizeForShow MiB]\n"
tar -cPp -T "$listOfFilesPath" | pv -s "$sizeForProgress" | gzip > "$archivePath"
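To restore later (a minimal sketch reusing $archivePath from the script; /tmp/restore is just an example target): the archive was created with -P, so member names keep their absolute paths, and you can either put files back in place or re-root them elsewhere:

# put files back at their original absolute locations (-P on extraction too)
tar -xzPpf "$archivePath"

# or extract under /tmp/restore; without -P tar strips the leading "/",
# so files end up below /tmp/restore/var/www/...
mkdir -p /tmp/restore
tar -xzpf "$archivePath" -C /tmp/restore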

Try running:

    find . -type f | xargs -d "\n" tar -czvf backup.tar.gz 
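A quick check (a sketch; GNU xargs is assumed for -d): splitting only on newlines means names with spaces reach tar intact, but names that themselves contain a newline would still break, and the multiple-invocation caveat described above still applies for very large file sets.

    touch "name with spaces.txt"
    find . -type f | xargs -d "\n" tar -czvf backup.tar.gz
    tar -tzf backup.tar.gz
        # the listing includes ./name with spaces.txt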

There could be another way to achieve what you want. Basically,

  1. Use the find command to output the paths to whatever files you're looking for. Redirect stdout to a filename of your choosing.
  2. Then tar with the -T option which allows it to take a list of file locations (the one you just created with find!)

    find . -name "*.whatever" > yourListOfFiles
    tar -cvf yourfile.tar -T yourListOfFiles
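
If some of the names may contain newlines or other odd whitespace (a sketch, assuming GNU find and GNU tar), build the list NUL-terminated and tell tar to read it that way:

    find . -name "*.whatever" -print0 > yourListOfFiles
    tar -cvf yourfile.tar --null -T yourListOfFiles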
    

Why not give something like this a try: tar cvf scala.tar `find src -name '*.scala'`


Why not:

tar czvf backup.tar.gz *

Sure it's clever to use find and then xargs, but you're doing it the hard way.

Update: Porges has commented with a find-option that I think is a better answer than my answer, or the other one: find -print0 ... | xargs -0 ....

