Linux backups: Using find, xargs, and tar to create a huge archive

I did something wrong in a previous blog entry that led me to use the pax command to create a large backup/archive. There’s nothing wrong with using pax, other than the fact that it’s not available for Cygwin, and I really needed to create a huge archive.

What wasn’t working

In my earlier blog post I stated that something like this did not work for me when trying to create a large backup using find, xargs, and tar:

find . -type f -name "*.java" | xargs tar cvf myfile.tar

What was happening was that xargs split the long list of file names into several batches and ran tar once per batch, and each of those tar runs re-wrote the archive. That is, each time xargs passed a new block of input files to tar, tar treated it as a brand-new command and re-created the file named myfile.tar from scratch. So, instead of the huge myfile.tar that I expected, I ended up with only the files from the last batch in the archive.
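The overwrite is easy to reproduce on a small scale. The following is only a sketch under some assumptions: the scratch directory, the six dummy files, and the archive name create.tar are made up for illustration, and xargs -n 2 is used to force the batching that a very long file list would otherwise trigger on its own.

```shell
#!/bin/sh
# Hypothetical scratch area; every name here is illustrative only.
workdir=$(mktemp -d)
mkdir "$workdir/src"
cd "$workdir/src" || exit 1

# Six tiny files stand in for a huge source tree.
for i in 1 2 3 4 5 6; do
  echo "class C$i {}" > "C$i.java"
done

# -n 2 makes xargs run tar three times with two files each,
# imitating a file list too long for a single command line.
find . -type f -name "*.java" | xargs -n 2 tar cf "$workdir/create.tar"

# Every run re-created the archive, so only the last batch is left.
create_count=$(tar tf "$workdir/create.tar" | wc -l | tr -d ' ')
echo "files in archive after -c: $create_count"
```

Six files went in, but the archive lists only the two from the final tar run.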

The solution

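This problem is easily remedied if you use the -r option with tar instead of the -c option. The -r option tells tar to append to the archive, while -c says “create.” Because -r always appends, delete any old myfile.tar before you re-run the command, or the new files will be added on top of the old contents.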

Given that background, I can say that this command worked just fine for me to create a very large tar archive:

find . -type f -name "*.java" | xargs tar rvf myfile.tar

This combination of find, tar, and xargs worked like a champ for me.
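Here is a slightly more defensive sketch of the same idea. It is not the exact command I ran: the scratch directory and file names are hypothetical, -n 2 is there only to force several tar runs, and it assumes your find and xargs support -print0 and -0 (the GNU and BSD versions do), which keeps file names with spaces intact.

```shell
#!/bin/sh
# Hypothetical scratch area; every name here is illustrative only.
workdir=$(mktemp -d)
mkdir "$workdir/src"
cd "$workdir/src" || exit 1

# File names with spaces, to show why -print0 / -0 is worth using.
for i in 1 2 3 4 5 6; do
  echo "class C$i {}" > "C $i.java"
done

archive="$workdir/myfile.tar"

# -r appends, so remove any leftover archive from an earlier run.
rm -f "$archive"

# tar runs three times here (-n 2), and each run appends its batch.
find . -type f -name "*.java" -print0 | xargs -0 -n 2 tar rf "$archive"

append_count=$(tar tf "$archive" | wc -l | tr -d ' ')
echo "files in archive after -r: $append_count"
```

This time all six files end up in the archive, no matter how many times xargs runs tar.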

Update

I originally wrote this article in 2004 for Cygwin (wow!). I tried this approach again by accident in 2019 on macOS, and I noticed that the -c option worked on a little 20K tar/gzip archive. So without researching it any more, I’ll guess that either (a) the -c option works okay on small archives, or (b) the problem I had in 2004 was specific to Cygwin. Guess (a) fits the explanation above: when the list of files is short enough to fit on one command line, xargs runs tar only once, so nothing gets overwritten.
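One rough way to check guess (a) is to count how many times xargs actually runs its command for a given file list. In this sketch echo stands in for tar, and the twenty dummy files and the scratch directory are assumptions made up for the example; echo prints one line per run, so the line count is the number of runs (as long as no file name contains a newline).

```shell
#!/bin/sh
# Hypothetical scratch area; every name here is illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Twenty tiny files: a "small archive" case.
i=1
while [ "$i" -le 20 ]; do
  echo "class C$i {}" > "C$i.java"
  i=$((i + 1))
done

# echo prints one line each time xargs runs it, so counting lines
# counts how many separate tar commands there would have been.
runs=$(find . -type f -name "*.java" | xargs echo | wc -l | tr -d ' ')
echo "xargs runs for the small list: $runs"

# Forcing five names per batch shows the multi-run case.
forced_runs=$(find . -type f -name "*.java" | xargs -n 5 echo | wc -l | tr -d ' ')
echo "xargs runs with -n 5: $forced_runs"
```

If the first count is 1, tar -c is called only once and the archive comes out complete; anything higher means -c would overwrite earlier batches.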