The title is slightly misleading, but it is true as long as the original source does not change.  I’ve also made a copy of an entire rawhide tree (~13G) in less than five seconds.

I’ve been syncing rawhide trees at home for a while and have a fairly old machine (Mac G4 400 MHz PPC running F10) that I use for file serving, http installs, unison, backups, yum repos, proxying, irssi, etc.  It only has 140G of disk space.  I could buy a larger drive, but I’ve found this is enough space to serve the files that I need, and it forces me to keep things clean by throwing away files that are old or unused.  The G4 also makes a great headless, always-on machine because, according to my Kill A Watt meter, it only consumes about 40 watts of electricity.

I was looking for a way to store multiple days of rawhide trees, knowing that usually only part of the tree changes each day and not wanting to duplicate a lot of data.  Inspired by a script that Jesse Keating wrote to do this, I started exploring rsync and the fantastic --link-dest= argument.

This technique is also super handy if you want to make a really fast backup of a massive amount of data, incurring extra space consumption only if the source changes later.

rsync -av --link-dest=/home/bozo/source /home/bozo/source/ /home/bozo/target

The command above populates the target with hard links to the files in the source instead of copying their data.  One tricky thing about the --link-dest= argument is that you should give it an absolute path, because a relative path is interpreted relative to the destination directory.  Also, because hard links are involved, the source and target must be on the same file system.
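To check that the files really were linked rather than copied, you can compare inode numbers. The following is only a rough sketch: the function names snapshot_tree and same_inode are made up for illustration, and it assumes rsync plus GNU coreutils stat (as found on Fedora).

```shell
#!/bin/sh
# Hypothetical helper: snapshot SRC into DEST, hard-linking unchanged files.
# Assumes rsync is installed and both paths live on the same file system.
snapshot_tree() {
    src=$(cd "$1" && pwd)    # resolve to an absolute path for --link-dest
    dest=$2
    mkdir -p "$dest"
    # The trailing slash copies the contents of SRC, so relative paths in the
    # transfer line up with the paths rsync looks for under --link-dest.
    rsync -a --link-dest="$src" "$src/" "$dest"
}

# Succeeds when two names refer to the same inode (GNU stat assumed).
same_inode() {
    [ "$(stat -c %i "$1")" = "$(stat -c %i "$2")" ]
}
```

For example, after `snapshot_tree /home/bozo/source /home/bozo/target`, running `same_inode` on a file under source and its counterpart under target should succeed if the link was made.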

As packages are removed from the source directory the target remains intact.  All that has happened is that the link count for the removed file has been reduced by one.  As long as the link count is at least one, the file's data remains on disk.
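The link-count behaviour can be seen without rsync at all. This small sketch uses a scratch directory and a made-up file name (foo.rpm), with a plain ln standing in for what --link-dest does for each unchanged file; it assumes GNU coreutils stat.

```shell
#!/bin/sh
# Print the hard-link count of a file (GNU coreutils stat assumed).
link_count() {
    stat -c %h "$1"
}

demo=$(mktemp -d)                                 # scratch area, illustrative only
mkdir "$demo/source" "$demo/target"
echo "package payload" > "$demo/source/foo.rpm"
ln "$demo/source/foo.rpm" "$demo/target/foo.rpm"  # second name for the same data

link_count "$demo/target/foo.rpm"   # prints 2: two names, one copy on disk
rm "$demo/source/foo.rpm"           # the package disappears from the source tree
link_count "$demo/target/foo.rpm"   # prints 1: the target name still holds the data
cat "$demo/target/foo.rpm"          # prints "package payload"
```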

This same technique works well for constructing other local Fedora trees with lots of similar files.  If the file is available locally, rsync creates a hard link instead of downloading it.
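As a sketch of that idea, the hypothetical helper below (build_tree is my own name, not part of rsync) pulls a tree from some upstream location while pointing --link-dest at a tree already on disk. rsync only links a file when it matches the upstream copy in size, modification time and the other attributes preserved by -a; anything else is transferred normally.

```shell
#!/bin/sh
# Hypothetical helper: build DEST from UPSTREAM, hard-linking any file that
# already exists unchanged in the local reference tree REF.
# Assumes rsync is installed and that REF and DEST share a file system.
build_tree() {
    upstream=$1               # an rsync URL or a local path (caller's choice)
    ref=$(cd "$2" && pwd)     # absolute path for --link-dest
    dest=$3
    mkdir -p "$dest"
    rsync -a --link-dest="$ref" "$upstream/" "$dest"
}
```

A call might look like `build_tree rsync://mirror.example.org/some/tree /srv/trees/yesterday /srv/trees/today`, where the mirror URL and the /srv paths are placeholders.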

Update (2009-01-15): Having given a little more thought to Chris Tyler’s important clarification, maybe I’ve oversold this concept as a good backup mechanism when really it should be more focused on its benefits specifically for managing trees or directories containing binaries or packages.  In other words, this approach is best for copying or linking content that does not get modified or is completely replaced by a newer version of the file.
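The difference between modifying a file and replacing it is easy to demonstrate. This sketch uses a scratch directory and made-up file names, with ln standing in for the link that rsync would create.

```shell
#!/bin/sh
scratch=$(mktemp -d)                      # illustrative scratch directory
echo "v1" > "$scratch/orig.txt"
ln "$scratch/orig.txt" "$scratch/backup.txt"

# In-place edit: both names share one inode, so the "backup" changes too.
echo "v2" >> "$scratch/orig.txt"
cat "$scratch/backup.txt"                 # prints v1 then v2

# Replacement: write a new file and rename it over the original. The original
# name now points at a new inode and the backup keeps the old data.
echo "v3" > "$scratch/orig.txt.new"
mv "$scratch/orig.txt.new" "$scratch/orig.txt"
cat "$scratch/backup.txt"                 # still prints v1 then v2
cat "$scratch/orig.txt"                   # prints v3
```

Package files in a tree are normally replaced wholesale by a new version rather than edited in place, which is why the linked copies stay safe there.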