Tips for Transferring Large Numbers of Files

Moving a large number of files from one computer to another carries extra overhead. If the source and destination are similar enough in configuration and you have time to spare, you might simply use rsync or a raw file copy.

There are other, faster ways, particularly if you want to archive those files anyway or are in a hurry.

A significant part of the overhead comes from having to handle every single file in the collection individually. Bundling the files into one archive reduces that cost, and in most cases compressing the archive will reduce the size as well. Many files compress well: text by approximately 85%, and repetitive log files by as much as 98%.
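
As a rough illustration of bundling a directory into one compressed archive, here is a minimal sketch using Python's standard tarfile module. The function name pack_directory and the paths passed to it are hypothetical, and gzip is only one of several compressors that would do.

```python
import tarfile
from pathlib import Path


def pack_directory(source_dir: str, archive_path: str) -> int:
    """Pack source_dir into one gzip-compressed tar file.

    Returns the size of the resulting archive in bytes.
    """
    source = Path(source_dir).resolve()
    # "w:gz" writes a gzip-compressed tar stream.
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the directory's own name,
        # so the archive does not embed the full absolute path.
        archive.add(source, arcname=source.name)
    return Path(archive_path).stat().st_size
```

The receiving side then has one file to accept instead of thousands, and can unpack it with any tar implementation.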

With this technique, avoid trying to compress directories full of images or other already-compressed files: these should be archived with no compression. zip has a store-only setting for this (-0), and plain vanilla tar works wonders.
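
A minimal sketch of the store-only approach, assuming Python's standard tarfile and zipfile modules are acceptable stand-ins for the command-line tools. The function names store_as_tar and store_as_zip are made up for this example.

```python
import tarfile
import zipfile
from pathlib import Path


def store_as_tar(source_dir: str, archive_path: str) -> None:
    """Bundle source_dir into a plain tar file with no compression."""
    source = Path(source_dir).resolve()
    # Mode "w" (no ":gz" or ":bz2" suffix) writes an uncompressed tar.
    with tarfile.open(archive_path, "w") as archive:
        archive.add(source, arcname=source.name)


def store_as_zip(source_dir: str, archive_path: str) -> None:
    """Bundle source_dir into a zip file using the 'stored' method."""
    source = Path(source_dir).resolve()
    # ZIP_STORED copies file contents into the archive without compressing.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Keep names relative to the parent, e.g. "photos/a.jpg".
                archive.write(path, arcname=path.relative_to(source.parent))
```

Either way the per-file overhead goes away, and no CPU time is spent trying to squeeze data that will not shrink.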

rzip is an excellent tool for large filesets in which the same or similar files turn up repeatedly, provided you have the time (it is rather slow) and the disk space.

Subversion working copies (which contain pristine copies of the checked-out files) compress fairly well, but rzip can compress them further than gzip or bzip2 because it looks for long-distance redundancies.

As the rzip site points out, there are environments where multiple copies of the same file exist but no reasonable facility exists to remove them:

"It is quite common these days to need to compress files that contain long distance redundancies. For example, when compressing a set of home directories several users might have copies of the same file, or of quite similar files. It is also common to have a single file that contains large duplicated chunks over long distances, such as pdf files containing repeated copies of the same image."
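
The long-distance effect can be seen with a small experiment. The sketch below does not use rzip itself (it is not part of Python's standard library); as an assumption for illustration, it uses lzma, whose default dictionary is several megabytes, as a stand-in for a long-range compressor, and zlib (the algorithm behind gzip), whose window is only 32 KB. The function name compare_long_distance is hypothetical.

```python
import lzma
import os
import zlib


def compare_long_distance(block_size: int = 1 << 20) -> tuple:
    """Compress a random block followed by an exact copy of itself.

    Returns (raw_size, zlib_size, lzma_size) in bytes.
    """
    block = os.urandom(block_size)  # random data: incompressible on its own
    data = block + block            # the duplicate starts block_size bytes later
    # zlib can only refer back 32 KB, so it cannot see the earlier copy.
    zlib_size = len(zlib.compress(data, 9))
    # lzma's larger dictionary lets it encode the second copy as back-references.
    lzma_size = len(lzma.compress(data))
    return len(data), zlib_size, lzma_size
```

With a block much larger than 32 KB, the zlib result should stay close to the raw size while the lzma result should come out at roughly half. rzip is designed to find the same kind of match over far greater distances.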

To get the benefits of more than one technique, it's often necessary to split the files to be moved into a few large groups and treat each group differently. Don't be afraid to do this, but human time is valuable too: at some point you hit diminishing returns, and the time would be better spent settling on just one technique.
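
One simple way to form such groups is by file extension. The sketch below is only an assumption about how you might do it: the suffix list is a hypothetical starting point, and split_by_compressibility is a made-up name.

```python
from pathlib import Path

# Hypothetical list of suffixes for formats that are already compressed;
# adjust it to whatever actually lives in your directories.
ALREADY_COMPRESSED = {
    ".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4", ".zip", ".gz", ".bz2",
}


def split_by_compressibility(source_dir: str) -> tuple:
    """Return (compressible, stored): two sorted lists of file paths.

    'stored' holds files that are better archived without compression.
    """
    compressible, stored = [], []
    for path in sorted(Path(source_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in ALREADY_COMPRESSED:
            stored.append(path)
        else:
            compressible.append(path)
    return compressible, stored
```

Two groups is usually enough; a finer split rarely pays back the time spent on it.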

Look for ways to combine things, too. For example, home directories that contain Subversion working copies are a wonderful target for rsync.

In summary:

  • Transmitting fewer files is better
  • It often takes less time to compress a file and transmit the compressed version than to transfer the uncompressed version
  • If you have time, run rzip on specific sets of files where there is a large distance between duplications
  • Don't waste your time: just come up with something that's approximately correct... don't worry about making things ideal.
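
The second point in the summary comes down to simple arithmetic, sketched below. All of the figures are assumptions you would have to measure for your own machines and link, the function names are hypothetical, and the model ignores decompression time as well as any overlap between compressing and sending.

```python
def direct_seconds(size_mb: float, link_mb_per_s: float) -> float:
    """Time to send the data uncompressed."""
    return size_mb / link_mb_per_s


def compressed_seconds(
    size_mb: float,
    link_mb_per_s: float,
    compress_mb_per_s: float,
    ratio: float,
) -> float:
    """Time to compress first, then send the smaller result.

    ratio is compressed size divided by original size,
    so 0.15 means the data shrank by 85%.
    """
    compress_time = size_mb / compress_mb_per_s
    transfer_time = (size_mb * ratio) / link_mb_per_s
    return compress_time + transfer_time
```

For example, with 1000 MB of text, a 10 MB/s link, a compressor that manages 50 MB/s, and a ratio of 0.15, sending directly takes 100 seconds while compressing first takes about 35 (20 to compress plus 15 to send). With a ratio near 1.0, as for images, compressing first is slower than just sending.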
