Rsync and sparse files

Sparse files are a great feature of Linux filesystems. They are very handy when working with virtualization technologies like KVM. You don’t need to think long about how big to make a VM disk; just create one that is definitely big enough (I normally use 20GB for my Linux-based servers). If only 1GB is used, the file occupies only that much physical disk space, not the whole 20GB.

QEMU already creates sparse files by default when using raw images.
Example: qemu-img create myserver.img 20G
When you add the “s” option to the ls command, the first column shows the size that is really allocated on disk.

ls -lhs
realsize                     virtualsize
0 -rw-r--r-- 1 gergap gergap 20.0G Aug 10 11:27 myserver.img
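
The same two sizes can be read without qemu-img. The following is a minimal sketch, assuming GNU coreutils (truncate and stat -c) and a filesystem that supports sparse files; the file name demo.img and the scratch directory are made up for the illustration.

```shell
# Create a 1 GiB sparse file in a scratch directory (demo.img is a made-up name).
workdir=$(mktemp -d)
img="$workdir/demo.img"
truncate -s 1G "$img"

# Apparent (virtual) size in bytes, as applications see it.
apparent=$(stat -c %s "$img")
# Space really allocated on disk, counted in 512-byte blocks.
allocated=$(stat -c %b "$img")

echo "apparent bytes:   $apparent"
echo "allocated blocks: $allocated"

# Remove the scratch directory again.
rm -rf "$workdir"
```

The apparent size is the full 1 GiB, while the allocated block count stays far below it because no data has been written yet.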

However, these sparse files become a problem when you copy them, especially when you need to move a disk image to another machine over the network.

Local copies: When you copy files locally with tools that are not aware of sparse files, the whole 20GB will be copied. It may sound strange, but that’s the desired behavior. A 20GB sparse file should look like a normal file to applications, so they see the complete 20GB, even though most of the data is just zeros.
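
To illustrate what a sparse-unaware copy does, here is a small sketch. It assumes GNU coreutils and uses plain dd (without conv=sparse) as a stand-in for such a tool; the file names are made up.

```shell
workdir=$(mktemp -d)

# An 8 MiB sparse file that contains only holes.
truncate -s 8M "$workdir/sparse.img"

# Plain dd reads every byte, holes included, and writes them all out again.
dd if="$workdir/sparse.img" of="$workdir/copy.img" bs=1M 2>/dev/null

# Compare apparent size (bytes) and allocated size (512-byte blocks).
src_size=$(stat -c %s "$workdir/sparse.img")
copy_size=$(stat -c %s "$workdir/copy.img")
src_blocks=$(stat -c %b "$workdir/sparse.img")
copy_blocks=$(stat -c %b "$workdir/copy.img")

echo "source: $src_size bytes, $src_blocks blocks allocated"
echo "copy:   $copy_size bytes, $copy_blocks blocks allocated"

rm -rf "$workdir"
```

Both files report the same apparent size, but on most filesystems the copy has all of its blocks allocated because the zeros were written out as real data.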

Luckily, the “cp” command is aware of sparse files and autodetects whether a source is sparse. In that case the copy also becomes a sparse file and only the real data gets copied, which is much faster. If the source is not sparse, you can use “cp --sparse=always source dest” and the destination will become a sparse file.
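
A quick way to try “cp --sparse=always” is sketched below, assuming GNU cp and stat; the file names full.img and sparse.img are made up. The source is deliberately written as real zero bytes so that it is not sparse to begin with.

```shell
workdir=$(mktemp -d)

# Write 8 MiB of real zero bytes, so the source file is fully allocated.
dd if=/dev/zero of="$workdir/full.img" bs=1M count=8 2>/dev/null

# Ask cp to turn runs of zeros into holes in the destination.
cp --sparse=always "$workdir/full.img" "$workdir/sparse.img"

# The content must be identical; only the allocation may differ.
if cmp -s "$workdir/full.img" "$workdir/sparse.img"; then
    same=yes
else
    same=no
fi
full_blocks=$(stat -c %b "$workdir/full.img")
sparse_blocks=$(stat -c %b "$workdir/sparse.img")

echo "identical content: $same"
echo "blocks allocated:  full=$full_blocks sparse=$sparse_blocks"

rm -rf "$workdir"
```

The exact block counts depend on the filesystem, so the sketch only prints them instead of assuming particular values.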

Now let’s turn to network transfers. Most admins use rsync, which can copy a lot of files very quickly over SSH. rsync is very efficient at detecting which files have changed and transmits only those. So it’s easy to keep e.g. an FTP mirror in sync with its source or to implement backup strategies.

KVM images are different. You don’t have many files, but the files you have are huge sparse files. You don’t want to transmit 20GB over the network if only a few MB have changed in the disk image. Even transmitting 1GB of actually used data takes quite a long time.

The solution is to use the “--inplace” option of rsync. With this option rsync writes only the changed blocks into the existing destination file instead of rebuilding the whole file. The problem with “--inplace” is that it does not create sparse files.

But rsync can handle sparse files when you pass the “--sparse” option. Unfortunately, “--sparse” and “--inplace” cannot be used together.

Solution: When copying the file for the first time, i.e. when it does not yet exist on the target server, use “rsync --sparse”. This creates a sparse file on the target server and copies only the used data of the sparse file.

When the file already exists on the target server and you only want to update it, use “rsync --inplace”. This transmits only the changed blocks and can also append to the existing sparse file.
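
The two cases can be wrapped in a small script. This is only a sketch of the strategy described above, not a tested backup tool: the function names rsync_mode_for and sync_image are invented for this example, and it assumes SSH access to the target and a destination path without spaces.

```shell
# Hypothetical helper: choose the rsync option for one image.
# $1 is "yes" if the image already exists on the target, anything else otherwise.
rsync_mode_for() {
    if [ "$1" = "yes" ]; then
        echo "--inplace"   # update: write only the changed blocks
    else
        echo "--sparse"    # first copy: create a sparse file on the target
    fi
}

# Hypothetical wrapper: sync_image <local image> <host> <remote path>
sync_image() {
    src=$1
    host=$2
    dest=$3
    # Check over SSH whether the image is already present on the target.
    if ssh "$host" test -e "$dest"; then
        exists=yes
    else
        exists=no
    fi
    rsync -av "$(rsync_mode_for "$exists")" "$src" "$host:$dest"
}
```

A call could look like “sync_image myserver.img targethost /srv/images/myserver.img”, where the host name and the remote path are placeholders. As far as I know, rsync releases from 3.1.3 on accept “--sparse” together with “--inplace”, so it is worth checking the man page of your version before relying on this split.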

I hope rsync will become smarter in the future and allow the combination “--inplace --sparse”, or even autodetect the best strategy. But for now we have at least a working solution.

I hope this post was helpful for understanding sparse files and rsync.

12 Responses to “Rsync and sparse files”


1 pidegatwp February 27, 2014 at 1:05 pm

The dash-dash options are not well formatted, take care! It should be --inplace or --sparse.

- 2 gergap February 28, 2014 at 8:59 am
  
  Thx, I fixed the formatting.
  
3 dannyman August 20, 2015 at 12:38 am

Thank you! This information is also consistent with information on serverfault, but better explained here.

http://serverfault.com/questions/66338/how-do-you-synchronise-huge-sparse-files-vm-disk-images-between-machines

4 Anonymous May 17, 2016 at 8:01 pm

Thanks! Thought something was wrong when I ran out of disk space moving images to a *larger* box.

5 Daniel December 4, 2016 at 1:47 pm

I’m not sure why, but in my setup I tried to transfer lxc-based sparse files via NFS between two storages.

Step 1: VM is still running. Do a first transfer, using --sparse
Step 2: Verifying. The target file came out as a sparse file as intended!

Step 3: VM stopped. Do a sync of any changes that might have occurred in the meantime. This is done, as you suggested, using --inplace
Step 4: Verifying. The target file now suddenly is not a sparse file any more and takes up 100% of the size.

Any other people around here that deal with the same issue?
(Setup: Linux 4.4.13, rsync 3.1.1)


- 6 gergap December 4, 2016 at 3:43 pm
  
  Hi Daniel,
  
  thx for that hint.
  I haven’t checked this yet, but this is possible.
  The purpose of my explanation was not to save disk space, but to speed up copying.
  The “--sparse” option avoids copying zeros over the network in the first fresh copy.
  The “--inplace” option just transmits changed blocks.
  If “--inplace” makes the file non-sparse, I haven’t noticed this so far, but I also didn’t care.
  I normally have enough disk space and the aim was just speeding things up, not saving disk space.
  
  You can always make the file a sparse file again locally by using “cp --sparse=always”.
  
  I’ll check this ASAP to see if I can reproduce your issue.
  
  regards,
  Gerhard
  
7 Paul May 23, 2017 at 9:57 am

Follow-up note for you: “cp” on HP-UX does not handle sparse files and doesn’t have any option to do so.

And, unfortunately, rsync still can’t support both --inplace and --sparse either, which causes no end of problems when ISAM database files (full of holes) get rebuilt or ‘stretched’ (to add more logical size/space). rsync sees the rebuilt file, and --inplace then picks it up, copies it over and fills in all the holes (we have a regular --inplace rsync set up to another HP-UX server, and I then have to delete the filled-in file from the target server and manually rsync again using --sparse before the next --inplace job starts).

8 jacobrue June 21, 2017 at 12:38 am

I am so sorry my article is in Danish; however, I made some quick comparisons between the different ways of transferring KVM images across the network … and NFS + cp won big time!

A file which scp and rsync transferred in more than 10 minutes took only 1:30 through NFS.

http://www.specialhosting.dk/kvm-overforsel-af-images-sparse-files-mellem-servere/ (i am sure Google Translate will make it readable)

- 9 gergap June 21, 2017 at 7:32 am
  
  Hi. This is because you are using rsync with SSH, which encrypts all the traffic. With the rsync protocol this is much faster. Have a look at my other post: https://gergap.wordpress.com/2013/08/13/optimizing-speed-in-kvm-image-synchronization-using-rsync/

1 Optimizing speed in KVM image synchronization using rsync | Gergap's Weblog Trackback on August 13, 2013 at 11:42 am
2 Rsync, cp and sparse files | marinov.biz Trackback on November 27, 2013 at 3:02 pm
3 rsync « Trackback on July 9, 2018 at 12:55 am

Gergap’s Weblog
