I have looked at this question and this question, but they do not seem to address the symptoms I am seeing.
I have a large log file (around 600 MB) that I am trying to transfer across a cellular network. Because it is a log file, it is only ever appended to. (It is actually an SQLite database with only INSERTs being performed, so it isn't quite as simple as that, but with the exception of the last 4k page, or maybe a few, the file is identical each time.) It is important that only the changes (and whatever checksums need to be transmitted) actually get sent, because the data connection is metered.
Yet when I perform a test across an unmetered connection (e.g. a free WiFi hotspot) I do not see any speed-up or reduced data transfer, either observed or reported. Over a slow WiFi connection I see on the order of 1MB/s or less, with rsync reporting that the transfer is going to take nearly 20 minutes. Over a fast WiFi connection I see a uniformly faster speed, but no report of speedup, and a second attempt to transfer (which should now be faster because the two files are identical) does not show any difference.
The (sanitized to remove sensitive info) command I am using is:
rsync -e 'ssh -p 9999' --progress LogFile michael@my.host.zzz:/home/michael/logs/LogFile
The output I get at the end looks like this:
LogFile
640,856,064 100% 21.25MB/s 0:00:28 (xfr#1, to-chk=0/1)
There is no mention of any kind of speedup.
I suspect the problem may be one of the following:
- I am missing some command line option. However, re-reading the man page seems to suggest that delta transfers are enabled by default: I only see options for disabling them.
- I'm using rsync over ssh (on a non-standard port even) due to the server being behind a firewall that only allows ssh. I haven't seen anything explicitly saying delta transfers won't work if the rsync daemon isn't running, though. I tried using the "::" notation instead of ":" but the man page isn't very clear about what a "module" is, and my command is rejected for specifying an invalid module.
I have ruled out the following:
- delta transfers not being performed on a local network. Ruled out because I am trying to perform the transfer across the internet.
- overhead due to checksum calculation. I have seen this behavior both on a fast and slow Wifi connection and the transfer rate doesn't seem to be compute bound.
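Whether the delta algorithm engages at all can be probed on a single machine, away from the metered link. The following is only a rough sketch under stated assumptions: the file names and sizes are made up, random data stands in for the real log, and the script skips the rsync step if rsync is not installed. It relies on the documented behaviour that rsync defaults to --whole-file when both paths are local, so --no-whole-file is passed explicitly, and on --stats to print the "Matched data" and "Literal data" totals.

```shell
#!/bin/sh
# Sketch: check locally whether rsync's delta algorithm finds matching data.
# File names and sizes are hypothetical, chosen only for the demonstration.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# "Remote" copy: 1 MiB of random data standing in for the old log file.
head -c 1048576 /dev/urandom > "$workdir/remote_copy"

# "Local" copy: the same data plus one appended 4 KiB page.
cp "$workdir/remote_copy" "$workdir/local_copy"
head -c 4096 /dev/urandom >> "$workdir/local_copy"

if command -v rsync >/dev/null 2>&1; then
    # With two local paths rsync defaults to --whole-file, so the delta
    # algorithm has to be requested explicitly with --no-whole-file.
    stats=$(LC_ALL=C rsync --no-whole-file --stats \
        "$workdir/local_copy" "$workdir/remote_copy")

    # Newer rsync versions print thousands separators; strip them.
    matched=$(printf '%s\n' "$stats" |
        sed -n 's/^Matched data: \([0-9,]*\) bytes.*/\1/p' | tr -d ',')
    literal=$(printf '%s\n' "$stats" |
        sed -n 's/^Literal data: \([0-9,]*\) bytes.*/\1/p' | tr -d ',')

    echo "matched=$matched literal=$literal"
else
    echo "rsync not installed; skipping"
fi
```

If the delta algorithm is working, the matched figure should be close to the size of the old copy and the literal figure close to the size of the appended data. Over ssh the same --stats option applies, and --no-whole-file should not be needed there.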
"but with the exception of the last 4k page (or maybe a few) the file is identical each time."
Did you actually verify that with cmp? Or better, with xdelta or something? If you really want to minimize transfer size, keep the old and new versions locally, so you can compute a minimal binary diff locally (with something other than rsync) and just send that without having to send checksums over the metered connection. Doing this at the database-record level instead of the binary-file level is probably even better, like derobert suggests. – Peter Cordes Jul 26 '16 at 5:02
Use rsync --stats, and also -v -v to get even more verbose stats. Rsync will tell you how much matched vs. unmatched data there was. – Peter Cordes Jul 26 '16 at 5:03
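The cmp check suggested in the first comment could look roughly like the sketch below. It is an illustration on synthetic files, not a measurement of the real log: old.db and new.db are hypothetical names, the 4096-byte page size is an assumption taken from the "4k page" in the question (SQLite's page size is configurable), and the new snapshot is built by rewriting the last page of the old one and appending one more.

```shell
#!/bin/sh
# Sketch: use cmp to find where two snapshots of a file start to differ.
# old.db / new.db and the 4096-byte page size are assumptions for illustration.
set -eu

page_size=4096
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Old snapshot: 16 pages (64 KiB) of random data.
head -c 65536 /dev/urandom > "$workdir/old.db"

# New snapshot: first 15 pages unchanged, last page rewritten, one page appended.
head -c 61440 "$workdir/old.db" > "$workdir/new.db"
head -c 8192 /dev/urandom >> "$workdir/new.db"

old_size=$(wc -c < "$workdir/old.db" | tr -d ' ')
new_size=$(wc -c < "$workdir/new.db" | tr -d ' ')

# cmp -l prints one line per differing byte: "offset old new", with 1-based
# offsets. It only compares the common prefix and complains on stderr when
# one file ends first, so stderr is discarded here.
first_diff=$(cmp -l "$workdir/old.db" "$workdir/new.db" 2>/dev/null |
    head -n 1 | awk '{print $1}')

if [ -z "$first_diff" ]; then
    echo "common prefix is identical"
else
    # Convert the 1-based byte offset into a 0-based page index.
    first_page=$(( (first_diff - 1) / page_size ))
    echo "first differing page: $first_page of $(( old_size / page_size ))"
fi
echo "bytes appended: $(( new_size - old_size ))"
```

On real snapshots, a first differing page near the end of the file supports the "only the tail changes" assumption, while an early one suggests that pages throughout the database are being rewritten.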