(April 2009)
In a companion article, I described
the techniques I use to perform a daily backup of an overseas
Linux server as efficiently as possible.
Windows machines are a different story, though.
An ideal Windows backup strategy must...
- be fast and hassle-free
- be executed automatically (e.g. on a daily basis)
- use only open-source and free Microsoft tools (i.e.
avoid closed-source third-party software)
- allow instantly navigable history for as many
days/weeks/months as storage permits, with only the modified and new
files actually reserving space on the backup device
- work just fine with open files (registry hive, SQL server data,
Outlook mailboxes, etc)
- use cheap hardware
I searched for something like this and found many solutions that had some
of the desired features... but none that had them all.
Challenges
Backup where?
Tapes and optical disks are not ideal backup storage: they require manual
fiddling each time (change the tape, insert a DVD/Blu-ray disc, etc). We want the process
to be completely automated: set it up once and forget about it.
Hard drives (e.g. external USB drives) and network shares fit this
profile. In fact, since we intend to use techniques that only store
the modified/new data, we can start with storage of e.g. twice
the size of our Windows drive: a 500GB external USB drive will be able to
store anywhere from hundreds to thousands of daily snapshots of a 250GB Windows
machine. Why? Because we rarely change more than 1GB of our data on a single
day (depending on disk usage patterns of course - don't go searching for
extreme scenarios). To see a more technical Linux-based example of the
inner workings of what we'll do, read the explanation
of using hard links and rsync from my Linux backup page - or just take my
word for it :-)
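The estimate above is simple arithmetic. Here is a rough Python sketch of it; the function name is mine, and it assumes that the first snapshot is a full copy and that every later snapshot costs only the data changed that day (sizes are in MB to keep the numbers whole):

```python
def snapshots_that_fit(drive_mb, full_copy_mb, changed_mb_per_day):
    """Estimate how many hard-linked snapshots fit on the backup drive.

    Assumption: snapshot number one is a full copy, and each later
    snapshot only stores the files that changed since the previous one.
    """
    spare_mb = drive_mb - full_copy_mb
    if spare_mb < 0:
        return 0                      # not even the first copy fits
    return 1 + spare_mb // changed_mb_per_day


# 500GB drive, 250GB of data, 1GB of changes per day
print(snapshots_that_fit(500_000, 250_000, 1_000))   # 251
# the same machine, changing only 100MB per day
print(snapshots_that_fit(500_000, 250_000, 100))     # 2501
```

Real usage will vary, since a modified file is stored again in full no matter how small the change was.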
Instantly navigable
Windows XP Professional comes equipped with NTBackup. I am told that
this works fine, and I don't doubt it. However, I want to be able to
access my backed-up data directly, and not through yet another GUI.
I want to be able to open last week's version of VeryImportantDocument.xls
just by browsing with Explorer to that day's backup directory, and
double-clicking on it. Rsync and filesystem hard-links provide the necessary
functionality for this, so why would I use yet another application to
"extract" my old version? Why do I need to decide about "full backups" and
"incremental backups"? I want all my backed-up data accessible, all the time. And I can.
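For readers who have not met hard links before, here is a small Python sketch of the property we rely on. It is only an illustration (the folder names "0" and "1" and the temporary directory are my own stand-ins, and rsync is not involved): two directory entries share one copy of the data, and removing one of them leaves the other intact.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    yesterday = os.path.join(root, "1")
    today = os.path.join(root, "0")
    os.mkdir(yesterday)
    os.mkdir(today)

    old_name = os.path.join(yesterday, "VeryImportantDocument.xls")
    new_name = os.path.join(today, "VeryImportantDocument.xls")
    with open(old_name, "w") as f:
        f.write("unchanged since yesterday")

    # A hard link is a second directory entry for the same data on disk,
    # so the "copy" in today's folder takes no extra space.
    os.link(old_name, new_name)
    same_data = os.path.samefile(old_name, new_name)
    link_count = os.stat(new_name).st_nlink

    # Removing yesterday's entry leaves today's entry fully readable.
    os.remove(old_name)
    with open(new_name) as f:
        survivor = f.read()

print(same_data, link_count, survivor)
```

This is why every snapshot folder looks like a complete backup in Explorer, even though unchanged files are stored only once.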
Your file is being used by another process... Sorry...
In sharp contrast to UNIX - where I have never seen any applications/filesystems
enforcing draconian read/write access policies - there are a lot
of files under Windows whose contents are simply not accessible:
(start a CMD prompt from an Administrative account)
C:\> cd c:\windows\system32\config
C:\WINDOWS\system32\config> copy SAM c:\
The process cannot access the file because it is being used by another process.
0 file(s) copied.
The filesystem doesn't let us - these files were opened with exclusive
access modes. The developers who built the relevant applications knew
that these files adhere to binary formats (i.e. registry hive,
SQL Server files, etc), and since there is no guarantee that these files
are in a consistent state, they don't want us to read them. What we would
read would be useless anyway... yet another reason why UNIX, with its ASCII-based configuration
files under /etc, is much better than the registry - and in the same vein,
Thunderbird, with its plain ASCII-based mbox format, is much better
than the cryptic Outlook PSTs. I digress... (it's tough not to, when
you see this kind of thing).
So how do we back these files up? There are very important files
included in the "forbidden fruit" category... the registry hive, the SQL Server files, Outlook's PSTs - i.e.
your mail! (unless you are wise enough to use Thunderbird, which has
no such issues) - etc. Leaving these files out of the backup is simply
not an option. To cover this requirement, Microsoft took a page out of
the LVM snapshot book and
introduced the Volume Shadow Copy service with Windows XP.
In plain words, they developed the necessary drivers and services that
allow a process to take a "frozen picture" of the filesystem, and use
that frozen picture for whatever reason - backup applications being the
primary clients of this feature. To cope with the fact that some
applications would not tolerate their files being captured in an inconsistent
state, the Volume Shadow Copy service includes the necessary
work-around: it asks the appropriate applications to do a sort of "commit"
before the snapshot is actually taken.
Unfortunately, Volume Shadow Copy again leaves something to
be desired: there is no way for normal processes to access these "shadow"
volumes, since they are not visible via normal drive letters (they
are low-level devices, e.g. \\?\Volume{785cc4a6-3d68-11d7-9cc5-505054503030}).
Thanks to Adi Oltean,
however, there is a method that uses Microsoft tools
(vshadow and dosdev) to create these
snapshots and give them a normal drive letter - after which we can use
the usual rsync-based snapshots to back everything up!
Rsync under Windows
The current Cygwin (version 1.5) has some issues: it doesn't handle
UTF-8 filenames correctly, and it also has issues with very long
filenames/paths. Cygwin 1.7 is going to fix these issues for all
applications; and in the meantime, thanks to the efforts of
Michael Wallner,
we have an rsync that utilizes the appropriate Windows APIs to
access any files, regardless of filename/path lengths and characters used.
Thanks, Michael!
Permissions
When backing up our files, we have to decide what to do with their
permissions: we can of course invoke rsync with the "-a" parameter,
to try and save as much of them as possible, but this isn't necessarily
a wise thing: when we "rotate" the backup directories, we first remove
the oldest one - but if we save the original permissions of the files,
we won't be able to remove some (e.g. the Windows system folders, marked
as read-only) - which will break the backup process. For my
needs (I back up to an external USB drive that I alone can access),
I invoke rsync with --chmod=ugo=rwX, which basically
makes everything accessible to everyone.
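To make the rotation problem concrete, here is a hedged Python sketch of the alternative: keep the original permissions, and make the oldest snapshot writable just before deleting it. This is not what my scripts do (they rely on --chmod instead), and the function name is hypothetical.

```python
import os
import shutil
import stat


def remove_snapshot(path):
    """Delete a snapshot folder even if it contains read-only entries."""
    if not os.path.isdir(path):
        return
    os.chmod(path, stat.S_IRWXU)
    # os.walk goes top-down, so every sub-folder is made writable
    # before the walk descends into it.
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            entry = os.path.join(dirpath, name)
            if not os.path.islink(entry):      # never chmod a link target
                os.chmod(entry, stat.S_IRWXU)
    shutil.rmtree(path)
```

On Windows, os.chmod only toggles the read-only flag, which is exactly the flag that blocks the deletion; a plain shutil.rmtree on such a tree would stop at the first read-only file.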
The complete solution
As with all problems, there is no magic bullet that covers everyone's requirements.
And this is where the UNIX philosophy shines: understanding the simple tools that
do one job - and do it well - and then "gluing" them together with scripting to
cover our specific needs.
In this case, I'll show you how I use an external USB drive to perform a daily backup
of my Windows machine at work. The process however can be modified in many
ways: e.g. to back up to a network share (an OpenSolaris/ZFS Samba share
would be perfect: just remember to rsync --inplace and snapshot
the results via ZFS snapshots) or to create directories named after
the day of the backup, etc. What follows is a very simple - yet fully functional - usage
scenario of the appropriate tools.
Download my Windows XP backup package (I used
the open source 7-zip archiver,
which compresses much better than anything else right now). Uncompress
it in e.g. c:\Backup, and let's have a look inside:
C:\> cd Backup
C:\Backup> dir
Volume in drive C has no label.
Volume Serial Number is XXXX-XXXX
Directory of C:\Backup
15/04/2009 06:10 pm <DIR> .
15/04/2009 06:10 pm <DIR> ..
24/04/2008 12:51 pm 1.157.632 cygcrypto-0.9.8.dll \
23/10/2006 01:44 am 999.424 cygiconv-2.dll |
20/11/2005 04:13 am 31.744 cygintl-3.dll | Cygwin DLLs
23/10/2006 02:23 am 31.744 cygintl-8.dll |=> for
09/06/2002 07:50 am 22.528 cygpopt-0.dll | rsync.exe
22/05/2008 09:02 pm 2.329.849 cygwin1.dll |
16/10/2006 03:10 am 66.048 cygz.dll /
28/09/2004 02:07 pm 6.656 dosdev.exe => Microsoft tool
15/04/2009 01:52 pm 62 mybackup.cmd
23/05/2008 09:52 pm 915.896 rsync.exe => Cygwin tool
01/11/2006 02:05 pm 150.328 sync.exe => Microsoft tool
08/06/2005 03:17 pm 294.912 vshadow.exe => Microsoft tool
08/06/2005 03:17 pm 352.256 vshadow2003andMaybeVista.exe => MS tool
15/04/2009 12:39 pm 1.219 vss-exec.cmd
18 File(s) 6.639.134 bytes
2 Dir(s) 80.913.649.664 bytes free
So we have a set of Microsoft and Cygwin tools, and two scripts.
The backup process starts with mybackup.cmd:
C:\Backup> type mybackup.cmd
@echo off
echo Creating backup directories on F:\Backups if missing
if not exist F:\Backups mkdir F:\Backups
for %%p in (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) do if not exist F:\Backups\%%p mkdir F:\Backups\%%p
sync
vshadow.exe -script=vss-setvar.cmd -exec=vss-exec.cmd C:
First, we make sure the backup directories exist.
We then invoke the sync command from Microsoft Sysinternals,
which flushes all filesystem buffers to the disks (just in case
something goes bad - Windows does have blue screens, you know :-)).
We then invoke vshadow.exe to create a shadow volume
copy of the C: drive (if you are backing up a different drive,
change this).
What if you don't use Windows XP?
If you have Windows 2003 or Vista, you must use vshadow2003andMaybeVista.exe
instead (change the vshadow.exe call in mybackup.cmd accordingly). I personally
don't use Vista (and know no self-respecting
sysadmin that does, either) so feel free to experiment and report
any findings...
vshadow will create a vss-setvar.cmd that sets
helpful environment variables relating to our "shadow" volume, and
will then invoke our vss-exec.cmd. Here it is:
C:\Backup>type vss-exec.cmd
call vss-setvar.cmd
@echo off
dosdev B: %SHADOW_DEVICE_1%
echo Removing oldest snapshot...
rmdir /S /Q F:\Backups\15
echo Rolling histories one snapshot ahead...
rename F:\Backups\14 15
rename F:\Backups\13 14
rename F:\Backups\12 13
rename F:\Backups\11 12
rename F:\Backups\10 11
rename F:\Backups\9 10
rename F:\Backups\8 9
rename F:\Backups\7 8
rename F:\Backups\6 7
rename F:\Backups\5 6
rename F:\Backups\4 5
rename F:\Backups\3 4
rename F:\Backups\2 3
rename F:\Backups\1 2
rename F:\Backups\0 1
rsync -rtDvx --chmod=ugo=rwX --delete --link-dest=/cygdrive/f/Backups/1 /cygdrive/b/ /cygdrive/f/Backups/0/
dosdev -r -d B:
Don't go blindly executing this; let's walk through it first, step by step:
- We first call vss-setvar.cmd: vshadow has created this for us, and it
sets up a number of environment variables that relate to the shadow volume and that we can use.
The SHADOW_DEVICE_1 variable points to the low-level device
name of the shadow volume (\\?\Volume{7843917...), so...
- We pass the low-level device name to dosdev, which creates a nice drive letter
for us to use. The B: drive letter is a good choice, since the probability of
it being in use nowadays is practically zero.
- Since dosdev has created a drive letter for us, we can proceed with
the usual rsync trickery, which allows us to keep as
many snapshots as we wish: in this example script, 16 snapshots are maintained (folders 0 to 15),
one per backup run, for two to three weeks of backup history:
- We first remove the oldest snapshot. My external USB drive is at F:, so I remove
folder F:\Backups\15.
- We then rotate the snapshots, moving backup folder number 14 to 15, backup folder 13 to 14, etc.
(I know, I know, a loop is in order - but this is just an example).
- We finally invoke rsync:
we instruct it to copy from our shadow volume (drive B:, so in Cygwin terms,
/cygdrive/b) to our latest snapshot (/cygdrive/f/Backups/0/), and to utilize hard-links
from /cygdrive/f/Backups/1 (the previous backup) for those files that remain the same.
Since we use --delete, the files that were removed since the last time we backed up
will also be missing inside F:\Backups\0 (but will be easily accessible in F:\Backups\1).
The --chmod option gives full permissions to all files, and -rtDvx
performs the backup in the proper manner: it recurses over the whole disk (-r), preserves
modification times (-t) and special files (-D), lists each file as it goes (-v), stays on the one
filesystem (-x), and checks timestamps/filesizes so that only the files that did change are
actually copied (thanks to --link-dest, hard links are used for the rest).
- After rsync completes, we invoke dosdev to remove the B: drive letter from the "shadow" volume.
That's it.
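The loop I skipped, and the effect of --link-dest, can be sketched in a few lines of Python. This is only an illustration of the logic under my own assumptions (the function names are hypothetical, it does not call rsync, and it does not deal with open files or permissions), not a replacement for the scripts above:

```python
import filecmp
import os
import shutil


def rotate(backup_root, keep=16):
    """Drop the oldest snapshot and shift the rest: 14->15, 13->14 ... 0->1."""
    oldest = os.path.join(backup_root, str(keep - 1))
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)
    for n in range(keep - 2, -1, -1):
        src = os.path.join(backup_root, str(n))
        if os.path.isdir(src):
            os.rename(src, os.path.join(backup_root, str(n + 1)))


def snapshot(source, backup_root):
    """Fill snapshot 0, hard-linking files that match their copy in snapshot 1."""
    newest = os.path.join(backup_root, "0")
    previous = os.path.join(backup_root, "1")
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        new_dir = os.path.normpath(os.path.join(newest, rel))
        old_dir = os.path.normpath(os.path.join(previous, rel))
        os.makedirs(new_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            old = os.path.join(old_dir, name)
            new = os.path.join(new_dir, name)
            # shallow=True accepts files with equal size and modification
            # time as identical (otherwise it compares their contents),
            # in the spirit of rsync's quick check.
            if os.path.isfile(old) and filecmp.cmp(src, old, shallow=True):
                os.link(old, new)        # unchanged: no extra space used
            else:
                shutil.copy2(src, new)   # new or modified: a real copy
```

Calling rotate(r"F:\Backups") followed by snapshot("B:\\", r"F:\Backups") would mimic one run of vss-exec.cmd. Because folder 0 starts out empty on every run, files deleted from the source simply never appear in it.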
The only remaining piece in the puzzle is the automatic invocation of mybackup.cmd
at a convenient time. We can use the Windows Scheduler service for this:
C:\Backup> schtasks /Create /SC weekly /D MON,TUE,WED,THU,FRI /TN MyDailyBackup /ST 23:30:00 /TR c:\Backup\mybackup.cmd /RU SEMANTIX\ttsiodras /RP mypassword
The /RU and /RP options are there to specify the account under which
the backup will take place. Make sure you use an account with backup privileges for this
(the Administrator account will of course work just fine - but it's not a good policy,
security-wise). With the invocation above,
the machine will be automatically backed up every weekday night at 11:30pm.
If you want to check that this works without waiting for the middle of the night,
do your first backup (which will take more time since it has to copy all the data - the
following backups will be very fast) right now:
C:\Backup> schtasks /Create /SC Once /TN MyFirstBackup /ST 14:10:00 /TR c:\Backup\mybackup.cmd /RU SEMANTIX\ttsiodras /RP mypassword
(Change 14:10:00 to one/two minutes ahead of your current time)
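If you would rather not work out the start time by hand, a small Python helper can do it. This is just a convenience sketch: the function name is hypothetical, it only uses the schtasks options shown above, and it leaves out /RU and /RP, which you should append as in the command above.

```python
from datetime import datetime, timedelta


def first_backup_command(now, minutes_ahead=2):
    """Build the one-off schtasks call with a start time slightly in the future."""
    start = (now + timedelta(minutes=minutes_ahead)).replace(second=0, microsecond=0)
    return [
        "schtasks", "/Create", "/SC", "Once", "/TN", "MyFirstBackup",
        "/ST", start.strftime("%H:%M:%S"),
        "/TR", r"c:\Backup\mybackup.cmd",
    ]


# Print the command line so it can be pasted into a CMD prompt.
print(" ".join(first_backup_command(datetime.now())))
```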
I hope you'll find this process as useful as I have... It is simple to understand
and easy to execute (even for newbies - just change the drive letters to the ones used
in your PC).
P.S. And for those of you that want a taste of things to come: the rsync process
is forced to make a copy of the files that have changed - so if for example
you use VMware images (which come with huge .vmdk files), any change inside them
(even one little sector worth of data)
will force a complete copy... and waste a lot of space. Copy-on-write filesystems
like ZFS (and soon,
btrfs) are incredibly more efficient:
If you run the rsync daemon on one of them, you can use rsync with the --inplace option,
and then use the filesystem's snapshotting mechanisms after rsync completes - which will only reserve space
for the storage blocks that actually changed! If you have an OpenSolaris/ZFS server, you can already use this
to back up your machines - with such incredible storage gains that,
for all intents and purposes, you can enjoy almost unlimited daily backups.
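To put rough numbers on that difference, here is a toy Python comparison. The 4MB "image" and the 4096-byte block size are stand-ins I picked for illustration (real filesystems use their own block sizes), and the function name is hypothetical:

```python
def changed_blocks(old, new, block_size=4096):
    """Count the fixed-size blocks that differ between two versions of a file."""
    count = 0
    for offset in range(0, max(len(old), len(new)), block_size):
        if old[offset:offset + block_size] != new[offset:offset + block_size]:
            count += 1
    return count


block = 4096
image_v1 = bytes(1024 * block)        # stand-in for a (tiny) 4MB disk image
image_v2 = bytearray(image_v1)
image_v2[10 * block] = 0xFF           # a single byte changes inside the image

hard_link_cost = len(image_v2)        # hard-link snapshots store the whole file again
cow_cost = changed_blocks(image_v1, image_v2) * block   # only the touched block
print(hard_link_cost, cow_cost)       # 4194304 4096
```

Scale the image up to a multi-gigabyte .vmdk and the gap grows accordingly.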