This is a mini cookbook designed to help users get DRBD up and running in a two-node cluster with GFS on top. Maintained by LonHohberger
So, you want to try playing with GFS, but you do not have a SAN. GFS, as you know, is not a distributed file system. Rather, it is a shared disk cluster file system. This means that in order to use it on two or more computers, you must have a disk shared between them.... or, do you?
DRBD is a RAID-1 style block device which synchronizes over the network between two computers. In essence, it provides a virtual shared storage between two computers. As of 0.8, DRBD can be used with GFS (and other shared disk cluster file systems) due to the addition of concurrent writer support.
Ok, so, you want GFS-on-DRBD. Here's how, sort of...
Before you start
- This was written for RHEL5/CentOS5. Input from other distribution users is appreciated.
- You will need gfs-utils or gfs2-utils. On RHEL5, get both (since gfs-utils requires the latter).
- You may *not* use DRBD as a quorum disk in a 2-node cluster.
- DRBD in active/active mode works only in two-node clusters (at least, the free version...)
- As of this writing, Red Hat does not ship nor support DRBD.
- Performance is unlikely to be very good in this configuration.
Do not try this on shared storage. DRBD is for use on storage which is not shared.
Basic CMAN Configuration
For DRBD to work in its simplest form on Linux-Cluster, you will need a valid, two-node cluster configuration. This includes using expected_votes="1" and two_node="1" in the <cman> tag of cluster.conf and, more importantly, fencing. Even though DRBD may not require fencing in all circumstances, GFS does. Here is an example cluster configuration (/etc/cluster/cluster.conf) for a 2-node virtual cluster using XVM fencing:
<?xml version="1.0"?>
<cluster alias="lolcats" config_version="41" name="lolcats">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="frederick" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device domain="frederick" name="xvm"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="molly" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device domain="molly" name="xvm"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_xvm" name="xvm"/>
  </fencedevices>
  <rm/>
</cluster>
How to configure linux-cluster / CMAN is beyond the scope of this document.
DRBD Configuration
Fabio C. gave me his configuration file (/etc/drbd.conf), which I tweaked for my Xen cluster. You will need to adapt the configuration below to your environment.
global {
    usage-count yes;
}

common {
    syncer {
        rate 100M;
    }
}

resource the-disk {
    protocol C;

    startup {
        wfc-timeout 20;
        degr-wfc-timeout 10;
        # become-primary-on both;   # Enable this *after* initial testing
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "happy2008everybody";
        allow-two-primaries;
    }

    on molly {
        device /dev/drbd1;
        disk /dev/xvdd;
        address 10.12.32.98:7789;
        meta-disk internal;
    }

    on frederick {
        device /dev/drbd1;
        disk /dev/xvdd;
        address 10.12.32.99:7789;
        meta-disk internal;
    }

    disk {
        fencing resource-and-stonith;
    }

    handlers {
        outdate-peer "/sbin/obliterate";   # We'll get back to this.
    }
}
The obliterate script is available here; it was last updated on 6-Dec-2007. This script calls forth CMAN's fencing to smite the other node in a 2-node cluster. Once the obliterate script terminates (successfully, of course), DRBD will recover. There are some things to be aware of with the initial implementation (a rough sketch of this kind of handler follows the list below):
- obliterate only tries fencing once, instead of retrying until it succeeds the way CMAN's fence daemon does.
- The dead node will get fenced twice. A more correct method of killing the node is noted in the script; mostly, I just wanted to get a PoC out there. Fixing it is not difficult.
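For reference, here is a rough, hypothetical sketch of what such a fence-peer handler might look like; the real obliterate script differs in its details, and the peer-discovery logic below is my own assumption rather than a copy of it. DRBD treats an exit code of 7 from the handler as "the peer has been fenced" (see the fencing callout return codes in the Resources section).

#!/bin/bash
# Hypothetical fence-peer handler sketch -- NOT the actual obliterate
# script.  Fence whichever cluster node is not the local one, then
# report back to DRBD.  Assumes cman_tool and fence_node are in the PATH.

ME=$(cman_tool status | awk '/^Node name:/ {print $3}')
PEER=$(cman_tool nodes | awk -v me="$ME" 'NR > 1 && $NF != me {print $NF; exit}')

[ -n "$PEER" ] || exit 1

# Ask CMAN's fencing machinery to power-cycle the peer.
if fence_node "$PEER"; then
    # Exit code 7 tells DRBD the peer has been stonith'd,
    # so it is safe to resume I/O.
    exit 7
fi
exit 1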
Ok, back to the configuring! Create the metadata on both nodes, then set the generation identifiers so that DRBD considers the device consistent (this skips the initial full sync; see the Resources section):
[root@molly ~]# drbdadm create-md the-disk
v08 Magic number not found
v07 Magic number not found
About to create a new drbd meta data block on /dev/xvdd.

 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Creating meta data...
initialising activity log
NOT initialized bitmap (32 KB)
New drbd meta data block sucessfully created.
success
[root@molly ~]# drbdadm -- 6::::1 set-gi the-disk
previously 0000000000000004:0000000000000000:0000000000000000:0000000000000000:0:0:0:0:0:0
set GI to  0000000000000006:0000000000000000:0000000000000000:0000000000000000:1:0:0:0:0:0

Write new GI to disk?
[need to type 'yes' to confirm] yes
Start up DRBD on both nodes (as close to the same time as you can):
[root@molly ~]# service drbd start
Starting DRBD resources:    [ d0 s0 n0 ].
Check the state on both nodes using the following command. It should say "Secondary/Secondary".
[root@molly ~]# drbdadm state all
Secondary/Secondary
Promote both nodes to primary mode.
[root@molly ~]# drbdadm primary all
Check the state again - this time, it should say "Primary/Primary".
[root@molly ~]# drbdadm state all
Primary/Primary
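You can also check /proc/drbd on either node. With both nodes promoted, the resource line should show cs:Connected, st:Primary/Primary and ds:UpToDate/UpToDate. The line below is abbreviated and only illustrative; the exact format varies between DRBD releases:

[root@molly ~]# cat /proc/drbd
 1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
 (remaining counters omitted)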
One thing that is important to note: when using GFS with DRBD, if you want GFS file systems to be mounted on system startup, you must make the drbd init script's 'start' operation occur after the cman init script is called and before the gfs init script is called. Edit /etc/init.d/drbd and change the chkconfig line (line 3 in 0.8.2.1) to:
chkconfig: 345 22 75
You may now enable DRBD on startup by running the following command on both nodes:
chkconfig --level 345 drbd on
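To double-check the ordering, you can list the rc symlinks on both nodes; drbd's start priority should fall between cman's and gfs's. The numbers in the comment below are only illustrative and may differ between releases:

ls /etc/rc3.d/ | grep -E 'cman|drbd|gfs'
# Expect drbd's S-number between cman's and gfs's, for example:
#   S21cman  S22drbd  S26gfs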
Important: In order for DRBD to automatically work in active/active mode, you must uncomment the become-primary-on line in /etc/drbd.conf on both nodes. Failure to do this will cause each cluster node to start in the 'Secondary' state - blocking access to GFS volumes.
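With that line uncommented, the startup section of the example /etc/drbd.conf above reads:

startup {
    wfc-timeout 20;
    degr-wfc-timeout 10;
    become-primary-on both;
}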
Making a GFS volume
Now comes the fun part! First of all, we need to make a mount point on both nodes:
mkdir /mnt/drbdtest
Next, we need to create a file system. The gfs_mkfs command takes a few more parameters than mkfs for traditional file systems (like ext3): the locking protocol (usually 'lock_dlm'), a lock table name, and the number of journals. With DRBD 0.8, the number of journals will always be '2', because DRBD only allows two concurrent writers at a time. The only complicated one is the lock table, which takes the form <clustername>:<file_system_name>. My cluster is named 'lolcats' and I decided to call my file system 'drbdtest'. Here is what the output of gfs_mkfs looks like:
[root@molly ~]# gfs_mkfs -p lock_dlm -t lolcats:drbdtest /dev/drbd1 -j 2
This will destroy any data on /dev/drbd1.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/drbd1
Blocksize:                 4096
Filesystem Size:           190420
Journals:                  2
Resource Groups:           8
Locking Protocol:          lock_dlm
Lock Table:                lolcats:drbdtest

Syncing...
All Done
Once this is done, you can mount it on both nodes:
mount -t gfs /dev/drbd1 /mnt/drbdtest
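If you want the gfs init script to mount this file system at boot (see the init script ordering note above), an /etc/fstab entry along the following lines should do it; treat the options as a starting point rather than a recommendation:

/dev/drbd1    /mnt/drbdtest    gfs    defaults    0 0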
You should be able to see it in the output of the mount command at this point. More important, however, is what CMAN has to say about it:
[root@molly ~]# cman_tool services
type             level name       id       state
fence            0     default    00010001 none
[1 2]
dlm              1     drbdtest   00040002 none
[1 2]
gfs              2     drbdtest   00030002 none
[1 2]
If you've gotten this far, you can now do crazy things like create files in /mnt/drbdtest and see their contents from the other node!
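For example, using the hostnames from the sample configuration:

[root@molly ~]# echo "hello from molly" > /mnt/drbdtest/hello.txt
[root@frederick ~]# cat /mnt/drbdtest/hello.txt
hello from molly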
Making a GFS2 volume
You may also run GFS2 on top of DRBD. This is quite similar to creating a GFS volume, as noted above. The primary difference is the creation program:
[root@molly ~]# mkfs.gfs2 -p lock_dlm -t lolcats:drbdtest /dev/drbd1 -j 2
This will destroy any data on /dev/drbd1.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/drbd1
Blocksize:                 4096
Device Size                1.00 GB (262127 blocks)
Filesystem Size:           1.00 GB (262125 blocks)
Journals:                  2
Resource Groups:           4
Locking Protocol:          "lock_dlm"
Lock Table:                "lolcats:drbdtest"
Once this is done, you can mount it on both nodes:
mount -t gfs2 /dev/drbd1 /mnt/drbdtest
That's it! Try the same status commands as you did with GFS.
Resources
http://www.drbd.org/fileadmin/drbd/doc/8.0.2/en/drbd.conf.html - DRBD config file documentation
http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html - Documentation for return codes of the DRBD fencing callout
http://drbd-plus.linbit.com/examples:skip-initial-sync - Where I got the tweak to skip the initial sync.
Thanks to fabioc on freenode for his efforts.