Building a redundant mailstore with DRBD and GFS

I’ve recently been asked to build a redundant mailstore, using two server-class machines that are running Ubuntu. The caveat, however, is that no additional hardware will be purchased, so this rules out using any external file storage, such as a SAN. I’ve been investigating the use of DRBD in a primary/primary configuration, to mirror a block device between the two servers, and then putting GFS2 over the top of it, so that the filesystem can be mounted on both servers at once.

A set-up like this is more complex and fragile than using ext4 on DRBD in primary/secondary mode, with clustering scripts to ensure that the filesystem is only ever mounted on one server at a time. However, it’s likely that GFS will be required on the same two servers for another purpose in the near future, so it makes sense to use the same method of clustering for both.

The following guide details how to get this going on Ubuntu 10.04 LTS (lucid). It won’t work on any older release: the servers that this is destined for were originally running 9.04 (Jaunty), but when I tested DRBD+GFS on that release, I hit a problem that prevents it from working. As far as I’m concerned, production servers should not be run on non-LTS Ubuntu releases anyway, because the support lifecycle is far too short. This guide should also work fine for Debian 6.0 (squeeze), although I haven’t tested it yet.

One thing to keep in mind – the Ubuntu package for gfs2-tools claims that “The GFS2 kernel modules themselves are highly experimental and *MUST NOT* be used in a production environment yet”. There’s a problem with this, however – the gfs2 module is available in the kernel in Ubuntu 10.04, but the original gfs isn’t there (it never was), and the redhat-cluster-source package, which provides it, doesn’t build. I’m inclined to say that the “experimental” warning is incorrect.

Firstly, install DRBD:

apt-get install drbd8-utils drbd8-source

We have to install the drbd8-source package in order to get the drbd kernel module. When drbd is started, it should automatically run dkms to build and install the module.

Now, the servers I’m using have their entire RAID already allocated to an LVM volume group named vg01, so I’m going to create a 60GB logical volume within this volume group, to be used as the backing store for the DRBD block device on each. Obviously, this step isn’t compulsory, and the DRBD block devices can be put on a plain disk partition instead.

lvcreate -L 60G -n mailmirror vg01

After this, configure /etc/drbd.conf on both servers:

global {
  usage-count yes;
}

common {
  protocol C;
}
resource r0 {
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  syncer {
    verify-alg sha1;
  }
  startup {
    become-primary-on both;
  }
  on mail01 {
    device    /dev/drbd0;
    disk      /dev/vg01/mailmirror;
    address   10.50.0.11:7789;
    meta-disk internal;
  }
  on mail02 {
    device    /dev/drbd0;
    disk      /dev/vg01/mailmirror;
    address   10.50.0.12:7789;
    meta-disk internal;
  }
}
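One thing worth checking before going further: DRBD matches each "on" section against the machine's own hostname (as reported by uname -n), so the names in the config have to agree with it. Here is a minimal sketch of a check for that, assuming the simple layout used above where each "on" line starts with the keyword; drbd_conf_hosts is just a name I made up for the helper.

```shell
# drbd_conf_hosts [FILE]
# Print the host name from every "on <host> {" line in FILE
# (default: /etc/drbd.conf). Hypothetical helper, not part of DRBD.
drbd_conf_hosts() {
    awk '$1 == "on" { print $2 }' "${1:-/etc/drbd.conf}"
}
```

Running drbd_conf_hosts | grep -qx "$(uname -n)" on each server should then succeed if that server has a matching section.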

With this done, we can now set up the DRBD mirror, by running these commands on each server:

drbdadm create-md r0
modprobe drbd
drbdadm attach r0
drbdadm syncer r0
drbdadm connect r0

…and to start the replication between the two block devices, run the following on only one server:

drbdadm -- --overwrite-data-of-peer primary r0

By looking at /proc/drbd, we’ll be able to see the servers syncing. It’s likely that this will take a long time to complete, but the drbd device can still be used while that’s happening. One last thing we need to do is move it from primary/secondary mode into primary/primary mode, by running this on the other server:

drbdadm primary r0
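To avoid reading /proc/drbd by eye, something like the following can pull out individual status fields. It is only a sketch: it assumes the DRBD 8.3 status layout, where each device line looks like " 0: cs:... ro:... ds:...", and drbd_field is a helper name of my own invention.

```shell
# drbd_field FIELD [FILE]
# Print the value of one status field (cs, ro or ds) for every DRBD
# device listed in FILE (default: /proc/drbd).
# Assumes 8.3-style device lines: " 0: cs:Connected ro:Primary/Primary ..."
drbd_field() {
    awk -v key="$1" '
        $1 ~ /^[0-9]+:$/ {
            # Device line: look for the word that starts with "<key>:"
            for (i = 2; i <= NF; i++) {
                if (index($i, key ":") == 1) {
                    print substr($i, length(key) + 2)
                }
            }
        }
    ' "${2:-/proc/drbd}"
}
```

With that layout, drbd_field ro shows the local/peer roles and drbd_field ds shows the disk states, so it is easy to confirm that both sides have become primary.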

So, now we want to create a GFS2 filesystem. There’s a catch here, however: GFS2 cannot sit directly on a DRBD block device. Instead, we need to put an LVM physical volume on the DRBD device, and then create a volume group and logical volume within that. Furthermore, because this is going on a cluster, we need to use clustered LVM and associated clustering software:

apt-get install cman clvm gfs2-tools

And then configure the cluster manager on each server. Put the following in /etc/cluster/cluster.conf:

<?xml version="1.0" ?>
<cluster alias="mailcluster" config_version="6" name="mailcluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <totem consensus="6000" token="3000"/>
        <clusternodes>
                <clusternode name="mail01" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="clusterfence" nodename="mail01"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="mail02" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="clusterfence" nodename="mail02"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="clusterfence"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>

In the above, I’m using manual fencing, because at the moment, I don’t have any other method for fencing available to me. This should not be done in production; it needs a real fencing device, such as an out-of-band management card (e.g. Dell DRAC, HP iLO) to kill power to the opposite node, if something is amiss. All that manual fencing does is write messages to syslog, saying that fencing is needed.

Without fencing, it’s possible to encounter a situation where the DRBD device might have stopped mirroring, yet the mail spool is still mounted on each server, with the mail daemon on each one writing to its GFS filesystem independently, and that would be a very difficult mess to clean up.

One other thing: there’s an Ubuntu-specific catch here – Ubuntu’s installer has this irritating habit of putting a host entry in /etc/hosts for the hostname with an IP address of 127.0.1.1. This will break the clustering, so remove the entry from both servers, and either make sure your DNS is set up correctly for the name that you’re using in your cluster interfaces, or add the correct addresses to the hosts file.
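A quick way to spot the offending entry is sketched below. It assumes the usual hosts file layout (address first, then names), and hosts_loopback_alias is a hypothetical helper name rather than anything shipped with Ubuntu.

```shell
# hosts_loopback_alias NAME [FILE]
# Exit 0 if FILE (default: /etc/hosts) maps NAME to 127.0.1.1,
# otherwise exit 1. Comment lines and trailing comments are ignored.
hosts_loopback_alias() {
    awk -v name="$1" '
        $1 ~ /^#/ { next }
        $1 == "127.0.1.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break
                if ($i == name) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}
```

For example, hosts_loopback_alias "$(uname -n)" && echo "remove the 127.0.1.1 entry" prints the reminder only on a server that still has the entry.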

You can now start up clustering on both hosts:

/etc/init.d/cman start

Run cman_tool nodes, and if all is well, you’ll see:

Node  Sts   Inc   Joined               Name
   1   M    120   2011-09-14 10:53:32  mail01
   2   M    120   2011-09-14 10:53:32  mail02
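If this needs to be checked from a script, the output above can be filtered down to the nodes that are actually members. This is a small sketch that assumes the column layout shown here (node ID first, status second, name last); cman_members is a made-up name.

```shell
# cman_members [FILE]
# Print the names of nodes whose status column is "M", reading
# `cman_tool nodes` style output from FILE or from standard input.
cman_members() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "M" { print $NF }' "$@"
}
```

It would be used as cman_tool nodes | cman_members, and on a healthy two-node cluster should list both mail01 and mail02.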

We’ll need to make a couple of modifications to /etc/lvm/lvm.conf on both servers. Firstly, to make LVM use its built-in clustered locking:

locking_type = 3

…and secondly, to make it look for LVM signatures on the drbd device (in addition to local disks):

filter = ["a|sd.*|", "a|drbd.*|", "r|.*|"]
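Since a stock lvm.conf is full of commented-out examples, it is easy to edit the wrong line. The following sketch prints the locking_type that is actually in effect, skipping commented lines; lvm_locking_type is a hypothetical helper name, and it assumes the setting sits on a line of its own.

```shell
# lvm_locking_type [FILE]
# Print the uncommented locking_type value from FILE
# (default: /etc/lvm/lvm.conf).
lvm_locking_type() {
    sed -n 's/^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
        "${1:-/etc/lvm/lvm.conf}"
}
```

On both servers this should print 3 before clvm is started.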

Now start up clvm:

/etc/init.d/clvm start

At this point, we can create the LVM physical volume on the drbd device. Because we now have a mirror running between the two servers, we only need to do this on one server:

pvcreate /dev/drbd0

Run pvscan on the other server, and we’ll be able to see that we have a new PV there.

Now, again, on only one server, create the volume group:

vgcreate mailmirror /dev/drbd0

Run vgscan on the other server, to see that the VG also appears there.

Next, we’ll create a logical volume for the GFS filesystem (I’m leaving 10GB of space spare for a second GFS filesystem in the future):

lvcreate -L 50G -n spool mailmirror

And then lvscan on the other server should show the new LV.

The final step is to create the GFS2 filesystem:

mkfs.gfs2 -t mailcluster:mailspool -p lock_dlm -j 2 /dev/mailmirror/spool

mailcluster is the name of the cluster, as defined in /etc/cluster/cluster.conf, while mailspool is a unique name for this filesystem.
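Because the first half of the -t argument has to match the cluster name exactly, it can be safer to read it out of cluster.conf than to retype it. Here is a rough sketch, assuming the name attribute sits on the same line as the opening cluster tag (as it does in the file above); cluster_name is my own helper name.

```shell
# cluster_name [FILE]
# Print the name attribute of the <cluster> element in FILE
# (default: /etc/cluster/cluster.conf). Assumes the attribute is on
# the same line as the opening tag.
cluster_name() {
    sed -n 's/.*<cluster[[:space:]][^>]*name="\([^"]*\)".*/\1/p' \
        "${1:-/etc/cluster/cluster.conf}"
}
```

The lock table string for mkfs.gfs2 can then be built with echo "$(cluster_name):mailspool".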

We can now mount this filesystem on both servers, with:

mount -t gfs2 /dev/mailmirror/spool /var/mail
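To confirm from a script that the mount really happened on each server, the mount table can be checked. This is a minimal sketch that reads /proc/mounts (mount point in the second column, filesystem type in the third); gfs2_mounted_at is a hypothetical helper name.

```shell
# gfs2_mounted_at MOUNTPOINT [FILE]
# Exit 0 if FILE (default: /proc/mounts) lists a gfs2 filesystem
# mounted at MOUNTPOINT, otherwise exit 1.
gfs2_mounted_at() {
    awk -v mp="$1" '
        $2 == mp && $3 == "gfs2" { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}
```

For example, gfs2_mounted_at /var/mail || echo "mail spool is not mounted" could be run before starting the mail daemon.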

That’s it! We now have a redundant mailstore. Before starting your mail daemon, however, I’d suggest changing its configuration to use maildir instead of mbox format, because having multiple servers writing to an mbox file is bound to cause corruption at some point.

Other recommended changes would be to alter the servers’ init scripts so that drbd is started before cman and clvm.
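Until the boot order has been sorted out properly, the same ordering can at least be enforced when starting things by hand. The sketch below is only an illustration of the order, not a replacement for fixing the init scripts; start_mail_stack and the INIT_DIR override are names I have made up, and it assumes the three init scripts used earlier in this guide.

```shell
# start_mail_stack
# Start drbd, cman and clvm in that order, stopping at the first
# failure. INIT_DIR is a hypothetical override (default: /etc/init.d)
# so the function can be tried against dummy scripts.
start_mail_stack() {
    for svc in drbd cman clvm; do
        "${INIT_DIR:-/etc/init.d}/$svc" start || {
            echo "failed to start $svc" >&2
            return 1
        }
    done
}
```

Stopping at the first failure matters here, because starting clvm without a working cluster manager underneath it will not get very far.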

Paul Dwerryhouse is a freelance Open Source IT systems and software consultant, based in Australia. Follow him on twitter at http://twitter.com/pdwerryhouse/.
