DRBD on VirtualBox

Using the cloud as hardware for your services does not mean the cloud is any better at keeping that hardware working. The cloud can provide high-availability measures such as hot migration, but that is no guarantee that your systems will have 100% uptime.

I have heard many stories about companies losing a lot of money to downtime because their server provider was having hardware problems. Some yell at the provider for not keeping the hardware up at all times. Those are usually the people who have never managed hardware themselves and do not know that hardware does fail.

The issue needs to be solved by building your own uptime and your own redundancy. By using load balancers and HA tools such as HAProxy you can keep your system up even when individual servers fail, and you can use Chaos Monkey to test it. I decided to implement my own redundancy using DRBD on two MySQL servers in VirtualBox.

DRBD

I found this guide on VirtualBox and DRBD, but it did not work for me. I followed the first steps and created a loop block device with dd.

dd if=/dev/zero of=/opt/drbd-test.loop bs=1M count=200
losetup /dev/loop1 /opt/drbd-test.loop
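Note that a losetup mapping does not survive a reboot. One way around that (my own addition, not from the guide) is to recreate it at boot, for example from /etc/rc.local on both nodes:

losetup /dev/loop1 /opt/drbd-test.loop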

The resource is then configured with two nodes, 192.168.1.10 and 192.168.1.11, both using the loop device we created with dd and losetup. The file /etc/drbd.conf is configured as follows.

common { 
  protocol C; 
}
resource test {
  on drbd1 {
    device    /dev/drbd1;
    disk      /dev/loop1;
    address   192.168.1.10:7789;
    meta-disk internal;
  }
  on drbd2 {
    device    /dev/drbd1;
    disk      /dev/loop1;
    address   192.168.1.11:7789;
    meta-disk internal;
  }
}

Load the kernel module, drbd, with modprobe drbd.
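On both nodes that is simply the following; the lsmod check is just to verify that the module actually loaded:

modprobe drbd
lsmod | grep drbd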

I restarted the services and ended up with error code 10. That seemed to be an issue with the syncing between the two nodes, and I had to reinitialize the metadata of the resource, which led me to the commands below. The first one initializes the metadata, the second brings the resource back up, and the third makes the node primary. The last command should only be executed on the primary node (for example drbd1).

drbdadm create-md test
drbdadm up test
drbdsetup /dev/drbd1 primary -o

Then you can check the status of DRBD with cat /proc/drbd. Hopefully it will say something like the following; if it does, DRBD is running successfully.

version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@drbd1, 2011-07-10 02:26:35

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:56267 nr:9 dw:6112 dr:51106 al:8 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Then, on the primary, I ran the following to create an ext3 filesystem, mount it under /mnt/mysql, create a test file, and then unmount and demote the node to secondary.

mkfs.ext3 /dev/drbd1
mkdir -p /mnt/mysql && mount /dev/drbd1 /mnt/mysql
touch /mnt/mysql/syncing_drbd
umount /mnt/mysql && drbdadm secondary all

To see that it all works, run the following on the secondary and check that the file syncing_drbd shows up in the directory.

drbdadm primary all
mkdir -p /mnt/mysql && mount /dev/drbd1 /mnt/mysql 
ls -l /mnt/mysql

And finally, add drbd to /etc/modules so that the module is loaded at boot.
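On both nodes that is a one-liner (run as root):

echo drbd >> /etc/modules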

Heartbeat

To automatically make the secondary primary when the primary goes down, we have to use something like Heartbeat. Heartbeat sends messages between the two servers, and if one of them stops responding the other one takes over.
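I am assuming Debian/Ubuntu guests here (the /etc/mysql and AppArmor paths later point that way); on such systems Heartbeat can be installed on both nodes with:

apt-get install heartbeat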

To configure Heartbeat, specify the port and the nodes in the file /etc/ha.d/ha.cf.

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
bcast eth1
udpport 694
auto_failback on
node drbd1 drbd2

To create a shared key between the two nodes, run a shell script with the following content on one node, and then copy the resulting /etc/ha.d/authkeys file from that node to the other.

cat <<-!AUTH >/etc/ha.d/authkeys
    # Automatically generated authkeys file
    auth 1
    1 sha1 `dd if=/dev/urandom count=4 2>/dev/null | md5sum | cut -c1-32`
!AUTH
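Heartbeat will refuse to start if authkeys is readable by anyone but root, so tighten the permissions on both nodes (an extra step, not part of the script above):

chmod 600 /etc/ha.d/authkeys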

Heartbeat needs to be configured with the services it should control when the primary or secondary server hits the floor. This is done in the /etc/ha.d/haresources file, whose contents are described very well in the MySQL documentation. The line below mounts our DRBD disk on /mnt/mysql and starts MySQL when the active node is lost.

drbd1 drbddisk Filesystem::/dev/drbd1::/mnt/mysql::ext3 mysql

MySQL

To make MySQL use the directory, the easiest way is to link the configuration file from /mnt/mysql/my.cnf to /etc/mysql/my.cnf so that configuration changes are shared between the servers. Then, in that configuration, set the datadir to your DRBD directory. This will cause all data to be synced.
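A rough sketch of what that can look like on the primary, with /dev/drbd1 mounted on /mnt/mysql (the exact commands are my own, based on the paths used above):

/etc/init.d/mysql stop                                           # stop MySQL before touching its files
mv /etc/mysql/my.cnf /mnt/mysql/my.cnf                           # move the config onto the replicated disk
ln -s /mnt/mysql/my.cnf /etc/mysql/my.cnf                        # link it back to where MySQL expects it
sed -i 's|^datadir.*|datadir = /mnt/mysql|' /mnt/mysql/my.cnf    # point the datadir at the DRBD mount

The secondary needs the same symlink; it will simply dangle until that node becomes primary and mounts /mnt/mysql.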

I got the error "operation="mknod" pid=12802 parent=3974 profile="/usr/sbin/mysqld" requested_mask="c::" denied_mask="c::" fsuid=106 ouid=106", which comes from AppArmor; either remove AppArmor or check its configuration. I still got some errors because I had moved the data directory, so I initialized it again with:

mysql_install_db --user=mysql --ldata=/mnt/mysql
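If you would rather keep AppArmor than remove it, the MySQL profile can be told about the new datadir instead. On Ubuntu the profile lives in /etc/apparmor.d/usr.sbin.mysqld; something along these lines (paths assumed from the setup above) should do it:

# In /etc/apparmor.d/usr.sbin.mysqld, next to the existing /var/lib/mysql entries:
  /mnt/mysql/ r,
  /mnt/mysql/** rwk,

# Then reload the profile:
/etc/init.d/apparmor reload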

This is one way to make your service a bit more resistant to failure and keep your uptime a bit higher.