Now let's see how we can achieve this with Pacemaker!
First, I built the drbd package from Linbit's git repository, because it supports primary/primary connections (which we won't use now) and it contains a debian directory to build a deb package from: git://git.drbd.org/drbd-8.3.git . After installing the packages, in this example I'll use a bare image file as the block device, with the help of losetup to map a device file to this image.
Let's create the image file on every cluster node (c01, c02):
root@c01:~# dd if=/dev/zero of=/drbd_block.img bs=1M count=2048
This way we created a 2G image. Now let's assign these image files to a loop device on every cluster node. Since this has to be done at every boot, I recommend putting the following line into /etc/rc.local:
losetup /dev/loop0 /drbd_block.img
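For reference, here is a minimal sketch of what /etc/rc.local could look like after this change; I'm assuming the stock Debian rc.local here, so everything except the losetup line is just the default content:
#!/bin/sh -e
# map the DRBD backing image file to a loop device at every boot
losetup /dev/loop0 /drbd_block.img
exit 0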
For now, you can simply run the losetup command by hand, then check the results:
root@c01:~# losetup -a
/dev/loop0: [fe00]:24971 (/drbd_block.img)
root@c02:~# losetup -a
/dev/loop0: [fe00]:24849 (/drbd_block.img)
It is successfully assigned on both nodes. This step is needed because DRBD requires a block device to work with, so let's configure DRBD on every node:
root@c02:~# cat /etc/drbd.conf |grep -v ^$
global {
usage-count no;
}
common {
#protocol C;
}
resource r0 {
device /dev/drbd0;
disk /dev/loop0;
meta-disk internal;
protocol C;
on c01 {
address 192.168.0.1:7789;
#flexible-meta-disk internal;
}
on c02 {
address 192.168.0.2:7789;
#meta-disk internal;
}
net {
allow-two-primaries;
#for GFS2 or OCFS2:
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
startup {
become-primary-on both;
}
syncer {
verify-alg crc32c;
rate 40M;
}
handlers {
split-brain "/usr/lib/drbd/notify-split-brain.sh root";
#pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
#pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
#local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
#outdate-peer "/usr/sbin/drbd-peer-outdater";
fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
disk {
# on-io-error detach;
fencing resource-only;
}
}
The explanation: we create the r0 resource on /dev/drbd0, backed by /dev/loop0, which is already mapped to the /drbd_block.img file. We use protocol C, which requires a full acknowledgement from the peer that every change has been written to its device. We tuned the sync bandwidth to 40M; note that the default is quite low, so on a modern network we need much more. We use crc32c, which is probably the fastest hash algorithm available in drbd. And although this configuration is prepared for primary/primary with a cluster-enabled file system, we won't need that - but it's fine for us for now.
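If you want to double-check that drbd parses this configuration the way you expect, drbdadm can dump its own view of the resource; this is just an optional sanity check:
root@c01:~# drbdadm dump r0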
Now let's check whether the drbd kernel module is loaded successfully. If not, load it manually:
root@c01:~# modprobe drbd
root@c01:~# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root@c01, 2009-09-09 11:13:45
0: cs:Unconfigured
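Since the module has to be available after every reboot too, on a Debian-like system you can simply add it to /etc/modules (assuming the standard /etc/modules mechanism):
root@c01:~# echo drbd >> /etc/modules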
Let's create the device meta-data:
root@c01:~# drbdadm create-md r0
You want me to create a v08 style flexible-size internal meta data block.
There appears to be a v08 flexible-size internal meta data block
already in place on /dev/loop0 at byte offset 2147479552
Do you really want to overwrite the existing v08 meta-data?
[need to type 'yes' to confirm] yes
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
Note that I typed yes because I wanted to overwrite the pre-existing meta data on this r0 resource. I recommend doing this only on the node that is going to become primary.
Now let's attach the device and set up the network connection for it:
root@c01:~# drbdadm attach r0
root@c01:~# drbdadm connect r0
root@c01:~# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root@c01, 2009-09-09 11:13:45
0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:2097052
Instead of separate attach and connect, you can simply use drbdadm up r0. Now you can see it is in Wait For Connection (WFConnection) state. Note that our data storage is inconsistent and we don't know anything about the peer's yet. Let's bring up the peer:
root@c02:~# drbdadm up r0
root@c02:~# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root@c02, 2009-09-09 11:16:15
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:2097052
Now you can see they're connected, but both sides are Inconsistent. So let's force a full synchronization from the node we want to become primary (c01):
root@c01:~# drbdadm -- --overwrite-data-of-peer primary r0
You can check the progress periodically (on either host):
root@c01:~# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root@c01, 2009-09-09 11:13:45
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
ns:12324 nr:0 dw:0 dr:20688 al:0 bm:0 lo:0 pe:1 ua:2024 ap:0 ep:1 wo:b oos:2084728
[>....................] sync'ed: 0.8% (2084728/2097052)K
finish: 0:02:48 speed: 12,324 (12,324) K/sec
When the synchronization is done:
root@c02:~# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root@c02, 2009-09-09 11:16:15
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
ns:0 nr:2097052 dw:2097052 dr:0 al:0 bm:128 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
You can see here that both the local and the remote storage are UpToDate.
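Because we configured verify-alg crc32c above, you can also run an online verification of the replicated data at any later point; this is optional and only shown as an example. The progress appears in /proc/drbd and any out-of-sync blocks are reported in the kernel log:
root@c01:~# drbdadm verify r0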
Let's format this newly created drbd device on the primary node (c01):
root@c01:~# mkfs.ext3 /dev/drbd/by-res/r0
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 524263 blocks
26213 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
Now we could mount it with mount -t ext3 /dev/drbd/by-res/r0 /mnt, but we won't do that here.
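If you do want a quick sanity check before handing the mount over to Pacemaker, something like this would work on the current primary (just remember to unmount again afterwards):
root@c01:~# mount -t ext3 /dev/drbd/by-res/r0 /mnt
root@c01:~# df -h /mnt
root@c01:~# umount /mnt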
Now we have to make sure that only the Pacemaker crm will handle this resource, so either remove the drbd init script or make it start with exit 0:
root@c01:~# head -2 /etc/init.d/drbd
#!/bin/bash
exit 0
Make sure that we release r0 on every node:
root@c01:~# drbdadm down r0
root@c02:~# drbdadm down r0
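As an alternative to editing the init script itself, on a Debian-like system you could take drbd out of the boot sequence entirely; this is just one possible approach:
root@c01:~# update-rc.d -f drbd remove
root@c02:~# update-rc.d -f drbd remove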
And now let's see the crm config!
primitive drbd ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="9s" role="Master" timeout="30s" \
op monitor interval="11s" role="Slave" timeout="30s" \
meta target-role="Stopped"
Let's set up a master/slave resource for it:
ms ms-drbd drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Stopped"
Now, the file system primitive:
primitive fs ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0" directory="/var/www" fstype="ext3" \
meta target-role="Stopped"
Make sure that the file system will be mounted on the master (aka. primary drbd) when the drbd device is ready:
order ms-drbd-before-fs inf: ms-drbd:promote fs:start
And we make sure that the filesystem lives where the drbd master is, so if we migrate it to another cluster node, the fs will move with it:
colocation coloc-fs-drbd inf: fs ms-drbd:Master
In this example, since we're using only two nodes, quorum will be lost as soon as one node fails, so there are two options to avoid this pitfall:
crm(live)configure# property no-quorum-policy="ignore"
or
crm(live)configure# property expected-quorum-votes="1"
This means that even a single online node will remain capable of making decisions, i.e. it can act as the Designated Co-ordinator (DC).
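For completeness, one way to enter all of the definitions above is the crm shell's configure mode; this is only a sketch of the workflow, the actual definitions are the ones shown earlier:
root@c01:~# crm configure
crm(live)configure# [enter the primitive, ms, order, colocation and property lines from above]
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# exit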
Now let's start up the resources:
root@c01:~# crm
Watch the transitions by issuing the status command repeatedly until the drbd resource is up and working!
crm(live)# status
============
Last updated: Sat Sep 26 21:48:05 2009
Stack: openais
Current DC: c01 - partition with quorum
Version: 1.0.4-2609e060ce0c516c95ae31f44a10fed0202abfb6
2 Nodes configured, 1 expected votes
4 Resources configured.
============
Online: [ c01 c02 ]
vip (ocf::heartbeat:IPaddr): Started c01
vip2 (ocf::heartbeat:IPaddr2): Started c01
crm(live)# resource start drbd
crm(live)# status
[..]
Master/Slave Set: ms-drbd
Slaves: [ c02 ]
Stopped: [ drbd:0 ]
crm(live)# status
[..]
Master/Slave Set: ms-drbd
Slaves: [ c01 c02 ]
crm(live)# status
[..]
Master/Slave Set: ms-drbd
Masters: [ c01 ]
Slaves: [ c02 ]
Then we start the filesystem on top of the running drbd master:
crm(live)# resource start fs
crm(live)# status
============
Last updated: Sat Sep 26 21:52:39 2009
Stack: openais
Current DC: c01 - partition with quorum
Version: 1.0.4-2609e060ce0c516c95ae31f44a10fed0202abfb6
2 Nodes configured, 1 expected votes
4 Resources configured.
============
Online: [ c01 c02 ]
Master/Slave Set: ms-drbd
Masters: [ c01 ]
Slaves: [ c02 ]
fs (ocf::heartbeat:Filesystem): Started c01
vip (ocf::heartbeat:IPaddr): Started c01
vip2 (ocf::heartbeat:IPaddr2): Started c01
Now let's see how to migrate the fs by hand from c01 to c02.
We create a test file on c01:
root@c01:~# ls -la /var/www/
total 24
drwxr-xr-x 3 root root 4096 2009-09-26 21:24 .
drwxr-xr-x 14 root root 4096 2009-09-25 06:26 ..
drwx------ 2 root root 16384 2009-09-26 21:24 lost+found
root@c01:~# echo "Big bada boom" > /var/www/moo
root@c01:~# ls -la /var/www/moo
-rw-r--r-- 1 root root 14 2009-09-26 21:55 /var/www/moo
Then we migrate the fs:
crm(live)# resource migrate fs c02
[a few seconds later:]
crm(live)# status
============
Last updated: Sat Sep 26 21:56:00 2009
Stack: openais
Current DC: c01 - partition with quorum
Version: 1.0.4-2609e060ce0c516c95ae31f44a10fed0202abfb6
2 Nodes configured, 1 expected votes
4 Resources configured.
============
Online: [ c01 c02 ]
Master/Slave Set: ms-drbd
Masters: [ c02 ]
Slaves: [ c01 ]
fs (ocf::heartbeat:Filesystem): Started c02
vip (ocf::heartbeat:IPaddr): Started c01
vip2 (ocf::heartbeat:IPaddr2): Started c01
Here we go. Let's check it on c02:
root@c02:~# ls -la /var/www/
total 28
drwxr-xr-x 3 root root 4096 2009-09-26 21:55 .
drwxr-xr-x 14 root root 4096 2009-09-25 06:30 ..
drwx------ 2 root root 16384 2009-09-26 21:24 lost+found
-rw-r--r-- 1 root root 14 2009-09-26 21:55 moo
root@c02:~# cat /var/www/moo
Big bada boom
Now we can simply test the cluster by pushing the power button to turn off a machine (or powering off the virtual machine, as I did with VBoxManage controlvm c02 poweroff), and you'll see that the other node takes over the resources, as long as there is no location constraint defined that prevents it.
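Also note that resource migrate leaves a location constraint behind (the cli-prefer-* rules you'll see in the config below); once you're happy with where the resource is running, you can remove that constraint again from the crm shell:
crm(live)# resource unmigrate fs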
Now a recap of my current testing crm config, where I set the resource-stickiness quite high in order to avoid another resource takeover when the crashed/powered-off node comes back up, because that behaviour would cause additional downtime.
root@c02:~# crm configure show
node c01 \
attributes standby="off"
node c02 \
attributes standby="off"
primitive drbd ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="9s" role="Master" timeout="30s" \
op monitor interval="11s" role="Slave" timeout="30s" \
meta target-role="Started"
primitive fs ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0" directory="/var/www" fstype="ext3" \
meta target-role="Started"
primitive vip ocf:heartbeat:IPaddr \
params ip="10.30.49.254" \
op monitor interval="10s" \
meta target-role="Started"
primitive vip2 ocf:heartbeat:IPaddr2 \
params ip="10.30.49.253" nic="eth0" cidr_netmask="16" \
meta target-role="Started" \
op monitor interval="10s"
ms ms-drbd drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Stopped"
location cli-prefer-fs fs \
rule $id="cli-prefer-rule-fs" inf: #uname eq c02
location cli-prefer-vip vip \
rule $id="cli-prefer-rule-vip" inf: #uname eq c02
location cli-prefer-vip2 vip2 \
rule $id="cli-prefer-rule-vip2" inf: #uname eq c01
colocation coloc-fs-drbd inf: fs ms-drbd:Master
colocation vip-with-vip2 inf: vip vip2
order ms-drbd-before-fs inf: ms-drbd:promote fs:start
property $id="cib-bootstrap-options" \
dc-version="1.0.4-2609e060ce0c516c95ae31f44a10fed0202abfb6" \
cluster-infrastructure="openais" \
expected-quorum-votes="1" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
start-failure-is-fatal="false" \
stonith-action="reboot" \
last-lrm-refresh="1254001497"
rsc_defaults $id="rsc-options" \
resource-stickiness="100000"
I hope you find these short tutorials useful!