
Clustered Storage with DRBD

04 Dec 2020 » system configuration, sysadmin, homelab, networking, cluster

Logical Volumes

Create a logical volume on each server for the NFS mount

On both servers:

lvcreate -n nfspoint -V 550G pve/data
mkfs.ext4 /dev/pve/nfspoint
echo '/dev/pve/nfspoint /var/lib/nfspoint ext4 defaults 0 2' >> /etc/fstab
mkdir /var/lib/nfspoint
mount /dev/pve/nfspoint /var/lib/nfspoint/
lvs
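
To double-check the result before moving on, a couple of read-only commands confirm the logical volume exists and is mounted where the fstab entry says it should be (just a check, nothing here changes state):

lvs pve/nfspoint            # the thin volume should show up under the pve VG
findmnt /var/lib/nfspoint   # verify the mount matches the fstab entry
df -h /var/lib/nfspoint     # sanity-check the usable size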

vim /etc/exports

/var/lib/nfspoint 10.0.0.0/255.0.0.0(rw,no_root_squash,no_all_squash,sync)
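
For the export to actually be served, the NFS server has to be installed and told to re-read /etc/exports. A minimal sketch, assuming Debian's nfs-kernel-server package on both nodes:

apt install nfs-kernel-server   # skip if the NFS server is already installed
exportfs -ra                    # re-export everything in /etc/exports
exportfs -v                     # confirm /var/lib/nfspoint shows up with rw,sync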

DRBD

DRBD is a little tricky. Because the LVM volume was thin provisioned, it looked like some metadata had already been written to the device /dev/pve/nfspoint, which meant I had to zero out the nfspoint volume.
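
Before zeroing, a non-destructive way to see what DRBD would be complaining about is to list any existing signatures on the backing device (just a check, not part of the original steps):

wipefs /dev/pve/nfspoint   # with no flags this only lists signatures, it doesn't erase anything
blkid /dev/pve/nfspoint    # shows any filesystem type/UUID already on the device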

DRBD Configuration:

global { usage-count no; }
common { syncer { rate 100M; } }
resource r0 {
        protocol C;
        device /dev/drbd0 minor 0;
        startup {
            wfc-timeout 120;
            degr-wfc-timeout 60;
            become-primary-on both;
        }
        net {
            cram-hmac-alg sha1;
            allow-two-primaries;
            shared-secret "secret";
        }
        on HLPMX1 {
            disk /dev/pve/nfspoint;
            address 10.0.0.2:7788;
            meta-disk internal;
        }
        on HLPMX2 {
            disk /dev/pve/nfspoint;
            address 10.0.0.3:7788;
            meta-disk internal;
        }
}

Zeroing the device wasn't a problem because there was no data on it yet. It may become a problem when I expand the volume with LVM or need to make any other kind of resource changes to the device.
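
For the expansion case, my understanding of the usual DRBD sequence (a sketch I haven't run here, and the size is only an example) is: grow the backing LV on both nodes, let DRBD pick up the new size, then grow the filesystem on the primary:

lvextend -L +100G pve/nfspoint   # run on both nodes so both backing devices grow
drbdadm resize r0                # on one connected node; DRBD syncs the new space
resize2fs /dev/drbd0             # on the primary, grow ext4 to fill the device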

dd if=/dev/zero of=/dev/pve/nfspoint bs=1M count=200
mkfs.ext4 -b 4096 /dev/drbd0
curl --output drbd9.15.tar.gz https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/drbd-utils/9.15.0-1/drbd-utils_9.15.0.orig.tar.gz
tar -xf drbd9.15.tar.gz
cd drbd9.15
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
make all
make install
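
To sanity-check the build, the userspace tools report their version, and the kernel module (already shipped with the Proxmox kernel) can be loaded and inspected:

drbdadm --version   # version of the freshly built drbd-utils
modprobe drbd       # load the kernel module if it isn't already
cat /proc/drbd      # kernel module version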

On node 2 (because gcc wasn't installed):

apt install build-essential
apt install gcc
apt install flex
# Copy config to other server 
sudo drbdadm create-md r0
sudo systemctl start drbd.service
sudo drbdadm -- --overwrite-data-of-peer primary all
mkfs.ext4 /dev/drbd0
mkdir /srv/nfspoint
sudo mount /dev/drbd0 /srv/nfspoint
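
The initial sync takes a while on a 550G volume; its progress can be watched from either node:

drbdadm status r0             # roles, disk states, and replication progress
watch -n5 drbdadm status r0   # repeat every 5s until both sides report UpToDate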

Splitbrain

On Split Brain Victim

drbdadm disconnect r0
drbdadm secondary r0
drbdadm connect --discard-my-data r0

On Split Brain Survivor

drbdadm primary r0
drbdadm connect r0
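
After reconnecting, it's worth confirming the victim actually resynced instead of staying StandAlone; a quick check from either node:

drbdadm cstate r0   # should pass through SyncSource/SyncTarget and settle at Connected
drbdadm status r0   # both disks should end up UpToDate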

Inverting Resource Roles

On Current Primary:

umount /srv/nfspoint
drbdadm secondary r0

On Secondary:

drbdadm primary r0
mount /dev/drbd0 /srv/nfspoint
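
To confirm the swap took, each node can report its current role for the resource:

drbdadm role r0   # expect Primary on the node that just took over, Secondary on the other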

DRBD Broken?

Reboot both servers, then:

systemctl start drbd.service
drbdadm status

umount /dev/pve/nfspoint
mkfs.ext4 -b 4096 /dev/pve/nfspoint

dd if=/dev/zero of=/dev/drbd0 status=progress
mkfs -t ext4 /dev/drbd0
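
Before reformatting anything, the disk and connection states usually say what DRBD actually thinks is wrong; these checks are read-only (assuming the resource is still named r0):

drbdadm dstate r0                          # disk state: UpToDate, Inconsistent, Diskless, ...
drbdadm cstate r0                          # connection state: Connected, StandAlone, ...
journalctl -u drbd --no-pager | tail -50   # recent log lines from the drbd service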

Troubleshooting DRBD Issues

Odd Mountpoint with Loop Device

What should appear in the output of lsblk:

NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0  1.7T  0 disk
├─sda1                         8:1    0 1007K  0 part
├─sda2                         8:2    0  512M  0 part
└─sda3                         8:3    0  1.7T  0 part
  ├─pve-swap                 253:0    0    8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0   96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0 15.6G  0 lvm
  │ └─pve-data-tpool         253:4    0  1.5T  0 lvm
  │   ├─pve-data             253:5    0  1.5T  0 lvm
  │   ├─pve-vm--105--disk--0 253:6    0   20G  0 lvm
  │   └─pve-nfspoint         253:7    0  550G  0 lvm
  │     └─drbd0              147:0    0  550G  0 disk /srv/nfspoint
  └─pve-data_tdata           253:3    0  1.5T  0 lvm
    └─pve-data-tpool         253:4    0  1.5T  0 lvm
      ├─pve-data             253:5    0  1.5T  0 lvm
      ├─pve-vm--105--disk--0 253:6    0   20G  0 lvm
      └─pve-nfspoint         253:7    0  550G  0 lvm
        └─drbd0              147:0    0  550G  0 disk /srv/nfspoint

What was incorrectly appearing in the output of lsblk:

loop0                          7:0    0  200M  0 loop /srv/nfspoint
sda                            8:0    0  1.7T  0 disk
├─sda1                         8:1    0 1007K  0 part
├─sda2                         8:2    0  512M  0 part
└─sda3                         8:3    0  1.7T  0 part
  ├─pve-swap                 253:0    0    8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0   96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0 15.6G  0 lvm
  │ └─pve-data-tpool         253:4    0  1.5T  0 lvm
  │   ├─pve-data             253:5    0  1.5T  0 lvm
  │   ├─pve-vm--100--disk--0 253:6    0   30G  0 lvm
  │   ├─pve-vm--102--disk--0 253:7    0  100G  0 lvm
  │   ├─pve-vm--103--disk--0 253:8    0   20G  0 lvm
  │   ├─pve-vm--104--disk--0 253:9    0   20G  0 lvm
  │   ├─pve-vm--101--disk--0 253:10   0   20G  0 lvm
  │   └─pve-nfspoint         253:11   0  550G  0 lvm
  │     └─drbd0              147:0    0  550G  0 disk
  └─pve-data_tdata           253:3    0  1.5T  0 lvm
    └─pve-data-tpool         253:4    0  1.5T  0 lvm
      ├─pve-data             253:5    0  1.5T  0 lvm
      ├─pve-vm--100--disk--0 253:6    0   30G  0 lvm
      ├─pve-vm--102--disk--0 253:7    0  100G  0 lvm
      ├─pve-vm--103--disk--0 253:8    0   20G  0 lvm
      ├─pve-vm--104--disk--0 253:9    0   20G  0 lvm
      ├─pve-vm--101--disk--0 253:10   0   20G  0 lvm
      └─pve-nfspoint         253:11   0  550G  0 lvm
        └─drbd0              147:0    0  550G  0 disk

If the server is creating a loop device and mounting it as the mount point, try rebooting the server, restarting the drbd service, and clearing out the device on the affected server so it can resync from the primary.
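
If a reboot isn't an option, the stray loop device can also be tracked down and detached by hand before remounting the real DRBD device (a sketch; /dev/loop0 matches the bad lsblk output above):

losetup -a                       # list loop devices and what backs them
umount /srv/nfspoint             # drop the loop-backed mount
losetup -d /dev/loop0            # detach the loop device
mount /dev/drbd0 /srv/nfspoint   # remount the actual DRBD device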

I still don't know if it was split brain or something else, but these were the commands I ran:

umount /nfs/sharepoint # This was the location of the mountpoint
drbdadm disconnect r0
drbdadm connect --discard-my-data r0

I honestly think rebooting the server in question is what fixed this issue. Something about the /dev/drbd0 device wasn't working or hadn't been created properly.
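
If /dev/drbd0 really wasn't being created, the checks I'd reach for next time (hypothetical, not something from the original notes) are whether the module is loaded and whether the resource comes up by hand:

lsmod | grep drbd   # is the kernel module loaded at all?
drbdadm up r0       # attach the disk and network; this is what creates /dev/drbd0
ls -l /dev/drbd0    # the device node should exist afterwards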

https://www.howtoforge.com/high_availability_nfs_drbd_heartbeat_p2
https://pve.proxmox.com/wiki/Logical_Volume_Manager_(LVM)
https://linux.die.net/man/8/lvremove
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/lv_remove
https://serverfault.com/questions/266697/cant-remove-open-logical-volume
