
Disaster recovery of VM snapshot store using GlusterFS Geo Replication.


My setup consists of two hypervisors, geo1 and geo2. geo1 mounts the Gluster volumes from the master cluster (nodes ninja and vertigo), while geo2 mounts the corresponding slave volumes from gladiator.


A couple of pre-requisites:
1. SELinux booleans are set on both hypervisors, geo1 and geo2 in this example.

  • setsebool -P sanlock_use_fusefs on
  • setsebool -P virt_use_sanlock on
  • setsebool -P virt_use_fusefs on
2. Gluster volumes are created on both master and slave. For simplicity I have used the same volume names. Example of my gluster volumes from one of the nodes (a sketch of the volume creation commands follows the output below):


[root@ninja ~]# gluster vol info
 
Volume Name: vmstore
Type: Replicate
Volume ID: c96de15d-024e-416d-a1c5-ff5fef44b25b
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xx.yy.zz.68:/rhs1/vmstore
Brick2: xx.yy.zz.56:/rhs1/vmstore
Options Reconfigured:
storage.owner-gid: 107
storage.owner-uid: 107
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
 
Volume Name: snapstore
Type: Replicate
Volume ID: ad7529d8-242b-40be-a741-aaf331e0fb80
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xx.yy.zz.68:/rhs2/snapstore
Brick2: xx.yy.zz.56:/rhs2/snapstore
Options Reconfigured:
storage.owner-uid: 107
storage.owner-gid: 107
[root@ninja ~]# 
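
For reference, a minimal sketch of how such 2-way replicated volumes could be created and tuned. The brick hosts and paths are taken from the vol info above; the exact commands are my reconstruction, not a capture from this setup, and the same steps would be repeated on the slave with its own bricks.

# create the replicated volumes on the master
gluster volume create vmstore replica 2 xx.yy.zz.68:/rhs1/vmstore xx.yy.zz.56:/rhs1/vmstore
gluster volume create snapstore replica 2 xx.yy.zz.68:/rhs2/snapstore xx.yy.zz.56:/rhs2/snapstore

# let qemu (uid/gid 107) own the bricks so libvirt can write through the FUSE mounts
gluster volume set vmstore storage.owner-uid 107
gluster volume set vmstore storage.owner-gid 107
gluster volume set snapstore storage.owner-uid 107
gluster volume set snapstore storage.owner-gid 107

# start the volumes before mounting them on the hypervisors
gluster volume start vmstore
gluster volume start snapstore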


My typical workflow: install the virtual machine on the vmstore volume, take external snapshots, copy the read-only backing files to the snapstore volume, geo-replicate snapstore to the slave, and recover the virtual machine on geo2 from the replicated copy.


Configure Geo-replication

With the above setup and workflow in mind, I configure the Geo-replication environment between master and slave.

[root@ninja ~]# gluster system:: execute gsec_create
Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub

Verify common_secret.pem.pub file is created
[root@ninja ~]# ls -l /var/lib/glusterd/geo-replication/common_secret.pem.pub
-rw------- 1 root root 912 Oct  1 13:10 /var/lib/glusterd/geo-replication/common_secret.pem.pub

Push pem to the slave
[root@ninja ~]# gluster volume geo-replication snapstore gladiator.shanks.com::snapstore create push-pem
Creating geo-replication session between snapstore & gladiator.shanks.com::snapstore has been successful


[root@ninja ~]# gluster volume geo-replication snapstore gladiator.shanks.com::snapstore status
NODE                              MASTER       SLAVE                                          HEALTH         UPTIME       
----------------------------------------------------------------------------------------------------------------------
ninja.shanks.com      snapstore    gladiator.shanks.com::snapstore    Not Started    N/A          
vertigo.shanks.com    snapstore    gladiator.shanks.com::snapstore    Not Started    N/A          
[root@ninja ~]#


Mount vmstore from the master on the geo1 hypervisor. Remember to update /etc/fstab appropriately (an example entry is shown after the mount output below).

[root@geo1 ~]# mount -t glusterfs ninja.shanks.com:vmstore /var/lib/libvirt/images/
[root@geo1 ~]# df -hT
Filesystem                           Type            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-LogVol01        ext4            3.6T  2.1G  3.4T   1% /
tmpfs                                tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda1                            ext4            485M   40M  421M   9% /boot
ninja.shanks.com:vmstore fuse.glusterfs  195G   33M  195G   1% /var/lib/libvirt/images
[root@geo1 ~]# 
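
A minimal example of the corresponding /etc/fstab entry; the defaults,_netdev options are my assumption, and an analogous entry would cover the snapstore mount added later:

ninja.shanks.com:vmstore   /var/lib/libvirt/images   glusterfs   defaults,_netdev   0 0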


Installation of virtual machine

Before proceeding any further, let's create a qcow2 file and install the OS.

[root@geo1 ~]# qemu-img create -f qcow2 /var/lib/libvirt/images/shanks-rhel1.qcow2 10G
Formatting '/var/lib/libvirt/images/shanks-rhel1.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 

[root@geo1 ~]# qemu-img info /var/lib/libvirt/images/shanks-rhel1.qcow2 
image: /var/lib/libvirt/images/shanks-rhel1.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 193K
cluster_size: 65536
[root@geo1 ~]# 


[root@geo1 ~]# virt-install --connect=qemu:///system --network=bridge:br0 --initrd-inject=rhel.ks --extra-args="ks=file:/rhel.ks console=tty0 console=ttyS0,115200" --name=shanks-rhel1 --disk path=/var/lib/libvirt/images/shanks-rhel1.qcow2,device=disk,format=qcow2,bus=virtio,cache=writeback,io=threads --ram 1024 --vcpus=1 --check-cpu --accelerate --hvm --location=http://download.shanks.com/RHEL-6/6.4/Server/x86_64/os/ --nographics


Ensure that the virtual machine is running
[root@geo1 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 4     shanks-rhel1                   running

[root@geo1 ~]#


Now we mount the snapstore volume on geo1
[root@geo1 ~]# mount -t glusterfs ninja.shanks.com:snapstore /var/lib/libvirt/qemu/snapshot
[root@geo1 ~]# df -hT
Filesystem                             Type            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-LogVol01          ext4            3.6T  2.1G  3.4T   1% /
tmpfs                                  tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda1                              ext4            485M   40M  421M   9% /boot
ninja.shanks.com:vmstore   fuse.glusterfs  195G  2.4G  192G   2% /var/lib/libvirt/images
ninja.shanks.com:snapstore fuse.glusterfs  195G   33M  195G   1% /var/lib/libvirt/qemu/snapshot
[root@geo1 ~]# 


Create external snapshot

Time to create an external snapshot
[root@geo1 ~]# virsh snapshot-create-as shanks-rhel1 shanks-rhel1-snap1 --disk-only --atomic 
Domain snapshot shanks-rhel1-snap1 created
[root@geo1 ~]# 

[root@geo1 ~]# qemu-img info /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap1 
image: /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap1
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 960K
cluster_size: 65536
backing file: /var/lib/libvirt/images/shanks-rhel1.qcow2
[root@geo1 ~]# 


Copy this read-only backing file to snapstore
[root@geo1 ~]# cp shanks-rhel1.qcow2 /var/lib/libvirt/qemu/snapshot/shanks-rhel1/

You also need to copy the domain XML file the first time. This is not required if you can directly create/configure the XML yourself; I copied it to save some time :)

[root@geo1 images]# cp /etc/libvirt/qemu/shanks-rhel1.xml /var/lib/libvirt/qemu/snapshot/shanks-rhel1/
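
Alternatively, instead of copying the file from /etc/libvirt/qemu, you could dump the domain XML directly. Note that for a running domain the dump references the current (snapshot overlay) disk, so the source path still needs adjusting on the recovery side, as described later:

[root@geo1 ~]# virsh dumpxml shanks-rhel1 > /var/lib/libvirt/qemu/snapshot/shanks-rhel1/shanks-rhel1.xml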

[root@geo1 ~]# ls /var/lib/libvirt/qemu/snapshot/shanks-rhel1/
shanks-rhel1.qcow2  shanks-rhel1-snap1.xml  shanks-rhel1.xml
[root@geo1 ~]# 


Mount the slave volumes on the geo2 hypervisor
[root@geo2 ~]# mount -t glusterfs gladiator.shanks.com:vmstore /var/lib/libvirt/images/
[root@geo2 ~]# mount -t glusterfs gladiator.shanks.com:snapstore /var/lib/libvirt/qemu/snapshot/

[root@geo2 ~]# df -hT
Filesystem                                 Type            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-LogVol01              ext4            3.6T  2.1G  3.4T   1% /
tmpfs                                      tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda1                                  ext4            485M   40M  421M   9% /boot
xx.yy.zz.68:/vmstore                       fuse.glusterfs  200G   33M  200G   1% /var/lib/libvirt/images
gladiator.shanks.com:vmstore   fuse.glusterfs  200G   33M  200G   1% /var/lib/libvirt/images
gladiator.shanks.com:snapstore fuse.glusterfs  200G   33M  200G   1% /var/lib/libvirt/qemu/snapshot
[root@geo2 ~]# 


All set to start geo-replication!

From the master, initiate the geo-replication start command
[root@ninja ~]# gluster volume geo-replication snapstore gladiator.shanks.com::snapstore start
Starting geo-replication session between snapstore & gladiator.shanks.com::snapstore has been successful


Status should show as "Initializing..."
[root@ninja ~]# gluster volume geo-replication snapstore gladiator.shanks.com::snapstore status
NODE                              MASTER       SLAVE                                          HEALTH             UPTIME       
--------------------------------------------------------------------------------------------------------------------------
ninja.shanks.com      snapstore    gladiator.shanks.com::snapstore    Initializing...    N/A          
vertigo.shanks.com    snapstore    gladiator.shanks.com::snapstore    Initializing...    N/A          
[root@ninja ~]#

And then later on as "Stable"
[root@ninja ~]# gluster volume geo-replication snapstore gladiator.shanks.com::snapstore status
NODE                              MASTER       SLAVE                                          HEALTH    UPTIME         
-------------------------------------------------------------------------------------------------------------------
ninja.shanks.com      snapstore    gladiator.shanks.com::snapstore    Stable    00:01:12       
vertigo.shanks.com    snapstore    gladiator.shanks.com::snapstore    Stable    00:01:09       
[root@ninja ~]# 


Detailed status should show that the files are synced
[root@ninja ~]# gluster volume geo-replication snapstore gladiator.shanks.com::snapstore status detail
 
                           MASTER: snapstore  SLAVE: gladiator.shanks.com::snapstore
 
NODE                                HEALTH    UPTIME      FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING   
---------------------------------------------------------------------------------------------------------------------------
ninja.shanks.com        Stable    00:01:28    4              0                0Bytes           0                 
vertigo.shanks.com      Stable    00:01:26    0              0                0Bytes           0                 
[root@ninja ~]# 


You should now be able to access the files from the geo2 hypervisor's snapshot mount
[root@geo2 ~]# ls /var/lib/libvirt/qemu/snapshot/shanks-rhel1/
shanks-rhel1.qcow2  shanks-rhel1-snap1.xml  shanks-rhel1.xml
[root@geo2 ~]# 


Copy the virtual machine XML file to the default location and edit the image source path (a quick way to check which path needs changing is shown below).

[root@geo2 ~]# cp /var/lib/libvirt/qemu/snapshot/shanks-rhel1/shanks-rhel1.xml /etc/libvirt/qemu/
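
A hypothetical check of the disk source element that needs to point at the copied image; the grep output here is illustrative of the standard libvirt <disk>/<source> entry, and your exact line may differ:

[root@geo2 ~]# grep 'source file' /etc/libvirt/qemu/shanks-rhel1.xml
      <source file='/var/lib/libvirt/images/shanks-rhel1.qcow2'/>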

Copy the backing file to vmstore (/var/lib/libvirt/images) on the geo2 hypervisor
[root@geo2 ~]# cp /var/lib/libvirt/qemu/snapshot/shanks-rhel1/shanks-rhel1.qcow2 /var/lib/libvirt/images/
[root@geo2 ~]# 

Define and start the virtual machine
[root@geo2 ~]# virsh define /etc/libvirt/qemu/shanks-rhel1.xml 
[root@geo2 ~]# virsh start shanks-rhel1

Once it is verified that the virtual machine boots up fine, it's time to delete the external snapshot. This is not strictly necessary; however, I am deleting it for simplicity.

[root@geo1 ~]# virsh snapshot-list shanks-rhel1
 Name                 Creation Time             State
------------------------------------------------------------
 shanks-rhel1-snap1   2013-10-01 19:01:25 +0530 disk-snapshot
[root@geo1 ~]# 

[root@geo1 ~]# virsh blockpull --domain shanks-rhel1 --path /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap1 --verbose --wait
Block Pull: [ 100 %]
Pull complete

Note that blockpull populates the snapshot disk with the contents of its backing file, flattening the chain.
[root@geo1 ~]# qemu-img info /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap1 
image: /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap1
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.3G
cluster_size: 65536
[root@geo1 ~]# 


[root@geo1 ~]# virsh snapshot-delete shanks-rhel1 shanks-rhel1-snap1 --metadata
Domain snapshot shanks-rhel1-snap1 deleted

[root@geo1 ~]# virsh snapshot-list shanks-rhel1
 Name                 Creation Time             State
------------------------------------------------------------
[root@geo1 ~]#


Let's test this workflow:


1. From the geo1 hypervisor, log in to the virtual machine and touch a file.

[root@geo1 ~]# virsh  console shanks-rhel1
Connected to domain shanks-rhel1
Escape character is ^]

[root@localhost ~]# touch a
[root@localhost ~]# md5sum a
d41d8cd98f00b204e9800998ecf8427e  a
[root@localhost ~]# 

2. Create an external snapshot and copy the backing file to the snapstore mount (/var/lib/libvirt/qemu/snapshot)

[root@geo1 ~]# virsh snapshot-create-as shanks-rhel1 shanks-rhel1-snap2 --disk-only --atomic 
Domain snapshot shanks-rhel1-snap2 created
[root@geo1 ~]# 

[root@geo1 ~]# qemu-img info /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap2 
image: /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 8.4M
cluster_size: 65536
backing file: /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap1
[root@geo1 ~]#

[root@geo1 ~]# cp /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap1 /var/lib/libvirt/qemu/snapshot/shanks-rhel1/
[root@geo1 ~]# 


3. Check the geo-replication status in detail and ensure that the image file is fully synced

[root@ninja ~]# gluster volume geo-replication snapstore gladiator.shanks.com::snapstore status detail
 
                           MASTER: snapstore  SLAVE: gladiator.shanks.com::snapstore
 
NODE                                HEALTH    UPTIME      FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING   
---------------------------------------------------------------------------------------------------------------------------
ninja.shanks.com        Stable    00:52:10    5              1                1.3GB            0                 
vertigo.shanks.com      Stable    00:52:07    0              0                0Bytes           0                 
[root@ninja ~]# 

[root@ninja ~]# gluster volume geo-replication snapstore gladiator.shanks.com::snapstore status detail
 
                           MASTER: snapstore  SLAVE: gladiator.shanks.com::snapstore
 
NODE                                HEALTH    UPTIME      FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING   
---------------------------------------------------------------------------------------------------------------------------
ninja.shanks.com        Stable    01:00:27    6              0                0Bytes           0                 
vertigo.shanks.com      Stable    01:00:24    0              0                0Bytes           0                 
[root@ninja ~]# 


4. Replace the existing source image of the virtual machine and start the instance to verify the md5sum of file "a".

[root@geo2 ~]# cp /var/lib/libvirt/qemu/snapshot/shanks-rhel1/shanks-rhel1.shanks-rhel1-snap1 /var/lib/libvirt/images/shanks-rhel1.qcow2 
cp: overwrite `/var/lib/libvirt/images/shanks-rhel1.qcow2'? y
[root@geo2 ~]# 


[root@geo2 ~]# virsh start shanks-rhel1
Domain shanks-rhel1 started

[root@geo2 ~]# virsh console shanks-rhel1

[root@localhost ~]# md5sum a
d41d8cd98f00b204e9800998ecf8427e  a
[root@localhost ~]# 

Great!

5. Finally, blockpull the snapshot and delete its metadata.

[root@geo1 ~]# virsh blockpull --domain shanks-rhel1 --path /var/lib/libvirt/images/shanks-rhel1.shanks-rhel1-snap2 --verbose --wait
Block Pull: [100 %]
Pull complete
[root@geo1 ~]# 

[root@geo1 ~]# virsh snapshot-delete shanks-rhel1 shanks-rhel1-snap2 --metadata
Domain snapshot shanks-rhel1-snap2 deleted
[root@geo1 ~]#

