Description of problem:
In RHV 4.4, deleting a snapshot after VM migration fails with the error:
"libvirt.libvirtError: Requested operation is not valid: can't keep relative backing relationship"

Version-Release number of selected component (if applicable):
libvirt-daemon-kvm-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
qemu-kvm-core-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64
kernel: 4.18.0-190.el8.x86_64 (source), 4.18.0-187.el8.x86_64 (target)

How reproducible:
100%

Steps to Reproduce:
1. Start the VM on host A with a glusterfs disk; the XML is in file: xml-startvm
2. Create snapshots for the VM: s1 (without memory), s2, s3; the XML is in file: xml-before-migrate
--------------------------------------------------------------------
<disk type='file' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
  <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/4a3c3fda-6ec6-4e04-9eec-0435d90c49f1' index='5'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <backingStore type='file' index='4'>
    <format type='qcow2'/>
    <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/f007375f-4082-4649-8512-f161498bc1f2'>
      <seclabel model='dac' relabel='no'/>
    </source>
    <backingStore type='file' index='3'>
      <format type='qcow2'/>
      <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/e6c30bfc-c9d5-450b-9b62-171d2235e95e'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore type='file' index='1'>
        <format type='raw'/>
        <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/ab737847-8486-4265-b6f9-fc44f42c1cf5'>
          <seclabel model='dac' relabel='no'/>
        </source>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
--------------------------------------------------------------------
backing chain is in file: backing-chain-before-migrate
3. Migrate the VM to host B. The migration succeeds, but the disk indexes in the XML have changed; the XML is in file: xml-after-migrate
--------------------------------------------------------------------
<disk type='file' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
  <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/4a3c3fda-6ec6-4e04-9eec-0435d90c49f1' index='1'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <backingStore type='file' index='2'>
    <format type='qcow2'/>
    <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/f007375f-4082-4649-8512-f161498bc1f2'>
      <seclabel model='dac' relabel='no'/>
    </source>
    <backingStore type='file' index='3'>
      <format type='qcow2'/>
      <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/e6c30bfc-c9d5-450b-9b62-171d2235e95e'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore type='file' index='4'>
        <format type='raw'/>
        <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/ab737847-8486-4265-b6f9-fc44f42c1cf5'>
          <seclabel model='dac' relabel='no'/>
        </source>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
--------------------------------------------------------------------
backing chain is in file: backing-chain-after-migrate
5. Delete snapshot s3 successfully; the XML after deleting s3 is in file: xml-deleted-s3, and the backing chain is in file: backing-chain-deleted-s3
6. Try to delete s1; it fails with the following error in vdsm.log:
--------------------------------------------------------------------
ERROR (jsonrpc/5) [virt.vm] (vmId='4dcf9d4e-b65b-4e1a-8852-d44cd229911d') Live merge failed (job: 618668b9-8213-447d-a198-f314e5ebc38a) (vm:5344)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5342, in merge
    bandwidth, flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 823, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirt.libvirtError: Requested operation is not valid: can't keep relative backing relationship
--------------------------------------------------------------------

Actual results:
In step 6, deleting snapshot s1 fails.

Expected results:
In step 6, snapshot s1 is deleted successfully.

Additional info:
- libvirtd and vdsm logs
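The before/after-migration XML above shows the index attributes of the same four-layer chain being renumbered. A minimal sketch of how those per-layer indexes can be read out of a nested backingStore chain with Python's stdlib (the XML below is a shortened, hypothetical stand-in, not the attached files):

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the post-migration <disk> element (hypothetical paths).
DISK_XML = """
<disk type='file' device='disk'>
  <source file='/gmount/d' index='1'/>
  <backingStore type='file' index='2'>
    <source file='/gmount/c'/>
    <backingStore type='file' index='3'>
      <source file='/gmount/b'/>
      <backingStore type='file' index='4'>
        <source file='/gmount/a'/>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
</disk>
"""

def chain_indexes(disk_xml):
    """Walk <source>/<backingStore> top-down, returning (index, file) pairs."""
    disk = ET.fromstring(disk_xml)
    # The active layer's index lives on <source>; deeper layers carry
    # theirs on the <backingStore> element itself.
    src = disk.find('source')
    chain = [(src.get('index'), src.get('file'))]
    bs = disk.find('backingStore')
    while bs is not None and bs.get('index'):
        chain.append((bs.get('index'), bs.find('source').get('file')))
        bs = bs.find('backingStore')  # empty <backingStore/> ends the chain
    return chain

print(chain_indexes(DISK_XML))
# [('1', '/gmount/d'), ('2', '/gmount/c'), ('3', '/gmount/b'), ('4', '/gmount/a')]
```

As the later analysis notes, the renumbering itself is expected behaviour (Bug 1451398) and is not what breaks the merge.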
Created attachment 1674595 [details] xml-startvm
Created attachment 1674596 [details] xml-before-migrate
Created attachment 1674597 [details] xml-after-migrate
Created attachment 1674598 [details] xml-deleted-s3
Created attachment 1674599 [details] backing-chain and libvirtd, vdsm logs
The snapshot indexes changed due to the fix of Bug 1451398 - [RFE] Add index for the active layer in disk chain.

I checked the logs, and vdsm used the correct index numbers: in both the vdsm log and the libvirtd log, 'top' and 'base' are set to index=3 and index=4, which is correct per the "xml-after-migrate" XML in comment 0. (flags=8 corresponds to VIR_DOMAIN_BLOCK_COMMIT_RELATIVE.)

vdsm log:
--------------------------------------------------------------------
2020-03-29 21:56:53,282-0400 INFO  (jsonrpc/5) [virt.vm] (vmId='4dcf9d4e-b65b-4e1a-8852-d44cd229911d') Starting merge with jobUUID='618668b9-8213-447d-a198-f314e5ebc38a', original chain=ab737847-8486-4265-b6f9-fc44f42c1cf5 < e6c30bfc-c9d5-450b-9b62-171d2235e95e < f007375f-4082-4649-8512-f161498bc1f2 (top), disk='sda', base='sda[4]', top='sda[3]', bandwidth=0, flags=8 (vm:5338)
2020-03-29 21:56:53,283-0400 ERROR (jsonrpc/5) [virt.vm] (vmId='4dcf9d4e-b65b-4e1a-8852-d44cd229911d') Live merge failed (job: 618668b9-8213-447d-a198-f314e5ebc38a) (vm:5344)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5342, in merge
    bandwidth, flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 823, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirt.libvirtError: Requested operation is not valid: can't keep relative backing relationship
--------------------------------------------------------------------

libvirtd log:
--------------------------------------------------------------------
2020-03-30 01:56:53.283+0000: 881438: debug : virThreadJobSet:94 : Thread 881438 (virNetServerHandleJob) is now running job remoteDispatchDomainBlockCommit
2020-03-30 01:56:53.283+0000: 881438: debug : virDomainBlockCommit:10517 : dom=0x7ffac4007420, (VM: name=lmn4, uuid=4dcf9d4e-b65b-4e1a-8852-d44cd229911d), disk=sda, base=sda[4], top=sda[3], bandwidth=0, flags=0x8
2020-03-30 01:56:53.283+0000: 881438: debug : qemuDomainObjBeginJobInternal:9754 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7ffac80308e0 name=lmn4, current job=none agentJob=none async=none)
2020-03-30 01:56:53.283+0000: 881438: debug : qemuDomainObjBeginJobInternal:9803 : Started job: modify (async=none vm=0x7ffac80308e0 name=lmn4)
2020-03-30 01:56:53.283+0000: 881438: debug : qemuDomainBlockCommit:18876 : Requested operation is not valid: can't keep relative backing relationship
--------------------------------------------------------------------

So this does not seem related to the index change. The reporter also helped confirm that the issue only happens after migration; if snapshots are created and deleted on the source host, nothing goes wrong. My guess is that something goes wrong after migration on the target host, similar to https://bugzilla.redhat.com/show_bug.cgi?id=1461303 ("libvirt does not load the data necessary to keep the relative relationship")?
So the problem is that after migration we no longer load the relative paths from the images as the images are specified in the XML now.
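For context, a "relative backing relationship" means the backing-file string recorded in an overlay's qcow2 header is a path relative to the overlay's own directory rather than an absolute path; to preserve it across a commit, libvirt must know those strings, which it no longer reads from the images after migration. A minimal sketch of the relativity computation itself, using only `os.path` (this is an illustration of the concept, not libvirt's actual code):

```python
import os.path

def relative_backing(overlay, backing):
    """Return the backing-file string that keeps the relationship relative:
    the backing image's path expressed relative to the overlay's directory."""
    return os.path.relpath(backing, start=os.path.dirname(overlay))

# With the /gmount chain used in the pure-libvirt reproduction below in this
# report, all images share one directory, so the relative reference is just
# the bare file name:
print(relative_backing('/gmount/b', '/gmount/a'))  # -> a
```

If images live in different directories, the same computation yields a `../`-style path, which is exactly the string that must survive a commit for the chain to stay relocatable.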
Fixed upstream by:

commit 2ace7a87a8aced68c2504fd4dd4e2df4302c3eeb
Author: Peter Krempa <pkrempa>
Date:   Mon Mar 30 11:18:37 2020 +0200

    qemuDomainSnapshotDiskPrepareOne: Don't load the relative path with blockdev

    Since we are refreshing the relative paths when doing the blockjobs we
    no longer need to load them upfront when doing the snapshot.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

commit ffc6249c79dbf980d116af7c7ed20222538a7c1c
Author: Peter Krempa <pkrempa>
Date:   Mon Mar 30 11:18:32 2020 +0200

    qemu: block: Support VIR_DOMAIN_BLOCK_COMMIT/PULL/REBASE_RELATIVE with blockdev

    Preservation of the relative relationship requires us to load the
    backing store strings from the disk images. With blockdev we stopped
    detecting the backing chain if it's specified in the XML so the
    relative links were not loaded at that point. To preserve the
    functionality from the pre-blockdev without accessing the backing
    chain unnecessarily during VM startup we must refresh the relative
    links when relative block commit or block pull is requested.

    https://bugzilla.redhat.com/show_bug.cgi?id=1818655

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
Tried to verify on libvirt-6.0.0-16.el8 but hit Bug 1820016; verification is blocked by Bug 1820016.
Tested on packages:
libvirt-daemon-kvm-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6129+b14d477b.x86_64
kernel: 4.18.0-193.el8.x86_64
vdsm-4.40.5-1.el8ev.x86_64

Test steps:
1. Start a vm on host A; create s1 (without memory), s2, s3; migrate the vm to host B; delete s3, s1, s2 successfully.
2. For a running vm, create s1 (without memory), s2, s3 (without memory), s4, s5; delete s3, s1, s5, s4, s2 successfully. Then create s1, s2 (without memory), s3; delete s1 successfully; migrate the vm from host B to host A; log in to the vm and touch a file; clone s3; migrate to host A; delete s2; create s4; delete s4 and s3 successfully.

Setting the bug status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017
To make sure this could be covered by pure libvirt, I reproduced it in a pure libvirt env as follows:

0. Package version:
[root@lenovo-sr630-10 files]# rpm -qa | grep libvirt-6
libvirt-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64

1. Prepare a gluster server:
# more /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option rpc-auth-allow-insecure on
end-volume

# service glusterd restart
Stopping glusterd:                                         [  OK  ]
Starting glusterd:                                         [  OK  ]
# mkdir /br1
# chmod -R 777 /br1
# setenforce 0
# iptables -F

On gluster server A:
# gluster peer probe 10.66.82.249
peer probe: success.
# gluster peer status
Number of Peers: 1

Hostname: 10.66.82.249
Uuid: 40f4b505-0765-4a6b-906b-db68c078c1dd
State: Peer in Cluster (Connected)
# gluster volume create gluster-vol1 10.66.85.212:/br1 10.66.82.249:/br1 force
volume create: gluster-vol1: success: please start the volume to access data
(To use an RDMA transport instead, add the transport option: gluster volume create gluster-vol1 transport rdma 10.66.85.212:/br1 10.66.82.249:/br1 force)
# gluster volume set gluster-vol1 server.allow-insecure on
volume set: success
# gluster volume info

Volume Name: gluster-vol1
Type: Distribute
Volume ID: 2d4e6867-231a-48e7-821a-c4c253241044
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.66.85.212:/br1
Brick2: 10.66.82.249:/br1
Options Reconfigured:
server.allow-insecure: on
# gluster volume start gluster-vol1
volume start: gluster-vol1: success
# gluster volume status
Status of volume: gluster-vol1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.66.85.212:/br1                                 49152   Y       22917
Brick 10.66.82.249:/br1                                 49152   Y       7408
NFS Server on localhost                                 2049    Y       22931
NFS Server on 10.66.82.249                              2049    Y       7423

Set nfs.disable=on on gluster server A:
# gluster volume set gluster-vol1 nfs.disable on
# gluster volume info gluster-vol1 | grep nfs.disable
nfs.disable: on

2. Mount the gluster dir on the 2 test hosts:
# mount -t glusterfs 10.66.85.212:/gluster-vol1 /gmount/

3. Prepare the image chain:
root@yisun-test1 /gmount 08:17:56$ qemu-img create -f qcow2 a 10M
Formatting 'a', fmt=qcow2 size=10485760 cluster_size=65536 lazy_refcounts=off refcount_bits=16
root@yisun-test1 /gmount 08:18:04$ qemu-img create -f qcow2 -o backing_fmt=qcow2 -b a b
Formatting 'b', fmt=qcow2 size=10485760 backing_file=a backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
root@yisun-test1 /gmount 08:18:14$ qemu-img create -f qcow2 -o backing_fmt=qcow2 -b b c
Formatting 'c', fmt=qcow2 size=10485760 backing_file=b backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
root@yisun-test1 /gmount 08:18:19$ qemu-img create -f qcow2 -o backing_fmt=qcow2 -b c d
Formatting 'd', fmt=qcow2 size=10485760 backing_file=c backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16

4. Use image 'a' as the vm's disk:
root@yisun-test1 /gmount 08:19:59$ virsh dumpxml ys | awk '/<disk/,/<\/disk/'
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/gmount/a'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</disk>
root@yisun-test1 /gmount 08:20:01$ virsh start ys
Domain ys started

5. Create 3 external snapshots for the vm:
root@yisun-test1 /gmount 08:20:11$ virsh snapshot-create-as --reuse-external --disk-only --no-metadata ys --diskspec vda,file=/gmount/b
Domain snapshot 1588767647 created
root@yisun-test1 /gmount 08:20:47$ virsh snapshot-create-as --reuse-external --disk-only --no-metadata ys --diskspec vda,file=/gmount/c
Domain snapshot 1588767671 created
root@yisun-test1 /gmount 08:21:11$ virsh snapshot-create-as --reuse-external --disk-only --no-metadata ys --diskspec vda,file=/gmount/d
Domain snapshot 1588767673 created

6. Now the vm's disk XML on the source host is as follows:
root@yisun-test1 /gmount 08:21:38$ virsh dumpxml ys | awk '/<disk/,/<\/disk/'
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/gmount/d' index='4'/>
  <backingStore type='file' index='3'>
    <format type='qcow2'/>
    <source file='/gmount/c'/>
    <backingStore type='file' index='2'>
      <format type='qcow2'/>
      <source file='/gmount/b'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/gmount/a'/>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</disk>

7. Migrate the vm to the target host:
root@yisun-test1 /gmount 08:21:43$ virsh migrate ys qemu+ssh://lenovo-sr630-10.lab.eng.pek2.redhat.com/system --live --undefinesource --persistent
root.eng.pek2.redhat.com's password:

8. Now the disk XML on the target host is as follows (note the reversed indexes):
[root@lenovo-sr630-10 files]# virsh dumpxml ys | awk '/<disk/,/<\/disk/'
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/gmount/d' index='1'/>
  <backingStore type='file' index='2'>
    <format type='qcow2'/>
    <source file='/gmount/c'/>
    <backingStore type='file' index='3'>
      <format type='qcow2'/>
      <source file='/gmount/b'/>
      <backingStore type='file' index='4'>
        <format type='qcow2'/>
        <source file='/gmount/a'/>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</disk>

9. Do a blockcommit with --keep-relative; the error happens:
[root@lenovo-sr630-10 files]# virsh blockcommit ys vda --top vda[3] --base vda[4] --verbose --wait --keep-relative
error: Requested operation is not valid: can't keep relative backing relationship
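The vda[3]/vda[4] notation in step 9 names chain layers by the index attributes shown in step 8. A small sketch of that resolution (a hypothetical helper for illustration, not virsh code), using the target-host chain from step 8:

```python
# Chain as printed in step 8, ordered top (active layer) to base, with the
# indexes libvirt assigned on the target host after migration.
CHAIN = [(1, '/gmount/d'), (2, '/gmount/c'), (3, '/gmount/b'), (4, '/gmount/a')]

def resolve(spec, chain, dev='vda'):
    """Resolve a 'vda[N]' layer spec to the image path carrying index N."""
    assert spec.startswith(dev + '[') and spec.endswith(']'), spec
    idx = int(spec[len(dev) + 1:-1])
    for i, path in chain:
        if i == idx:
            return path
    raise ValueError(f'no layer with index {idx} on {dev}')

# The failing commit in step 9 merges b down into a:
print(resolve('vda[3]', CHAIN), '->', resolve('vda[4]', CHAIN))
# /gmount/b -> /gmount/a
```

The indexes are resolved correctly on both sides; the failure comes from libvirt not having loaded the relative backing strings on the target host, not from the index lookup.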