Bug 1818655
| Summary: | Failed to do block commit in rhev4.4 after VM migration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | chhu |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| Status: | CLOSED ERRATA | QA Contact: | chhu |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.2 | CC: | dyuan, fjin, jdenemar, jen, jsuchane, lmen, mtessun, pkrempa, virt-maint, xuzhang, yafu, yisun |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | libvirt_RHV_INT | | |
| Fixed In Version: | libvirt-6.0.0-16.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-05 09:59:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1820016 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Created attachment 1674595 [details]
xml-startvm
Created attachment 1674596 [details]
xml-before-migrate
Created attachment 1674597 [details]
xml-after-migrate
Created attachment 1674598 [details]
xml-deleted-s3
Created attachment 1674599 [details]
backing-chain and libvirtd, vdsm logs
The snapshot index changed due to the fix of Bug 1451398 - [RFE] Add index for the active layer in disk chain. I checked the logs and vdsm used the correct index numbers: in both the vdsm log and the libvirtd log, 'top' and 'base' are set to index=3 and index=4, which is correct according to "xml-after-migrate" from comment 0.

Vdsm log:
2020-03-29 21:56:53,282-0400 INFO (jsonrpc/5) [virt.vm] (vmId='4dcf9d4e-b65b-4e1a-8852-d44cd229911d') Starting merge with jobUUID='618668b9-8213-447d-a198-f314e5ebc38a', original chain=ab737847-8486-4265-b6f9-fc44f42c1cf5 < e6c30bfc-c9d5-450b-9b62-171d2235e95e < f007375f-4082-4649-8512-f161498bc1f2 (top), disk='sda', base='sda[4]', top='sda[3]', bandwidth=0, flags=8 (vm:5338)
2020-03-29 21:56:53,283-0400 ERROR (jsonrpc/5) [virt.vm] (vmId='4dcf9d4e-b65b-4e1a-8852-d44cd229911d') Live merge failed (job: 618668b9-8213-447d-a198-f314e5ebc38a) (vm:5344)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5342, in merge
    bandwidth, flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 823, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirt.libvirtError: Requested operation is not valid: can't keep relative backing relationship

Libvirtd log:
2020-03-30 01:56:53.283+0000: 881438: debug : virThreadJobSet:94 : Thread 881438 (virNetServerHandleJob) is now running job remoteDispatchDomainBlockCommit
2020-03-30 01:56:53.283+0000: 881438: debug : virDomainBlockCommit:10517 : dom=0x7ffac4007420, (VM: name=lmn4, uuid=4dcf9d4e-b65b-4e1a-8852-d44cd229911d), disk=sda, base=sda[4], top=sda[3], bandwidth=0, flags=0x8
2020-03-30 01:56:53.283+0000: 881438: debug : qemuDomainObjBeginJobInternal:9754 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7ffac80308e0 name=lmn4, current job=none agentJob=none async=none)
2020-03-30 01:56:53.283+0000: 881438: debug : qemuDomainObjBeginJobInternal:9803 : Started job: modify (async=none vm=0x7ffac80308e0 name=lmn4)
2020-03-30 01:56:53.283+0000: 881438: debug : qemuDomainBlockCommit:18876 : Requested operation is not valid: can't keep relative backing relationship

So this does not seem related to the index change. The reporter confirmed that the issue only happens after migration; if the snapshots are created and deleted on the source host, nothing goes wrong. My guess is that something goes wrong after migration on the target host, similar to https://bugzilla.redhat.com/show_bug.cgi?id=1461303 "libvirt does not load the data necessary to keep the relative relationship".

So the problem is that after migration we no longer load the relative paths from the images, as the images are specified in the XML now. Fixed upstream by:
commit 2ace7a87a8aced68c2504fd4dd4e2df4302c3eeb
Author: Peter Krempa <pkrempa>
Date: Mon Mar 30 11:18:37 2020 +0200
qemuDomainSnapshotDiskPrepareOne: Don't load the relative path with blockdev
Since we are refreshing the relative paths when doing the blockjobs we
no longer need to load them upfront when doing the snapshot.
Signed-off-by: Peter Krempa <pkrempa>
Reviewed-by: Ján Tomko <jtomko>
commit ffc6249c79dbf980d116af7c7ed20222538a7c1c
Author: Peter Krempa <pkrempa>
Date: Mon Mar 30 11:18:32 2020 +0200
qemu: block: Support VIR_DOMAIN_BLOCK_COMMIT/PULL/REBASE_RELATIVE with blockdev
Preservation of the relative relationship requires us to load the
backing store strings from the disk images. With blockdev we stopped
detecting the backing chain if it's specified in the XML so the relative
links were not loaded at that point. To preserve the functionality from
the pre-blockdev without accessing the backing chain unnecessarily
during VM startup we must refresh the relative links when relative
block commit or block pull is requested.
https://bugzilla.redhat.com/show_bug.cgi?id=1818655
Signed-off-by: Peter Krempa <pkrempa>
Reviewed-by: Ján Tomko <jtomko>
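For reference, the flags=0x8 in the libvirtd log above is VIR_DOMAIN_BLOCK_COMMIT_RELATIVE, i.e. vdsm asks libvirt to keep the backing-store strings relative while committing. Below is a minimal sketch of the equivalent call through the libvirt Python bindings; the connection URI and the domain/disk names are taken from the logs, and this is an illustration rather than vdsm's actual code path:

import libvirt

conn = libvirt.open('qemu:///system')      # connect to the local libvirtd
dom = conn.lookupByName('lmn4')            # domain name from the libvirtd log

# Commit the overlay sda[3] into its backing image sda[4] and ask libvirt to
# preserve the relative backing relationship (this is flags=8 in the vdsm log).
flags = libvirt.VIR_DOMAIN_BLOCK_COMMIT_RELATIVE
dom.blockCommit('sda', 'sda[4]', 'sda[3]', 0, flags)

# Before libvirt-6.0.0-16.el8 this raises libvirt.libvirtError
# "can't keep relative backing relationship" when the backing chain was
# supplied via the XML (e.g. after migration) instead of being detected.

The virsh blockcommit --keep-relative option used in the reproducer below sets this same flag; the second commit above also covers the corresponding *_RELATIVE flags for block pull/rebase.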
Tried to verify on libvirt-6.0.0-16.el8 but hit Bug 1820016; blocked by Bug 1820016.

Tested on packages:
libvirt-daemon-kvm-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6129+b14d477b.x86_64
kernel: 4.18.0-193.el8.x86_64
vdsm-4.40.5-1.el8ev.x86_64

Test steps:
1. Start the VM on host A, create s1 (without memory), s2, s3, migrate the VM to host B, then delete s3, s1, s2 successfully.
2. For the running VM, create s1 (without memory), s2, s3 (without memory), s4, s5, then delete s3, s1, s5, s4, s2 successfully; create s1, s2 (without memory), s3, delete s1 successfully, migrate the VM from host B to host A, log in to the VM, touch a file, clone s3, migrate to host A, delete s2, create s4, then delete s4 and s3 successfully.

Set the bug status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

To make sure this could be covered by pure libvirt, I reproduced it in a pure libvirt environment as follows:
0.
[root@lenovo-sr630-10 files]# rpm -qa | grep libvirt-6
libvirt-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
1. Prepare a gluster server
# more /etc/glusterfs/glusterd.vol
volume management
type mgmt/glusterd
option working-directory /var/lib/glusterd
option transport-type socket,rdma
option transport.socket.keepalive-time 10
option transport.socket.keepalive-interval 2
option transport.socket.read-fail-log off
option rpc-auth-allow-insecure on
end-volume
# service glusterd restart
Stopping glusterd: [ OK ]
Starting glusterd: [ OK ]
# mkdir /br1
# chmod -R 777 /br1
# setenforce 0
# iptables -F
On gluster server A:
# gluster peer probe 10.66.82.249
peer probe: success.
# gluster peer status
Number of Peers: 1
Hostname: 10.66.82.249
Uuid: 40f4b505-0765-4a6b-906b-db68c078c1dd
State: Peer in Cluster (Connected)
# gluster volume create gluster-vol1 10.66.85.212:/br1 10.66.82.249:/br1 force
volume create: gluster-vol1: success: please start the volume to access data
(If an RDMA connection is desired, the transport option can be added: gluster volume create gluster-vol1 transport rdma 10.66.85.212:/br1 10.66.82.249:/br1 force)
# gluster volume set gluster-vol1 server.allow-insecure on
volume set: success
# gluster volume info
Volume Name: gluster-vol1
Type: Distribute
Volume ID: 2d4e6867-231a-48e7-821a-c4c253241044
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.66.85.212:/br1
Brick2: 10.66.82.249:/br1
Options Reconfigured:
server.allow-insecure: on
# gluster volume start gluster-vol1
volume start: gluster-vol1: success
# gluster volume status
Status of volume: gluster-vol1
Gluster process                             Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.66.85.212:/br1                     49152   Y       22917
Brick 10.66.82.249:/br1                     49152   Y       7408
NFS Server on localhost                     2049    Y       22931
NFS Server on 10.66.82.249                  2049    Y       7423
Set nfs.disable=on on gluster server A:
# gluster volume set gluster-vol1 nfs.disable on
# gluster volume info gluster-vol1 | grep nfs.disable
nfs.disable: on
2. Mount the gluster dir on the 2 test hosts:
# mount -t glusterfs 10.66.85.212:/gluster-vol1 /gmount/
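(Optional, not part of the original steps: a persistent mount entry and a quick check that both hosts really see the shared directory. The fstab line is an illustrative sketch for this setup.)
# echo '10.66.85.212:/gluster-vol1 /gmount glusterfs defaults,_netdev 0 0' >> /etc/fstab
# df -hT /gmount     # should show a fuse.glusterfs filesystem on both test hosts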
3. Prepare the image chain
root@yisun-test1 /gmount 08:17:56$ qemu-img create -f qcow2 a 10M
Formatting 'a', fmt=qcow2 size=10485760 cluster_size=65536 lazy_refcounts=off refcount_bits=16
root@yisun-test1 /gmount 08:18:04$ qemu-img create -f qcow2 -o backing_fmt=qcow2 -b a b
Formatting 'b', fmt=qcow2 size=10485760 backing_file=a backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
root@yisun-test1 /gmount 08:18:14$ qemu-img create -f qcow2 -o backing_fmt=qcow2 -b b c
Formatting 'c', fmt=qcow2 size=10485760 backing_file=b backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
root@yisun-test1 /gmount 08:18:19$ qemu-img create -f qcow2 -o backing_fmt=qcow2 -b c d
Formatting 'd', fmt=qcow2 size=10485760 backing_file=c backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
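Because the images were created from inside /gmount using bare file names, the backing file strings stored in the qcow2 headers are relative; this is exactly the relationship that --keep-relative asks libvirt to preserve. A quick way to confirm it (illustrative command, not part of the original transcript):
# qemu-img info --backing-chain d | grep -E '^(image|backing file):'
Each layer should report its backing file by the bare relative name (c, b, a) rather than an absolute /gmount/... path.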
4. Use image 'a' as the VM's disk
root@yisun-test1 /gmount 08:19:59$ virsh dumpxml ys | awk '/<disk/,/<\/disk/'
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/gmount/a'/>
<backingStore/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</disk>
root@yisun-test1 /gmount 08:20:01$ virsh start ys
Domain ys started
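(A small sanity check, not in the original transcript: confirm the running domain is backed by /gmount/a before taking snapshots.)
# virsh domblklist ys     # vda should point at /gmount/a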
5. Create 3 external snapshots for the VM
root@yisun-test1 /gmount 08:20:11$ virsh snapshot-create-as --reuse-external --disk-only --no-metadata ys --diskspec vda,file=/gmount/b
Domain snapshot 1588767647 created
root@yisun-test1 /gmount 08:20:47$ virsh snapshot-create-as --reuse-external --disk-only --no-metadata ys --diskspec vda,file=/gmount/c
Domain snapshot 1588767671 created
root@yisun-test1 /gmount 08:21:11$ virsh snapshot-create-as --reuse-external --disk-only --no-metadata ys --diskspec vda,file=/gmount/d
Domain snapshot 1588767673 created
6. Now the VM's disk xml on the source host is as follows:
root@yisun-test1 /gmount 08:21:38$ virsh dumpxml ys | awk '/<disk/,/<\/disk/'
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/gmount/d' index='4'/>
<backingStore type='file' index='3'>
<format type='qcow2'/>
<source file='/gmount/c'/>
<backingStore type='file' index='2'>
<format type='qcow2'/>
<source file='/gmount/b'/>
<backingStore type='file' index='1'>
<format type='qcow2'/>
<source file='/gmount/a'/>
<backingStore/>
</backingStore>
</backingStore>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</disk>
7. Migrate the VM to the target host
root@yisun-test1 /gmount 08:21:43$ virsh migrate ys qemu+ssh://lenovo-sr630-10.lab.eng.pek2.redhat.com/system --live --undefinesource --persistent
root@lenovo-sr630-10.lab.eng.pek2.redhat.com's password:
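(Another optional check, not in the original transcript: confirm the domain is running on the destination before dumping its XML there.)
# virsh -c qemu+ssh://lenovo-sr630-10.lab.eng.pek2.redhat.com/system domstate ys     # expected to report "running"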
8. Now the disk xml on the target host is as follows:
[root@lenovo-sr630-10 files]# virsh dumpxml ys | awk '/<disk/,/<\/disk/'
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/gmount/d' index='1'/>
<backingStore type='file' index='2'>
<format type='qcow2'/>
<source file='/gmount/c'/>
<backingStore type='file' index='3'>
<format type='qcow2'/>
<source file='/gmount/b'/>
<backingStore type='file' index='4'>
<format type='qcow2'/>
<source file='/gmount/a'/>
<backingStore/>
</backingStore>
</backingStore>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</disk>
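Note that the index numbering on the destination is the reverse of what it was on the source: index 3 now refers to /gmount/b and index 4 to /gmount/a, so the command below commits b into a. A one-liner to print the index-to-file mapping before choosing --top/--base (illustrative, not from the original transcript):
# virsh dumpxml ys | grep -oE "index='[0-9]+'|file='[^']*'"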
9. Do a blockcommit with --keep-relative; an error occurs
[root@lenovo-sr630-10 files]# virsh blockcommit ys vda --top vda[3] --base vda[4] --verbose --wait --keep-relative
error: Requested operation is not valid: can't keep relative backing relationship
Description of problem:
In rhev4.4, deleting a snapshot after VM migration failed with error: "libvirt.libvirtError: Requested operation is not valid: can't keep relative backing relationship"

Version-Release number of selected component (if applicable):
libvirt-daemon-kvm-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
qemu-kvm-core-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64
kernel: 4.18.0-190.el8.x86_64 (source), 4.18.0-187.el8.x86_64 (target)

How reproducible:
100%

Steps to Reproduce:
1. Start the VM on host A with a glusterfs disk; the xml is in file: xml-startvm

2. Create snapshots s1 (without memory), s2, s3 for the VM; the xml is in file: xml-before-migrate
--------------------------------------------------------------------
<disk type='file' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
  <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/4a3c3fda-6ec6-4e04-9eec-0435d90c49f1' index='5'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <backingStore type='file' index='4'>
    <format type='qcow2'/>
    <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/f007375f-4082-4649-8512-f161498bc1f2'>
      <seclabel model='dac' relabel='no'/>
    </source>
    <backingStore type='file' index='3'>
      <format type='qcow2'/>
      <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/e6c30bfc-c9d5-450b-9b62-171d2235e95e'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore type='file' index='1'>
        <format type='raw'/>
        <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/ab737847-8486-4265-b6f9-fc44f42c1cf5'>
          <seclabel model='dac' relabel='no'/>
        </source>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
------------------------------------------------------------------------
The backing chain is in file: backing-chain-before-migrate

3. Migrate the VM to host B successfully, but the disk indexes in the xml are changed; the xml is in file: xml-after-migrate
--------------------------------------------------------------------
<disk type='file' device='disk' snapshot='no'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
  <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/4a3c3fda-6ec6-4e04-9eec-0435d90c49f1' index='1'>
    <seclabel model='dac' relabel='no'/>
  </source>
  <backingStore type='file' index='2'>
    <format type='qcow2'/>
    <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/f007375f-4082-4649-8512-f161498bc1f2'>
      <seclabel model='dac' relabel='no'/>
    </source>
    <backingStore type='file' index='3'>
      <format type='qcow2'/>
      <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/e6c30bfc-c9d5-450b-9b62-171d2235e95e'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore type='file' index='4'>
        <format type='raw'/>
        <source file='/rhev/data-center/mnt/glusterSD/*.243:_meili-gv0/68a803e5-fdb5-4c57-a461-d233b205b94a/images/b62c20eb-370c-4a6a-b7a4-d84ef60b1bb9/ab737847-8486-4265-b6f9-fc44f42c1cf5'>
          <seclabel model='dac' relabel='no'/>
        </source>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
------------------------------------------------------------
The backing chain is in file: backing-chain-after-migrate

5. Delete snapshot s3 successfully; the xml after deleting s3 is in file: xml-deleted-s3, the backing chain is in file: backing-chain-deleted-s3

6. Try to delete s1; it fails with an error in vdsm.log:
--------------------------------------------------------------------
ERROR (jsonrpc/5) [virt.vm] (vmId='4dcf9d4e-b65b-4e1a-8852-d44cd229911d') Live merge failed (job: 618668b9-8213-447d-a198-f314e5ebc38a) (vm:5344)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5342, in merge
    bandwidth, flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 823, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirt.libvirtError: Requested operation is not valid: can't keep relative backing relationship
-------------------------------------------------------------------

Actual results:
In step 6, deleting snapshot s1 fails.

Expected results:
In step 6, snapshot s1 is deleted successfully.

Additional info:
- libvirtd and vdsm logs