Description of problem:
When a volume snapshot managed through libvirt in OpenStack is deleted, the snapshot file is committed into its backing file via libvirt. However, libvirt fails with an internal error because the block name formats do not match, as shown below.

~~~
2020-06-30 01:10:55.953 7 DEBUG nova.virt.libvirt.driver will call blockCommit with commit_disk=vdb commit_base=volume-<Vol-UUID> commit_top=volume-<Vol-UUID>.<snapshot-UUID> _volume_snapshot_delete /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:3010
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver Error occurred during volume_snapshot_delete, sending error status to Cinder.: libvirt.libvirtError: internal error: qemu block name 'json:{"backing": {"driver": "raw", "file": {"driver": "file", "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>"}}' doesn't match expected '/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>'
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>] Traceback (most recent call last):
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3028, in volume_snapshot_delete
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     snapshot_id, delete_info=delete_info)
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3013, in _volume_snapshot_delete
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     result = dev.commit(commit_base, commit_top, relative=True)
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 805, in commit
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     self._disk, base, top, self.COMMIT_DEFAULT_BANDWIDTH, flags=flags)
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     rv = execute(f, *args, **kwargs)
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     six.reraise(c, e, tb)
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     raise value
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     rv = meth(*args, **kwargs)
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 728, in blockCommit
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>]     if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
2020-06-30 01:10:55.979 7 ERROR nova.virt.libvirt.driver [instance: <UUID>] libvirt.libvirtError: internal error: qemu block name 'json:{"backing": {"driver": "raw", "file": {"driver": "file", "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>"}}' doesn't match expected '/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>'
~~~

Version-Release number of selected component (if applicable):
* libvirt-daemon-5.6.0-10

How reproducible:
100%

Steps to Reproduce:
1. Set up OpenStack with an NFS backend for instances
2. Run the OpenStack Tempest test tempest.api.test_volumes_snapshots.VolumesSnapshotTestJSON.test_snapshot_create_delete_with_volume_in_use
   -> https://opendev.org/openstack/tempest/src/tag/23.0.0/tempest/api/volume/test_volumes_snapshots.py#L40-L63

Actual results:
An error occurs.
~~~
libvirt.libvirtError: internal error: qemu block name 'json:{"backing": {"driver": "raw", "file": {"driver": "file", "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>"}}' doesn't match expected '/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>'
~~~

Expected results:
No error occurs.

Additional info:
* Red Hat OpenStack 16.0 (RHEL8.1) - NFS backend for instances
* qemu-kvm-core-4.1.0-23
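To illustrate the failing comparison, here is a minimal Python sketch. It is not libvirt's actual code (that is C inside the qemu driver); it only shows that the block name qemu reports is a `json:` pseudo-protocol string rather than the plain path the pre-blockdev libvirt code expects, and that the real filename is recoverable by decoding it. The helper `filename_from_block_name` is an illustrative name, not an existing API; the paths keep the `<UUID>` placeholders from the log.

```python
import json

# Block name as reported by qemu (placeholders from the log kept literal).
qemu_block_name = (
    'json:{"backing": {"driver": "raw", "file": {"driver": "file",'
    ' "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>"}},'
    ' "driver": "qcow2", "file": {"driver": "file",'
    ' "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>"}}'
)

# Plain path that libvirt expects for the top image of the commit.
expected = "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>"

# A direct string comparison (in effect what the failing check does) fails:
print(qemu_block_name == expected)  # False -> "doesn't match expected" error

def filename_from_block_name(name):
    """Decode a 'json:' pseudo-protocol block name into its plain filename."""
    if not name.startswith("json:"):
        return name  # already a plain path
    spec = json.loads(name[len("json:"):])
    return spec["file"]["filename"]

# Decoding the JSON spec recovers the path libvirt was looking for:
print(filename_from_block_name(qemu_block_name) == expected)  # True
```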
I think this issue is similar to BZ#1785939, because the error message and libvirt.py line 728 in that bug are the same as in this one.

~~~
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5542, in merge
    bandwidth, flags)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 728, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirt.libvirtError: internal error: qemu block name 'json:{"backing": {"backing": {"driver": "qcow2", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge7__nfs__0/98c55ff6-4857-4b5e-8e56-31cb3707d5d8/images/32c49b05-90f2-4289-b15e-521e1c81fa0f/709ce457-defb-479e-ba50-43a73dfecab6"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge7__nfs__0/98c55ff6-4857-4b5e-8e56-31cb3707d5d8/images/32c49b05-90f2-4289-b15e-521e1c81fa0f/19fd5cc1-3128-45f5-903e-3f6a9859b98d"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge7__nfs__0/98c55ff6-4857-4b5e-8e56-31cb3707d5d8/images/32c49b05-90f2-4289-b15e-521e1c81fa0f/fd40794a-dbd8-43d0-9357-9c68bb586870"}}' doesn't match expected '/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge7__nfs__0/98c55ff6-4857-4b
~~~

So I guess this is already fixed in version 8.2, according to comment BZ#1785939#c33. Is my understanding correct?
Yes, this should be fixed by libvirt switching to -blockdev starting from rhel-av-8.2. Should we move this bug to OpenStack for testing? Otherwise I'll close it as a duplicate of the bug that enabled -blockdev in libvirt.
Hi Peter,

Thank you for the confirmation.
May I ask some questions to make sure I understand the fix correctly?

(1)
According to your update in the previous bz [1], the issue should be fixed in qemu-4.2 and libvirt-5.10. Is that correct?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1785939#c7

(2)
We can certainly move this bug (or create a clone assigned to RHOSP) as a test-only patch.

However, the problem is that RHOSP16.0, which should be supported until October 27, 2020, is based on RHEL8.1, which doesn't have the fix yet.
https://access.redhat.com/support/policy/updates/openstack/platform
Should we make a z-stream request in the previous bz? Or can we treat this as a fix request for RHEL7.1?
Honestly, I'm concerned that we can't backport the same fix into RHEL7.1 virt, because the fix sounds like a huge architectural change.
(In reply to Takashi Kajinami from comment #3)
> Hi Peter,
> 
> Thank you for the confirmation.
> May I ask some questions to understand the fix correctly?
> 
> (1)
> According to your update in the previous bz [1], the issue should be fixed in
> qemu-4.2 and libvirt-5.10. Is that correct?

Yes, that is correct. rhel-av-8.2 uses libvirt-6.0.

> https://bugzilla.redhat.com/show_bug.cgi?id=1785939#c7
> 
> (2)
> We can certainly move this bug (or create a clone assigned to RHOSP)
> as a test-only patch.
> 
> However, the problem is that RHOSP16.0, which should be supported until
> October 27, 2020, is based on RHEL8.1, which doesn't have the fix yet.
> https://access.redhat.com/support/policy/updates/openstack/platform
> Should we make a z-stream request in the previous bz?
> Or can we treat this as a fix request for RHEL7.1?
> Honestly, I'm concerned that we can't backport the same fix into RHEL7.1 virt
> because the fix sounds like a huge architectural change.

No, we can't without a rebase. The changeset that switched over to -blockdev is simply too huge. Without it we can't really control the blockjobs any better.
I think that you meant RHEL-8.1 in the paragraph above.

Either way, I can only suggest upgrading to rhel-av-8.2, both libvirt and qemu. There shouldn't be any semantic changes requiring modifications in OpenStack to be able to use the new libvirt and qemu, though I'm not sure how OpenStack handles such things.

Either way, it's not possible to backport this to rhel-av-8.1. The 'backport' would basically be a rebase to rhel-av-8.2 anyway.
> I think that you meant RHEL-8.1 in the paragraph above.
Yes. Sorry for the confusion.

The problem is that, as I mentioned, RHOSP16.0 only supports RHEL8.1, and RHEL8.2 will be supported starting with the upcoming RHOSP16.1.

This means that this snapshot feature is completely broken in RHOSP16.0 because of this bug in libvirt, even though 16.0 will be supported until October, using RHEL 8.1 EUS as its base OS.

So the ideal solution here would be to implement a fix in rhel-av-8.1 (which is not a backport but a kind of old-version-only fix), but I don't know whether that is actually acceptable to the RHEL/Virt team.

For your reference (and my personal note), the following is the list of qemu/libvirt packages installed in the latest libvirt container image we use in RHOSP16.0:
https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-nova-libvirt/5de6c2ddbed8bd164a0c1bbf?tag=16.0-108&architecture=amd64&container-tabs=overview

libvirt-bash-completion-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-client-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-config-nwfilter-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-interface-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-network-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-nodedev-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-nwfilter-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-qemu-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-secret-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-core-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-disk-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-gluster-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-iscsi-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-iscsi-direct-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-logical-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-mpath-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-rbd-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-driver-storage-scsi-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-daemon-kvm-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
libvirt-libs-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
qemu-img-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
qemu-kvm-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
qemu-kvm-block-curl-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
qemu-kvm-block-gluster-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
qemu-kvm-block-iscsi-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
qemu-kvm-block-rbd-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
qemu-kvm-block-ssh-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
qemu-kvm-common-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
qemu-kvm-core-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.x86_64
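Given the package list above and the fixed versions mentioned earlier in the thread (qemu-4.2 and libvirt-5.10, per BZ#1785939#c7), a quick sanity check of whether the installed versions already include the -blockdev work could look like the sketch below. The helper name and the version comparison are illustrative, not a supported tool; note that a naive string comparison would wrongly sort "5.10" before "5.6", so numeric tuples are compared instead.

```python
def version_tuple(version):
    # "5.6.0" -> (5, 6, 0); tuple comparison handles "5.10" > "5.6" correctly,
    # whereas plain string comparison would not.
    return tuple(int(part) for part in version.split("."))

# Versions taken from the package list above vs. the fixed versions.
installed = {"qemu-kvm": "4.1.0", "libvirt-daemon": "5.6.0"}
required = {"qemu-kvm": "4.2", "libvirt-daemon": "5.10"}

for pkg, version in installed.items():
    has_fix = version_tuple(version) >= version_tuple(required[pkg])
    print(pkg, version, "includes -blockdev fix:", has_fix)
```

As expected, both checks report False for the RHOSP16.0 (RHEL8.1) image, which is consistent with the bug being reproducible there.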
(In reply to Takashi Kajinami from comment #6)
> > I think that you meant RHEL-8.1 in the paragraph above.
> Yes. Sorry for the confusion.
> 
> The problem is that, as I mentioned, RHOSP16.0 only supports RHEL8.1,
> and RHEL8.2 will be supported starting with the upcoming RHOSP16.1.
> 
> This means that this snapshot feature is completely broken in RHOSP16.0
> because of this bug in libvirt, even though 16.0 will be supported until
> October, using RHEL 8.1 EUS as its base OS.
> 
> So the ideal solution here would be to implement a fix in rhel-av-8.1 (which
> is not a backport but a kind of old-version-only fix), but I don't know
> whether that is actually acceptable to the RHEL/Virt team.

So this would be a downstream-only throwaway fix just for rhel-av-8.1, which I don't think will even have another z-stream release. This really needs to go through PMs first to approve spending time on a throwaway fix.

To be clear, I don't really know the extent of what would need changing. The problem is actually in the cooperation between qemu and libvirt here. Qemu for some reason converts the filenames to JSON strings in this instance, which then throws off libvirt, as the old code wasn't ready for that. Proper fixing would also require figuring out when and why that happens in qemu so that we can be sure. As said, none of that code is invoked any longer in av-8.2 and upstream.

My suggestion is still to use a qemu/libvirt combination that uses -blockdev. The idea of libvirt is in fact to shield applications from any differences in the underlying hypervisor and provide a stable compatibility layer; it should be utilized as such.

Please raise the issue through appropriate management if the fix really must be based on av-8.1, as I'd consider it a waste of time and effort.
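To make the shape of the problem concrete: a block commit must locate the top and base images inside the qcow2 backing chain that qemu reports, and the nested "backing" structure visible in the error messages is exactly that chain. The toy model below (structure and paths copied from the log; the dict layout and helper name are illustrative only, not libvirt or qemu data structures) shows how walking the chain yields the plain paths that commit_top and commit_base should resolve to.

```python
# Toy model of the two-layer chain from this bug's error message:
# the qcow2 snapshot overlay backed by the raw base volume.
chain = {
    "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>.<snapshot-UUID>",
    "driver": "qcow2",
    "backing": {
        "filename": "/var/lib/nova/mnt/<UUID>/volume-<Vol-UUID>",
        "driver": "raw",
        "backing": None,  # end of chain
    },
}

def chain_filenames(node):
    """Walk the 'backing' links and collect the plain filenames, top first."""
    names = []
    while node is not None:
        names.append(node["filename"])
        node = node["backing"]
    return names

names = chain_filenames(chain)
# commit_top is the overlay (first entry); commit_base is its backing file,
# matching the commit_top/commit_base values in the DEBUG log line above.
print("commit_top: ", names[0])
print("commit_base:", names[1])
```

When qemu reports the overlay's name as a `json:` string instead of the plain path, the pre-blockdev matching of these names against the chain is what breaks; with -blockdev, nodes are addressed by node names rather than filenames, so the mismatch cannot occur.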