Bug 1552059
| Summary: | [Regression] Cannot delete VM's snapshot | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jan Zmeskal <jzmeskal> | |
| Component: | qemu-kvm-rhev | Assignee: | Fam Zheng <famz> | |
| Status: | CLOSED ERRATA | QA Contact: | Tingting Mao <timao> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 7.5 | CC: | areis, bugs, chayang, coli, ebenahar, famz, jherrman, juzhang, jzmeskal, kgoldbla, knoel, kwolf, lmiksik, michen, mrezanin, mtessun, ngu, pingl, qizhu, ratamir, salmy, timao, virt-maint, xfu | |
| Target Milestone: | pre-dev-freeze | Keywords: | Regression, ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | qemu-kvm-rhev-2.12.0-1.el7 | Doc Type: | Bug Fix | |
| Doc Text: | Under certain circumstances, snapshots of guests created in Red Hat Virtualization (RHV) could not be deleted due to an error in the snapshot locking mechanism. This update fixes RHV snapshot locking, and the affected snapshots can now be removed as expected. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1554650 1554946 | Environment: | ||
| Last Closed: | 2018-11-01 11:06:51 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1554650, 1554946 | |||
Jan, can you please include the vdsm log too?

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Jan, can you please share the version of vdsm? In the vdsm log that you uploaded I don't see any error, which is a bit odd. How many hosts do you have in the env? I need the SPM logs and the logs of the host running the VM, assuming there are multiple hosts.

VDSM info:

    Name        : vdsm
    Arch        : x86_64
    Version     : 4.19.46
    Release     : 1.el7ev
    Size        : 2.6 M
    Repo        : installed
    From repo   : rhv-4.1.10
    Summary     : Virtual Desktop Server Manager
    URL         : http://www.ovirt.org/develop/developer-guide/vdsm/vdsm/
    License     : GPLv2+
    Description : The VDSM service is required by a Virtualization Manager to manage the
                : Linux hosts. VDSM manages and monitors the host's storage, memory and
                : networks as well as virtual machine creation, other host administration
                : tasks, statistics gathering, and log collection.

The engine has only two hosts. The vdsm.log I provided is from the SPM. I can reproduce once again and provide logs from both of the hosts. Is vdsm.log enough or do you need some other logs as well?

(In reply to Jan Zmeskal from comment #5)
> VDSM info:
>
> Name        : vdsm
> Arch        : x86_64
> Version     : 4.19.46
> Release     : 1.el7ev
> Size        : 2.6 M
> Repo        : installed
> From repo   : rhv-4.1.10
> Summary     : Virtual Desktop Server Manager
> URL         : http://www.ovirt.org/develop/developer-guide/vdsm/vdsm/
> License     : GPLv2+
> Description : The VDSM service is required by a Virtualization Manager to manage the
>             : Linux hosts. VDSM manages and monitors the host's storage, memory and
>             : networks as well as virtual machine creation, other host administration
>             : tasks, statistics gathering, and log collection.
>
> The engine has only two hosts. The vdsm.log I provided is from the SPM. I can
> reproduce once again and provide logs from both of the hosts. Is
> vdsm.log enough or do you need some other logs as well?

If you can reproduce again and send the engine log and the logs of both hosts, that would be great. Please also describe the storage that you are using - nfs, gluster, iscsi, etc. If nfs or gluster, please provide the paths of the storage. Basically, the scenario you are trying is really basic and should work without errors. I just wonder whether there is something that is configured differently. Please provide any info you think might be helpful. Thanks!

Thanks Jan. What OS are the hosts running - RHEL 7.4 or 7.5? Can you please share the libvirt version (rpm -qa | grep libvirt)?

And the qemu version please: rpm -qa | grep qemu

Can you please try the following scenarios and share the results?

1. Create a VM that is *not* from a template, add a disk, run the VM, create a snapshot and merge it (live merge)
2. Create a VM *from* a template, add a disk, create a snapshot and merge it (cold merge)
3. Create a VM *not* from a template, add a disk, create a snapshot and merge it (cold merge #2)

In all cases there is no need to install an OS. Thanks.

Thanks Jan.

Fam, as reported in comment #14, it seems that when the VM is created from a template, the merge (live and cold) fails. Remembering that you asked me about shared images, please note that when a VM is created from a template, there is a shared image in the chain. Please advise how you would like to proceed.
The relevant error seems to be:

    QImgError: cmd=['/usr/bin/taskset', '--cpu-list', '0-3', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'commit', '-p', '-t', 'none', '-b', u'/rhev/data-center/mnt/mantis-nfs-lif2.lab.eng.tlv2.redhat.com:_nas01_ge-cfme-integration-2-nfs-0/f146e846-8dd3-4a0e-993f-cb256642c3cc/images/2ec1fe78-490a-49d3-abb5-308c45df7912/8eb638f6-a286-42a6-8371-4ce554f37622', '-f', 'qcow2', u'/rhev/data-center/mnt/mantis-nfs-lif2.lab.eng.tlv2.redhat.com:_nas01_ge-cfme-integration-2-nfs-0/f146e846-8dd3-4a0e-993f-cb256642c3cc/images/2ec1fe78-490a-49d3-abb5-308c45df7912/bab214b7-7838-43fe-899b-5bf53aebba0d'], ecode=1, stdout=, stderr=qemu-img: Failed to get "write" lock Is another process using the image? , message=None

Ala - shouldn't we be using the -U flag?

Per BZ 1535992, it is agreed that for RHV 4.1 on RHEL 7.5, the -U flag should be provided by default. There is supposed to be a downstream patch for this.

Please note that per comment #14, if the VM is not based on a template, live and cold merge operations work successfully.

(In reply to Ala Hino from comment #17)
> Per BZ 1535992, it is agreed that for RHV 4.1 on RHEL 7.5, the -U flag should be
> provided by default. There is supposed to be a downstream patch for this.
>
> Please note that per comment #14, if the VM is not based on a template, live
> and cold merge operations work successfully.

Ala, what's the latest here?

The root cause of this bug is having a shared base image that is of qcow2 format. Please use the following steps to reproduce:

1. Create a qcow2 image - base.img
   $ qemu-img create -f qcow2 base.img 10M
2. Create a child of the base image:
   $ qemu-img create -f qcow2 -b base.img child1.img 10M
3. Run a qemu process that uses the child1 image - at this point, the qemu process takes a write-lock on the entire chain, including the base image.
4. Create another child of the base image:
   $ qemu-img create -f qcow2 -b base.img child2.img 10M
5. Create a child of the child2 image:
   $ qemu-img create -f qcow2 -b child2.img child2_2.img 10M
6. Commit child2_2 into child2:
   $ qemu-img commit -p -t none -b child2.img child2_2.img

Step #6 fails because qemu-img commit tries to lock the entire chain, including the base image that is already locked by the qemu process created in step #3. This ends with the following error:

    Error: Command ['/usr/bin/taskset', '--cpu-list', '0-3', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'commit', '-p', '-t', 'none', '-b', u'/rhev/data-center/mnt/rich-nfs-server2.usersys.redhat.com:_home_storage_sd9/a033c56a-d8bd-4313-ab19-4eaabd366c69/images/211bdb3a-befe-4ace-a0bf-e09fa32abb31/f6df22ce-d5c2-4180-b847-afdc24a0f913', '-f', 'qcow2', u'/rhev/data-center/mnt/rich-nfs-server2.usersys.redhat.com:_home_storage_sd9/a033c56a-d8bd-4313-ab19-4eaabd366c69/images/211bdb3a-befe-4ace-a0bf-e09fa32abb31/38c100c2-6db8-44bd-aa00-6d508409b3bd'] failed with rc=1 out='' err=bytearray(b'qemu-img: Failed to get "write" lock\nIs another process using the image?\n')

Thank you Ala.

Reproduced the issue in comment 20. I believe both "block-commit" and "qemu-img commit" have this problem. The root cause is that the file lock we try to acquire on the base image during commit is more restrictive than necessary. While I could quickly come up with a small fix as below, I'm still testing it. It would be better if Kevin can help because of the subtlety of the perm system.
---
    diff --git a/block.c b/block.c
    index 4f76714f6b..a6b2bf89da 100644
    --- a/block.c
    +++ b/block.c
    @@ -1003,7 +1003,7 @@ static void bdrv_backing_options(int *child_flags, QDict *child_options,
         /* backing files always opened read-only */
         qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
    -    flags &= ~BDRV_O_COPY_ON_READ;
    +    flags &= ~(BDRV_O_COPY_ON_READ | BDRV_O_RDWR);

         /* snapshot=on is handled on the top layer */
         flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_TEMPORARY);

Sent a more complete series to upstream: https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg02606.html

This will be a call for RHV.

Kevin, could you help review the upstream patches mentioned in comment 27, please?

Hi Jan, can you please re-test this flow with the proposed build in comment #36? The qemu-kvm-rhev should be qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64. Thanks

Not sure what info is required from me, Kevin?

It was a mistake. We are setting to '+' only when it is covered by automation.

Verified the bug with an external snapshot using the same steps as the following bz, and it passed: https://bugzilla.redhat.com/show_bug.cgi?id=1554946#c8

However, for the regression test, there is ONE new bug produced:

Total amount of cases: 71
Results: 67 passed; 4 failed

Packages Tested:
kernel-3.10.0-880.el7.x86_64
qemu-kvm-rhev-2.12.0-1.el7

New Bugs(1):
Bug 1575578 - Failed to convert a source image to the qcow2 image encrypted by luks

Reproduced Bugs(4):
Bug 1523459 - RFE: support to dump metadata of the image encrypted by LUKS using default human format
Bug 1331279 - test cases of qemu-iotests failed
Bug 1527085 - The copied flag should be updated during '-r leaks'
Bug 1191402 - The error info is not accurate when create image with specify wrong backing_fmt

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443 |
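
The effect of the one-line block.c change quoted in the comments can be illustrated with a toy model. This is only a sketch under stated assumptions, not QEMU code: the flag values are illustrative, the helper names (`backing_flags`, `needs_write_lock`, `commit_can_lock_base`) are hypothetical, and the rule "a node opened read-write asks for the write lock" is a simplification of the permission system. The fix that was actually sent upstream was a more complete series than this single line.

```python
# Toy model of the lock failure discussed in the comments. NOT QEMU code.
# Flag values are illustrative assumptions; check QEMU's headers for real ones.
BDRV_O_RDWR = 0x0002
BDRV_O_COPY_ON_READ = 0x0400


def backing_flags(parent_flags, drop_rdwr):
    """Flags a backing file inherits from its parent (simplified)."""
    mask = BDRV_O_COPY_ON_READ
    if drop_rdwr:
        # Models the proposed change: also clear the read-write flag.
        mask |= BDRV_O_RDWR
    return parent_flags & ~mask


def needs_write_lock(flags):
    """Assumption: a node requests the "write" lock iff opened read-write."""
    return bool(flags & BDRV_O_RDWR)


def commit_can_lock_base(drop_rdwr, base_held_by_vm=True):
    """Would committing child2_2.img into child2.img get past base.img?"""
    # The commit target (child2.img) is opened read-write to receive data.
    child2_flags = BDRV_O_RDWR
    # base.img is opened as the backing file of child2.img.
    base_flags = backing_flags(child2_flags, drop_rdwr)
    if needs_write_lock(base_flags) and base_held_by_vm:
        return False  # corresponds to: Failed to get "write" lock
    return True


print("before change:", commit_can_lock_base(drop_rdwr=False))
print("after change: ", commit_can_lock_base(drop_rdwr=True))
```

In this model the commit only collides with the running VM while base.img inherits the read-write flag from child2.img. Once the flag is masked out, base.img is opened read-only and no write lock is requested on it.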
Created attachment 1404777
engine.log

Description of problem:
You can create a snapshot of a VM but it cannot be deleted, regardless of whether the VM is running. The error can be found in engine.log.

Version-Release number of selected component (if applicable):
CAUTION! This bug was found on: ovirt-engine 4.1.10
However, this version was not available in the Version field when I was creating this bug.

How reproducible:
100 %

Steps to Reproduce:
1. Have some running VM (in my log it is "jzmeskal_3")
2. Create a snapshot (in my log it is "jzmeskal_3_snapshot_1"). I did not include memory with the snapshot.
3. Now try to remove the snapshot. It will be unsuccessful. Error message in Tasks: Merging snapshots (Active VM into jzmeskal_3_snapshot_1) of disk golden_mixed_virtio_template on host host_mixed_2
4. Shut down the VM and try to remove the snapshot again. You will get the same result.

Actual results:
VM snapshot cannot be deleted. Several errors are found in engine.log.

Expected results:
You can delete the VM's snapshot.

Additional info:
RHEL 7.5
engine.log in attachment
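
When triaging a similar report, the collected logs can be searched for the qemu-img lock failure quoted in the comments. The following is a minimal sketch, assuming the message appears in the log with literal double quotes, as it does in the quoted vdsm errors. The function name `find_lock_failures` and the sample log text are made up for illustration.

```python
import re

# Signature taken from the qemu-img stderr quoted in the comments.
LOCK_ERROR = re.compile(r'Failed to get "(\w+)" lock')


def find_lock_failures(log_text):
    """Return (line_number, lock_type) for each line with the lock error."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = LOCK_ERROR.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Made-up sample; a real vdsm.log or engine.log line will look different.
sample = (
    "INFO merge started\n"
    'stderr=qemu-img: Failed to get "write" lock\n'
    "Is another process using the image?\n"
)
print(find_lock_failures(sample))
```

To scan a real file, read it with `open(path).read()` and pass the text to the function. If the logger escapes the quotes, the pattern would need adjusting.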