Created attachment 1694382 [details]
dumxml_after_taking_snapshots

Description of problem:
In the case of two snapshots with memory, neither snapshot can be deleted after `preview & commit` of the 2nd snapshot.

Version-Release number of selected component (if applicable):
- ovirt-engine-4.4.1.1-0.5.el8ev.noarch
- vdsm-4.40.18-1.el8ev.x86_64
- libvirt-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create and start a VM (RHEL8.2 in my case)
2. Take two snapshots with memory
3. Shut down the VM
4. Preview and commit the 2nd snapshot
5. Start the VM
6. Try deleting one of the snapshots

Actual results:
Snapshot deletion fails (see engine.log and vdsm.log below)

Expected results:
Both snapshots are deleted successfully

Additional info:
The attached dumpxml files show the volume changes.

- engine.log:

2020-06-02 13:27:57,121+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{hostId='f4cc8c8c-227c-472a-b4c3-54c03511295f', vmId='a153906f-23c1-4e90-9a3f-41ed88e95bb9', storagePoolId='932aba2a-17fe-4cfc-9d7a-1c509510550f', storageDomainId='d2b2f071-6358-4092-bd1f-38fde1d0deed', imageGroupId='b26aa91f-edf9-433c-83b8-edd529853511', imageId='ed20fc47-63e6-4442-9cf9-96889754d8f1', baseImageId='0e33b218-7d41-46e4-9a13-e632e8aad5bf', topImageId='0f41d667-6b7c-43bf-862c-fe2dcbff0b47', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52
2020-06-02 13:27:57,121+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] FINISH, MergeVDSCommand, return: , log id: 63b4b094
2020-06-02 13:27:57,121+03 ERROR [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 (Failed with error mergeErr and code 52)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33)

- vdsm.log:

2020-06-02 13:27:57,103+0300 INFO (jsonrpc/5) [api.virt] START merge(drive={'poolID': '932aba2a-17fe-4cfc-9d7a-1c509510550f', 'volumeID': 'ed20fc47-63e6-4442-9cf9-96889754d8f1', 'imageID': 'b26aa91f-edf9-433c-83b8-edd529853511', 'domainID': 'd2b2f071-6358-4092-bd1f-38fde1d0deed'}, baseVolUUID='0e33b218-7d41-46e4-9a13-e632e8aad5bf', topVolUUID='0f41d667-6b7c-43bf-862c-fe2dcbff0b47', bandwidth='0', jobUUID='6e86f949-58e3-43fc-90bd-5d2a532a6352') from=::ffff:10.46.12.114,59914, flow_id=cc76129a-5e95-431b-937a-39d01251cb84, vmId=a153906f-23c1-4e90-9a3f-41ed88e95bb9 (api:48)
2020-06-02 13:27:57,117+0300 ERROR (jsonrpc/5) [virt.vm] (vmId='a153906f-23c1-4e90-9a3f-41ed88e95bb9') merge: Cannot find volume 0f41d667-6b7c-43bf-862c-fe2dcbff0b47 in drive vda's volume chain (vm:5310)
2020-06-02 13:27:57,117+0300 INFO (jsonrpc/5) [api.virt] FINISH merge return={'status': {'code': 52, 'message': 'Merge failed'}} from=::ffff:10.46.12.114,59914, flow_id=cc76129a-5e95-431b-937a-39d01251cb84, vmId=a153906f-23c1-4e90-9a3f-41ed88e95bb9 (api:54)
2020-06-02 13:27:57,117+0300 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.merge failed (error 52) in 0.01 seconds (__init__:312)
2020-06-02 13:27:57,478+0300 INFO (jsonrpc/7) [api.host] START getStats() from=::ffff:10.46.12.114,59914 (api:48)
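For reference, the reproduction flow above can be scripted against the engine with the oVirt Python SDK (ovirtsdk4). A minimal sketch, assuming an existing running VM named 'rhel82-vm'; the engine URL and credentials are placeholders, and the wait loops are simplified (no timeouts, no wait for the VM to reach DOWN after shutdown):

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

conn = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='secret',  # placeholder
    insecure=True,
)
vms_service = conn.system_service().vms_service()
vm = vms_service.list(search='name=rhel82-vm')[0]  # step 1: a running VM
vm_service = vms_service.vm_service(vm.id)
snaps_service = vm_service.snapshots_service()

def wait_for_snapshot(snap):
    # Poll until the snapshot leaves the LOCKED state.
    snap_service = snaps_service.snapshot_service(snap.id)
    while snap_service.get().snapshot_status != types.SnapshotStatus.OK:
        time.sleep(5)

# Step 2: take two snapshots with memory.
snap1 = snaps_service.add(
    types.Snapshot(description='snap1', persist_memorystate=True))
wait_for_snapshot(snap1)
snap2 = snaps_service.add(
    types.Snapshot(description='snap2', persist_memorystate=True))
wait_for_snapshot(snap2)

# Step 3: shut the VM down (waiting for DOWN status is omitted here).
vm_service.shutdown()

# Step 4: preview and commit the 2nd snapshot, including its memory.
vm_service.preview_snapshot(snapshot=types.Snapshot(id=snap2.id),
                            restore_memory=True)
vm_service.commit_snapshot()

# Step 5: start the VM again.
vm_service.start()

# Step 6: try deleting one of the snapshots -- this is where the merge fails.
snaps_service.snapshot_service(snap1.id).remove()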
Created attachment 1694383 [details] dumxml_after_starting_the_vm
Liran, can you please check whether this is related to your latest changes? This used to work until not long ago.
I don't see how this relates to my changes, which concern snapshot creation. The domxml doesn't match the given engine/vdsm log snippet, but in the XML one volume has changed:

after-snapshot xml:

<ovirt-vm:volumeChainNode>
    <ovirt-vm:domainID>d2b2f071-6358-4092-bd1f-38fde1d0deed</ovirt-vm:domainID>
    <ovirt-vm:imageID>75bb1afd-0e0f-4dfe-9829-43fd9fec8719</ovirt-vm:imageID>
    <ovirt-vm:leaseOffset type="int">0</ovirt-vm:leaseOffset>
    <ovirt-vm:leasePath>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/499a7402-8493-4895-8ffb-7b2ba6ddd974.lease</ovirt-vm:leasePath>
    <ovirt-vm:path>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/499a7402-8493-4895-8ffb-7b2ba6ddd974</ovirt-vm:path>
    <ovirt-vm:volumeID>499a7402-8493-4895-8ffb-7b2ba6ddd974</ovirt-vm:volumeID>
</ovirt-vm:volumeChainNode>

after-start xml:

<ovirt-vm:volumeChainNode>
    <ovirt-vm:domainID>d2b2f071-6358-4092-bd1f-38fde1d0deed</ovirt-vm:domainID>
    <ovirt-vm:imageID>75bb1afd-0e0f-4dfe-9829-43fd9fec8719</ovirt-vm:imageID>
    <ovirt-vm:leaseOffset type="int">0</ovirt-vm:leaseOffset>
    <ovirt-vm:leasePath>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/5c887d73-afb7-4a3a-a5fc-71fdb8e9749a.lease</ovirt-vm:leasePath>
    <ovirt-vm:path>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/5c887d73-afb7-4a3a-a5fc-71fdb8e9749a</ovirt-vm:path>
    <ovirt-vm:volumeID>5c887d73-afb7-4a3a-a5fc-71fdb8e9749a</ovirt-vm:volumeID>
</ovirt-vm:volumeChainNode>
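The volumeID drift above can be spotted mechanically rather than by eyeballing the two dumps. A small sketch, assuming the attached dumpxml files are saved locally under the hypothetical names used below; it collects imageID -> volumeID pairs from each file's volumeChainNode elements and prints the entries that differ (the namespace URI is the one vdsm uses for its VM metadata):

import xml.etree.ElementTree as ET

NS = {'ovirt-vm': 'http://ovirt.org/vm/1.0'}

def chain_map(path):
    # Map each imageID to the set of volumeIDs recorded in its chain nodes.
    root = ET.parse(path).getroot()
    chain = {}
    for node in root.iter('{http://ovirt.org/vm/1.0}volumeChainNode'):
        image = node.find('ovirt-vm:imageID', NS).text
        volume = node.find('ovirt-vm:volumeID', NS).text
        chain.setdefault(image, set()).add(volume)
    return chain

before = chain_map('dumxml_after_taking_snapshots.xml')  # hypothetical name
after = chain_map('dumxml_after_starting_the_vm.xml')    # hypothetical name
for image in sorted(before):
    if before[image] != after.get(image):
        print('image %s: %s -> %s' % (
            image, sorted(before[image]), sorted(after.get(image, set()))))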
Beni, please attach the engine and VDSM logs. We can't do anything without them.
Some observations:
1. When following all the above steps but removing the 1st snapshot while the VM is down, everything works well.
2. When powering off the VM and running it again, snapshot removal succeeds.

Before the given error, VDSM fails with a VolumeImageHasChildren exception in lib.vdsm.storage.volume.VolumeManifest#validateDelete:

2020-06-10 13:25:11,759+0300 ERROR (tasks/9) [storage.TaskManager.Task] (Task='072bc201-eb82-47d2-bc03-e62b280a64c4') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/sp.py", line 1901, in deleteVolume
    vol.delete(postZero=postZero, force=force, discard=discard)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/fileVolume.py", line 595, in delete
    self.validateDelete()
  File "/usr/lib/python3.7/site-packages/vdsm/storage/volume.py", line 1219, in validateDelete
    self._manifest.validateDelete()
  File "/usr/lib/python3.7/site-packages/vdsm/storage/volume.py", line 543, in validateDelete
    raise se.VolumeImageHasChildren(self)
vdsm.storage.exception.VolumeImageHasChildren: Cannot delete volume which has children (non-ethical): ['sdUUID: 2a526c8a-3be7-4138-ad99-b68c2294f7cc', 'imgUUID: 4fde3446-dc30-4d05-93fe-74a37c3405a5', 'volUUID: 82bcbbaa-fd8c-4aba-9387-ffe944068aa7']
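For readers unfamiliar with this code path: the traceback ends in a parent/child validation. A simplified sketch of that check, not the actual vdsm implementation, just the shape of the validation it performs:

class VolumeImageHasChildren(Exception):
    """Stand-in for vdsm.storage.exception.VolumeImageHasChildren."""

class Volume:
    def __init__(self, uuid, children=()):
        self.uuid = uuid
        self._children = list(children)

    def getChildren(self):
        # In vdsm this scans the image for volumes whose parent UUID
        # is this volume's UUID.
        return self._children

    def validateDelete(self):
        # Deleting a volume that still backs other volumes would corrupt
        # the chain, so deletion is refused.
        if self.getChildren():
            raise VolumeImageHasChildren(
                'Cannot delete volume which has children: %s' % self.uuid)

# After preview & commit, the volume picked for deletion still has a child
# on storage, so validateDelete() raises (UUIDs here are illustrative):
child = Volume('child-vol')
base = Volume('base-vol', children=[child])
try:
    base.validateDelete()
except VolumeImageHasChildren as exc:
    print(exc)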
The issue also doesn't appear when there are no memory disks. Steps:
1. Create a VM with a disk
2. Take two snapshots
3. Preview and commit the 2nd snapshot
4. Start the VM
5. Try deleting the 1st snapshot

The same is true for the same flow with the VM running, but without memory disks:
1. Create a VM with a disk
2. Run the VM
3. Take two snapshots without memory
4. Preview and commit the 2nd snapshot
5. Start the VM
6. Try deleting the 1st snapshot

Liran, I think this may be related to the memory-disks change. Can you please have a look?
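When comparing these flows, it helps to check what the storage-side chain actually looks like, independently of the engine's view. A minimal sketch using qemu-img's backing-chain output on the host; the image path is a placeholder:

import json
import subprocess

def backing_chain(top_volume_path):
    # 'qemu-img info --backing-chain --output=json' prints one info object
    # per volume in the chain; -U allows reading an image that a running
    # VM holds open.
    out = subprocess.check_output([
        'qemu-img', 'info', '--backing-chain', '--output=json', '-U',
        top_volume_path,
    ])
    return [entry['filename'] for entry in json.loads(out)]

# Placeholder path; substitute the real image and top-volume UUIDs.
for path in backing_chain(
        '/rhev/data-center/mnt/<storage>/<sd-id>/images/<image-id>/<top-volume-id>'):
    print(path)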
Created attachment 1696489 [details] engine.log
Created attachment 1696490 [details] vdsm.log
I wrote a comment on another bug which, in my opinion, seems to be related. Please see: https://bugzilla.redhat.com/show_bug.cgi?id=1840609
According to your analysis, do you think the fix for bug 1840609 will also solve this issue?
It is, taking this bug.
Verified (by automation) with:
- ovirt-engine-4.4.1.7-0.3.el8ev.noarch

Verification steps:
1. Create and start a VM (RHEL8.2)
2. Take two snapshots with memory and shut down the VM
3. Preview and commit the 2nd snapshot
4. Start the VM and make sure the VM is up and running
5. Delete both snapshots
This bugzilla is included in the oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.