Bug 1842894

Summary: [4.4] Can't delete snapshot with memory (after review & commit) in case of multiple snapshots
Product: [oVirt] vdsm Reporter: Beni Pelled <bpelled>
Component: General    Assignee: Liran Rotenberg <lrotenbe>
Status: CLOSED CURRENTRELEASE QA Contact: Beni Pelled <bpelled>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.40.18    CC: ahadas, bugs, eshenitz, lrotenbe
Target Milestone: ovirt-4.4.1    Keywords: Automation
Target Release: 4.40.21    Flags: pm-rhel: ovirt-4.4+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: vdsm-4.40.21 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-08 08:30:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1840609    
Bug Blocks:    
Attachments:
  dumxml_after_taking_snapshots (flags: none)
  dumxml_after_starting_the_vm (flags: none)
  engine.log (flags: none)
  vdsm.log (flags: none)

Description Beni Pelled 2020-06-02 10:56:57 UTC
Created attachment 1694382 [details]
dumxml_after_taking_snapshots

Description of problem:
When a VM has two snapshots with memory, neither snapshot can be deleted after previewing and committing the 2nd snapshot

Version-Release number of selected component (if applicable):
- ovirt-engine-4.4.1.1-0.5.el8ev.noarch
- vdsm-4.40.18-1.el8ev.x86_64
- libvirt-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create and start a VM (RHEL8.2 in my case)
2. Take two snapshots with memory
3. Shut down the VM
4. Preview and commit the 2nd snapshot
5. Start the VM
6. Try deleting one of the snapshots
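
For reference, a minimal sketch of this flow using the oVirt Python SDK (ovirtsdk4). This is not the automation code used here; the engine URL, credentials and VM name are placeholders, and the waits needed between asynchronous operations are omitted:

# Sketch only: reproduce the flow with ovirtsdk4, assuming an existing,
# running VM named 'rhel82-vm' (placeholder name).
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='password',
    insecure=True,
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=rhel82-vm')[0]
vm_service = vms_service.vm_service(vm.id)
snapshots_service = vm_service.snapshots_service()

# 2. Take two snapshots with memory (persist_memorystate=True).
snap1 = snapshots_service.add(
    types.Snapshot(description='snap1', persist_memorystate=True))
snap2 = snapshots_service.add(
    types.Snapshot(description='snap2', persist_memorystate=True))

# 3. Shut down the VM (wait until it is Down before continuing).
vm_service.shutdown()

# 4. Preview and commit the 2nd snapshot.
vm_service.preview_snapshot(snapshot=types.Snapshot(id=snap2.id),
                            restore_memory=True)
vm_service.commit_snapshot()

# 5. Start the VM, then 6. try deleting one of the snapshots.
vm_service.start()
snapshots_service.snapshot_service(snap1.id).remove()

connection.close()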

Actual results:
Snapshot deletion fails (see engine.log and vdsm.log below)

Expected results:
Both snapshots are deleted successfully

Additional info:
The attached dumpxml files show the volume changes


- engine.log

2020-06-02 13:27:57,121+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{hostId='f4cc8c8c-227c-472a-b4c3-54c03511295f', vmId='a153906f-23c1-4e90-9a3f-41ed88e95bb9', storagePoolId='932aba2a-17fe-4cfc-9d7a-1c509510550f', storageDomainId='d2b2f071-6358-4092-bd1f-38fde1d0deed', imageGroupId='b26aa91f-edf9-433c-83b8-edd529853511', imageId='ed20fc47-63e6-4442-9cf9-96889754d8f1', baseImageId='0e33b218-7d41-46e4-9a13-e632e8aad5bf', topImageId='0f41d667-6b7c-43bf-862c-fe2dcbff0b47', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52
2020-06-02 13:27:57,121+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] FINISH, MergeVDSCommand, return: , log id: 63b4b094
2020-06-02 13:27:57,121+03 ERROR [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 (Failed with error mergeErr and code 52)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33)


- vdsm.log

2020-06-02 13:27:57,103+0300 INFO  (jsonrpc/5) [api.virt] START merge(drive={'poolID': '932aba2a-17fe-4cfc-9d7a-1c509510550f', 'volumeID': 'ed20fc47-63e6-4442-9cf9-96889754d8f1', 'imageID': 'b26aa91f-edf9-433c-83b8-edd529853511', 'domainID': 'd2b2f071-6358-4092-bd1f-38fde1d0deed'}, baseVolUUID='0e33b218-7d41-46e4-9a13-e632e8aad5bf', topVolUUID='0f41d667-6b7c-43bf-862c-fe2dcbff0b47', bandwidth='0', jobUUID='6e86f949-58e3-43fc-90bd-5d2a532a6352') from=::ffff:10.46.12.114,59914, flow_id=cc76129a-5e95-431b-937a-39d01251cb84, vmId=a153906f-23c1-4e90-9a3f-41ed88e95bb9 (api:48)
2020-06-02 13:27:57,117+0300 ERROR (jsonrpc/5) [virt.vm] (vmId='a153906f-23c1-4e90-9a3f-41ed88e95bb9') merge: Cannot find volume 0f41d667-6b7c-43bf-862c-fe2dcbff0b47 in drive vda's volume chain (vm:5310)
2020-06-02 13:27:57,117+0300 INFO  (jsonrpc/5) [api.virt] FINISH merge return={'status': {'code': 52, 'message': 'Merge failed'}} from=::ffff:10.46.12.114,59914, flow_id=cc76129a-5e95-431b-937a-39d01251cb84, vmId=a153906f-23c1-4e90-9a3f-41ed88e95bb9 (api:54)
2020-06-02 13:27:57,117+0300 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.merge failed (error 52) in 0.01 seconds (__init__:312)
2020-06-02 13:27:57,478+0300 INFO  (jsonrpc/7) [api.host] START getStats() from=::ffff:10.46.12.114,59914 (api:48)

Comment 1 Beni Pelled 2020-06-02 10:57:53 UTC
Created attachment 1694383 [details]
dumxml_after_starting_the_vm

Comment 2 Tal Nisan 2020-06-08 09:30:34 UTC
Liran, can you please check if this is related to your latest changes? This used to work until not long ago

Comment 3 Liran Rotenberg 2020-06-08 10:36:44 UTC
I don't see how this relates to my changes, which concern snapshot creation.
The domxml doesn't match the engine/vdsm log snippets above, but in the XML one volume has changed:

after snapshot xml:
<ovirt-vm:volumeChainNode>
<ovirt-vm:domainID>d2b2f071-6358-4092-bd1f-38fde1d0deed</ovirt-vm:domainID>
<ovirt-vm:imageID>75bb1afd-0e0f-4dfe-9829-43fd9fec8719</ovirt-vm:imageID>
<ovirt-vm:leaseOffset type="int">0</ovirt-vm:leaseOffset>
<ovirt-vm:leasePath>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/499a7402-8493-4895-8ffb-7b2ba6ddd974.lease</ovirt-vm:leasePath>
<ovirt-vm:path>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/499a7402-8493-4895-8ffb-7b2ba6ddd974</ovirt-vm:path>
<ovirt-vm:volumeID>499a7402-8493-4895-8ffb-7b2ba6ddd974</ovirt-vm:volumeID>
</ovirt-vm:volumeChainNode>

after start xml:
<ovirt-vm:volumeChainNode>
<ovirt-vm:domainID>d2b2f071-6358-4092-bd1f-38fde1d0deed</ovirt-vm:domainID>
<ovirt-vm:imageID>75bb1afd-0e0f-4dfe-9829-43fd9fec8719</ovirt-vm:imageID>
<ovirt-vm:leaseOffset type="int">0</ovirt-vm:leaseOffset>
<ovirt-vm:leasePath>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/5c887d73-afb7-4a3a-a5fc-71fdb8e9749a.lease</ovirt-vm:leasePath>
<ovirt-vm:path>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/5c887d73-afb7-4a3a-a5fc-71fdb8e9749a</ovirt-vm:path>
<ovirt-vm:volumeID>5c887d73-afb7-4a3a-a5fc-71fdb8e9749a</ovirt-vm:volumeID>
</ovirt-vm:volumeChainNode>
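
For comparison, a small sketch showing how the volumeChainNode volume IDs can be pulled out of the two attached dumpxml files; the file names match the attachments and the ovirt-vm namespace URI is assumed to be http://ovirt.org/vm/1.0:

# Sketch: list the <ovirt-vm:volumeID> values under each volumeChainNode
# so the chains in the two attached dumpxml files can be compared.
import xml.etree.ElementTree as ET

OVIRT_VM_NS = 'http://ovirt.org/vm/1.0'  # assumed metadata namespace URI

def volume_chain(dumpxml_path):
    root = ET.parse(dumpxml_path).getroot()
    return [node.findtext('ovirt-vm:volumeID', namespaces={'ovirt-vm': OVIRT_VM_NS})
            for node in root.iter('{%s}volumeChainNode' % OVIRT_VM_NS)]

before = volume_chain('dumxml_after_taking_snapshots')
after = volume_chain('dumxml_after_starting_the_vm')
print('only before start:', set(before) - set(after))
print('only after start: ', set(after) - set(before))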

Comment 4 Eyal Shenitzky 2020-06-10 08:19:38 UTC
Beni, please attach the engine and VDSM logs.
We can't do anything without them.

Comment 5 Eyal Shenitzky 2020-06-10 10:55:55 UTC
Some observations:

1. When following all of the above steps but removing the 1st snapshot while the VM is down, everything works well.
2. When powering off the VM and running it again, snapshot removal succeeds.


Before the error above, VDSM fails with:

lib.vdsm.storage.volume.VolumeManifest#validateDelete -> VolumeImageHasChildren Exception

2020-06-10 13:25:11,759+0300 ERROR (tasks/9) [storage.TaskManager.Task] (Task='072bc201-eb82-47d2-bc03-e62b280a64c4') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/sp.py", line 1901, in deleteVolume
    vol.delete(postZero=postZero, force=force, discard=discard)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/fileVolume.py", line 595, in delete
    self.validateDelete()
  File "/usr/lib/python3.7/site-packages/vdsm/storage/volume.py", line 1219, in validateDelete
    self._manifest.validateDelete()
  File "/usr/lib/python3.7/site-packages/vdsm/storage/volume.py", line 543, in validateDelete
    raise se.VolumeImageHasChildren(self)
vdsm.storage.exception.VolumeImageHasChildren: Cannot delete volume which has children (non-ethical): ['sdUUID: 2a526c8a-3be7-4138-ad99-b68c2294f7cc', 'imgUUID: 4fde3446-dc30-4d05-93fe-74a37c3405a5', 'volUUID: 82bcbbaa-fd8c-4aba-9387-ffe944068aa7']
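
For readers unfamiliar with this path: the traceback shows vdsm refusing to delete a volume that still has child volumes in the image's chain. A simplified illustration of that kind of check (not the actual vdsm code; the function and data structure are made up):

# Illustration only: deletion is refused while the volume still has
# children, because the children's backing chain would break.
def validate_delete(vol_uuid, parent_of):
    """parent_of: mapping of child volUUID -> parent volUUID for one image."""
    children = [child for child, parent in parent_of.items() if parent == vol_uuid]
    if children:
        # vdsm raises se.VolumeImageHasChildren at the equivalent point; the
        # merge flow is expected to have removed these links beforehand.
        raise RuntimeError(
            "Cannot delete volume %s: it still has children %s"
            % (vol_uuid, children))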

Comment 6 Eyal Shenitzky 2020-06-10 11:36:25 UTC
Also, when there are no memory disks the issue does not appear.

Steps:

1. Create a VM with a disk
2. Take two snapshots
3. Preview and commit the 2nd snapshot
4. Start the VM
5. Try deleting the 1st snapshot.

The same is true for the same flow with the VM running but without memory disks:

1. Create a VM with a disk
2. Run the VM
3. Take two snapshots without memory
4. Preview and commit the 2nd snapshot
5. Start the VM
6. Try deleting the 1st snapshot.

Liran, I think it may be related to the memory disks change.

Can you please have a look?

Comment 7 Beni Pelled 2020-06-10 13:38:21 UTC
Created attachment 1696489 [details]
engine.log

Comment 8 Beni Pelled 2020-06-10 13:39:11 UTC
Created attachment 1696490 [details]
vdsm.log

Comment 9 Liran Rotenberg 2020-06-15 15:15:05 UTC
I wrote a comment on another bug which, in my opinion, seems related.
Please see: https://bugzilla.redhat.com/show_bug.cgi?id=1840609

Comment 10 Eyal Shenitzky 2020-06-16 07:37:07 UTC
According to your analysis, do you think the fix for bug 1840609 will solve this issue as well?

Comment 11 Liran Rotenberg 2020-06-23 07:51:15 UTC
It will; taking this bug.

Comment 12 Beni Pelled 2020-07-08 08:24:11 UTC
Verified (by automation) with:
- ovirt-engine-4.4.1.7-0.3.el8ev.noarch

Verification steps:
1. Create and start a VM (RHEL8.2)
2. Take two snapshots with memory and shut down the VM
3. Preview and commit the 2nd snapshot
4. Start the VM and make sure the VM is up and running
5. Delete both snapshots

Comment 13 Sandro Bonazzola 2020-07-08 08:30:51 UTC
This bug is included in the oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.