Bug 1842894 - [4.4] Can't delete snapshot with memory (after preview & commit) in case of multiple snapshots
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.40.18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.1
Target Release: 4.40.21
Assignee: Liran Rotenberg
QA Contact: Beni Pelled
Depends On: 1840609
 
Reported: 2020-06-02 10:56 UTC by Beni Pelled
Modified: 2020-07-08 08:30 UTC
CC: 4 users

Fixed In Version: vdsm-4.40.21
Last Closed: 2020-07-08 08:30:51 UTC
oVirt Team: Virt
pm-rhel: ovirt-4.4+


Attachments
dumxml_after_taking_snapshots (18.37 KB, text/plain), 2020-06-02 10:56 UTC, Beni Pelled
dumxml_after_starting_the_vm (17.88 KB, text/plain), 2020-06-02 10:57 UTC, Beni Pelled
engine.log (2.60 MB, text/plain), 2020-06-10 13:38 UTC, Beni Pelled
vdsm.log (13.81 MB, text/plain), 2020-06-10 13:39 UTC, Beni Pelled


Links:
oVirt gerrit 109831 (master, MERGED): devices: remove backingstore in restore, last updated 2020-08-24 11:08:00 UTC

Description Beni Pelled 2020-06-02 10:56:57 UTC
Created attachment 1694382
dumxml_after_taking_snapshots

Description of problem:
With two snapshots with memory, neither snapshot can be deleted after previewing and committing the 2nd snapshot

Version-Release number of selected component (if applicable):
- ovirt-engine-4.4.1.1-0.5.el8ev.noarch
- vdsm-4.40.18-1.el8ev.x86_64
- libvirt-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create and start a VM (RHEL8.2 in my case)
2. Take two snapshots with memory
3. Shut down the VM
4. Preview and commit the 2nd snapshot
5. Start the VM
6. Try deleting one of the snapshots (a scripted version of this flow is sketched below)

Actual results:
Snapshot deletion fails (see engine.log and vdsm.log below)

Expected results:
Both snapshots are deleted successfully

Additional info:
The attached dumpxml files show the volume changes
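
For reference, here is a rough script of the reproduction flow using ovirt-engine-sdk-python (a sketch only: the engine URL, credentials, and VM name are placeholders, and status polling between steps is elided):

# Sketch of the reproduction flow with ovirtsdk4; the connection
# details and the VM name below are placeholders.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

conn = sdk.Connection(url='https://engine.example.com/ovirt-engine/api',
                      username='admin@internal', password='secret',
                      insecure=True)
vms_svc = conn.system_service().vms_service()
vm = vms_svc.list(search='name=test-vm')[0]
vm_svc = vms_svc.vm_service(vm.id)
snaps_svc = vm_svc.snapshots_service()

# Step 2: two snapshots with memory (persist_memorystate=True).
for desc in ('snap1', 'snap2'):
    snaps_svc.add(types.Snapshot(description=desc, persist_memorystate=True))
    # A real script must poll here until the snapshot status is OK.

# Steps 3-4: shut down the VM, then preview and commit the 2nd snapshot.
vm_svc.stop()
snap2 = next(s for s in snaps_svc.list() if s.description == 'snap2')
vm_svc.preview_snapshot(snapshot=types.Snapshot(id=snap2.id),
                        restore_memory=True)
vm_svc.commit_snapshot()

# Steps 5-6: start the VM; deleting either snapshot now fails (this bug).
vm_svc.start()
conn.close()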


- engine.log

2020-06-02 13:27:57,121+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{hostId='f4cc8c8c-227c-472a-b4c3-54c03511295f', vmId='a153906f-23c1-4e90-9a3f-41ed88e95bb9', storagePoolId='932aba2a-17fe-4cfc-9d7a-1c509510550f', storageDomainId='d2b2f071-6358-4092-bd1f-38fde1d0deed', imageGroupId='b26aa91f-edf9-433c-83b8-edd529853511', imageId='ed20fc47-63e6-4442-9cf9-96889754d8f1', baseImageId='0e33b218-7d41-46e4-9a13-e632e8aad5bf', topImageId='0f41d667-6b7c-43bf-862c-fe2dcbff0b47', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52
2020-06-02 13:27:57,121+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] FINISH, MergeVDSCommand, return: , log id: 63b4b094
2020-06-02 13:27:57,121+03 ERROR [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-5) [cc76129a-5e95-431b-937a-39d01251cb84] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 (Failed with error mergeErr and code 52)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33)


- vdsm.log

2020-06-02 13:27:57,103+0300 INFO  (jsonrpc/5) [api.virt] START merge(drive={'poolID': '932aba2a-17fe-4cfc-9d7a-1c509510550f', 'volumeID': 'ed20fc47-63e6-4442-9cf9-96889754d8f1', 'imageID': 'b26aa91f-edf9-433c-83b8-edd529853511', 'domainID': 'd2b2f071-6358-4092-bd1f-38fde1d0deed'}, baseVolUUID='0e33b218-7d41-46e4-9a13-e632e8aad5bf', topVolUUID='0f41d667-6b7c-43bf-862c-fe2dcbff0b47', bandwidth='0', jobUUID='6e86f949-58e3-43fc-90bd-5d2a532a6352') from=::ffff:10.46.12.114,59914, flow_id=cc76129a-5e95-431b-937a-39d01251cb84, vmId=a153906f-23c1-4e90-9a3f-41ed88e95bb9 (api:48)
2020-06-02 13:27:57,117+0300 ERROR (jsonrpc/5) [virt.vm] (vmId='a153906f-23c1-4e90-9a3f-41ed88e95bb9') merge: Cannot find volume 0f41d667-6b7c-43bf-862c-fe2dcbff0b47 in drive vda's volume chain (vm:5310)
2020-06-02 13:27:57,117+0300 INFO  (jsonrpc/5) [api.virt] FINISH merge return={'status': {'code': 52, 'message': 'Merge failed'}} from=::ffff:10.46.12.114,59914, flow_id=cc76129a-5e95-431b-937a-39d01251cb84, vmId=a153906f-23c1-4e90-9a3f-41ed88e95bb9 (api:54)
2020-06-02 13:27:57,117+0300 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.merge failed (error 52) in 0.01 seconds (__init__:312)
2020-06-02 13:27:57,478+0300 INFO  (jsonrpc/7) [api.host] START getStats() from=::ffff:10.46.12.114,59914 (api:48)
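
For illustration, the same VM.merge call the engine sent can be issued directly on the host with vdsm's Python client (a sketch; I'm assuming the vdsm.client helper shipped with vdsm 4.40, and the UUIDs are the ones from the log above):

# Sketch: replay the failing VM.merge verb via vdsm's client library
# (assumed API; run on the host itself).
from vdsm import client

cli = client.connect('localhost', port=54321)  # default vdsm port
cli.VM.merge(
    vmID='a153906f-23c1-4e90-9a3f-41ed88e95bb9',
    drive={'poolID': '932aba2a-17fe-4cfc-9d7a-1c509510550f',
           'domainID': 'd2b2f071-6358-4092-bd1f-38fde1d0deed',
           'imageID': 'b26aa91f-edf9-433c-83b8-edd529853511',
           'volumeID': 'ed20fc47-63e6-4442-9cf9-96889754d8f1'},
    baseVolUUID='0e33b218-7d41-46e4-9a13-e632e8aad5bf',
    topVolUUID='0f41d667-6b7c-43bf-862c-fe2dcbff0b47',
    bandwidth='0',
    jobUUID='6e86f949-58e3-43fc-90bd-5d2a532a6352')
# On an affected host this fails with code 52 ("Merge failed") because
# topVolUUID is no longer in the drive's volume chain.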

Comment 1 Beni Pelled 2020-06-02 10:57:53 UTC
Created attachment 1694383
dumxml_after_starting_the_vm

Comment 2 Tal Nisan 2020-06-08 09:30:34 UTC
Liran, can you please check whether this is related to your latest changes? This used to work until recently

Comment 3 Liran Rotenberg 2020-06-08 10:36:44 UTC
I don't see how this relates to my changes, which concern snapshot creation.
The domxml doesn't match the engine/vdsm log snippet, but in the XML one volume has changed:

XML after taking the snapshots:
<ovirt-vm:volumeChainNode>
<ovirt-vm:domainID>d2b2f071-6358-4092-bd1f-38fde1d0deed</ovirt-vm:domainID>
<ovirt-vm:imageID>75bb1afd-0e0f-4dfe-9829-43fd9fec8719</ovirt-vm:imageID>
<ovirt-vm:leaseOffset type="int">0</ovirt-vm:leaseOffset>
<ovirt-vm:leasePath>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/499a7402-8493-4895-8ffb-7b2ba6ddd974.lease</ovirt-vm:leasePath>
<ovirt-vm:path>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/499a7402-8493-4895-8ffb-7b2ba6ddd974</ovirt-vm:path>
<ovirt-vm:volumeID>499a7402-8493-4895-8ffb-7b2ba6ddd974</ovirt-vm:volumeID>
</ovirt-vm:volumeChainNode>

XML after starting the VM:
<ovirt-vm:volumeChainNode>
<ovirt-vm:domainID>d2b2f071-6358-4092-bd1f-38fde1d0deed</ovirt-vm:domainID>
<ovirt-vm:imageID>75bb1afd-0e0f-4dfe-9829-43fd9fec8719</ovirt-vm:imageID>
<ovirt-vm:leaseOffset type="int">0</ovirt-vm:leaseOffset>
<ovirt-vm:leasePath>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/5c887d73-afb7-4a3a-a5fc-71fdb8e9749a.lease</ovirt-vm:leasePath>
<ovirt-vm:path>/rhev/data-center/mnt/beetle-nfs-lif2.lab.eng.tlv2.redhat.com:_rhv_compute_compute-he-7_nfs__0/d2b2f071-6358-4092-bd1f-38fde1d0deed/images/75bb1afd-0e0f-4dfe-9829-43fd9fec8719/5c887d73-afb7-4a3a-a5fc-71fdb8e9749a</ovirt-vm:path>
<ovirt-vm:volumeID>5c887d73-afb7-4a3a-a5fc-71fdb8e9749a</ovirt-vm:volumeID>
</ovirt-vm:volumeChainNode>
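
The two volumeChainNode snippets differ only in volumeID (499a7402-... after taking the snapshots vs. 5c887d73-... after starting the VM). A small sketch for extracting the chain from a dumpxml, to make such diffs easier to spot (assuming vdsm's standard ovirt-vm metadata namespace):

# Sketch: list the volumeID of every volumeChainNode in a VM dumpxml,
# assuming vdsm's usual ovirt-vm metadata namespace.
import xml.etree.ElementTree as ET

OVIRT_VM_NS = 'http://ovirt.org/vm/1.0'

def chain_volumes(dumpxml_path):
    root = ET.parse(dumpxml_path).getroot()
    return [node.findtext('{%s}volumeID' % OVIRT_VM_NS)
            for node in root.iter('{%s}volumeChainNode' % OVIRT_VM_NS)]

# Comparing the two attachments shows the changed volume directly:
# chain_volumes('dumxml_after_taking_snapshots')
# chain_volumes('dumxml_after_starting_the_vm')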

Comment 4 Eyal Shenitzky 2020-06-10 08:19:38 UTC
Beni, please attach the engine and VDSM logs.
We can't do anything without them.

Comment 5 Eyal Shenitzky 2020-06-10 10:55:55 UTC
Some observations:

1. When doing all the above steps but removing the 1st snapshot while the VM is down, everything works well.
2. When powering off the VM and running it again, snapshot removal succeeds.


Before the reported error, VDSM fails with:

lib.vdsm.storage.volume.VolumeManifest#validateDelete -> VolumeImageHasChildren Exception

2020-06-10 13:25:11,759+0300 ERROR (tasks/9) [storage.TaskManager.Task] (Task='072bc201-eb82-47d2-bc03-e62b280a64c4') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/sp.py", line 1901, in deleteVolume
    vol.delete(postZero=postZero, force=force, discard=discard)
  File "/usr/lib/python3.7/site-packages/vdsm/storage/fileVolume.py", line 595, in delete
    self.validateDelete()
  File "/usr/lib/python3.7/site-packages/vdsm/storage/volume.py", line 1219, in validateDelete
    self._manifest.validateDelete()
  File "/usr/lib/python3.7/site-packages/vdsm/storage/volume.py", line 543, in validateDelete
    raise se.VolumeImageHasChildren(self)
vdsm.storage.exception.VolumeImageHasChildren: Cannot delete volume which has children (non-ethical): ['sdUUID: 2a526c8a-3be7-4138-ad99-b68c2294f7cc', 'imgUUID: 4fde3446-dc30-4d05-93fe-74a37c3405a5', 'volUUID: 82bcbbaa-fd8c-4aba-9387-ffe944068aa7']
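
The guard that raised here is the child-volume check in vdsm/storage/volume.py. A minimal, runnable paraphrase of that check (not the verbatim vdsm code):

# Minimal model of the validateDelete guard in vdsm/storage/volume.py
# (paraphrase, not verbatim): a volume that still has child volumes in
# the chain must not be deleted.
class VolumeImageHasChildren(Exception):
    pass

class Volume:
    def __init__(self, children):
        self._children = children

    def getChildren(self):
        return self._children

    def validateDelete(self):
        if self.getChildren():
            raise VolumeImageHasChildren(self)

# The 1st snapshot's volume still has the 2nd snapshot's volume as a
# child, so deletion is refused:
Volume(children=['82bcbbaa-fd8c-4aba-9387-ffe944068aa7']).validateDelete()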

Comment 6 Eyal Shenitzky 2020-06-10 11:36:25 UTC
Also, the issue doesn't appear when there are no memory disks.

Steps:

1. Create a VM with a disk
2. Take two snapshots
3. Preview and commit the 2nd snapshot
4. Start the VM
5. Try deleting the 1st snapshot.

The same is true for the same flow while the VM is running but without memory disks:

1. Create a VM with a disk
2. Run the VM
3. Take two snapshots without memory
4. Preview and commit the 2nd snapshot
5. Start the VM
6. Try deleting the 1st snapshot.

Liran, I think it may be related to the memory-disks change.

Can you please have a look?

Comment 7 Beni Pelled 2020-06-10 13:38:21 UTC
Created attachment 1696489
engine.log

Comment 8 Beni Pelled 2020-06-10 13:39:11 UTC
Created attachment 1696490
vdsm.log

Comment 9 Liran Rotenberg 2020-06-15 15:15:05 UTC
I wrote a comment on another bug which, in my opinion, seems related.
Please see: https://bugzilla.redhat.com/show_bug.cgi?id=1840609

Comment 10 Eyal Shenitzky 2020-06-16 07:37:07 UTC
According to your analysis, do you think the fix for bug 1840609 will also solve this issue?

Comment 11 Liran Rotenberg 2020-06-23 07:51:15 UTC
It will; taking this bug.
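
The fix that came out of this (gerrit 109831, "devices: remove backingstore in restore") drops <backingStore> elements from the restored device XML, presumably so that libvirt rebuilds the backing chain itself instead of trusting a stale one. I have not copied the patch here; the following is only a sketch of the idea named in the gerrit summary:

# Sketch of the idea behind "devices: remove backingstore in restore":
# strip <backingStore> from each <disk> so libvirt re-detects the
# backing chain. Illustration only, not the actual vdsm patch.
import xml.etree.ElementTree as ET

def drop_backing_store(domxml):
    root = ET.fromstring(domxml)
    for disk in root.findall('./devices/disk'):
        for bs in disk.findall('backingStore'):
            disk.remove(bs)
    return ET.tostring(root, encoding='unicode')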

Comment 12 Beni Pelled 2020-07-08 08:24:11 UTC
Verified (by automation) with:
- ovirt-engine-4.4.1.7-0.3.el8ev.noarch

Verification steps:
1. Create and start a VM (RHEL8.2)
2. Take two snapshots with memory and shut down the VM
3. Preview and commit the 2nd snapshot
4. Start the VM and make sure the VM is up and running
5. Delete both snapshots (sketched below)
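
The deletion step, sketched with ovirtsdk4 (same placeholder assumptions as the sketch under the description; vm_svc and types as set up there):

# Sketch: delete both regular snapshots and wait for each removal to
# finish; continues the ovirtsdk4 sketch from the description above.
import time

snaps_svc = vm_svc.snapshots_service()
for snap in list(snaps_svc.list()):
    if snap.snapshot_type == types.SnapshotType.REGULAR:
        snaps_svc.snapshot_service(snap.id).remove()
        # Poll until the snapshot is gone from the list.
        while any(s.id == snap.id for s in snaps_svc.list()):
            time.sleep(5)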

Comment 13 Sandro Bonazzola 2020-07-08 08:30:51 UTC
This bug is included in the oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

