Bug 1796124

Summary: Live Merge and Remove Snapshot fails
Product: [oVirt] vdsm Reporter: mlehrer
Component: CoreAssignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE QA Contact: Evelina Shames <eshames>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.30.33CC: aefrat, bugs, eshenitz, lleistne, lsvaty, mkalinin, nsoffer, rhodain, sfishbai, tnisan, vjuranek
Target Milestone: ovirt-4.4.5Flags: abawer: needinfo+
pm-rhel: ovirt-4.4+
lsvaty: testing_plan_complete?
pm-rhel: devel_ack+
pm-rhel: testing_ack+
Target Release: 4.40.50.7   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: vdsm-4.40.50.7 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1796415 (view as bug list) Environment:
Last Closed: 2021-03-18 15:16:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1796415    

Description mlehrer 2020-01-29 16:27:50 UTC
Description of problem:
Attempted the following: 

VM is running and writing to disk, a VM snapshot is performed, disk is extended, then a snapshot is removed, then another snapshot is taken.

During snapshot removal following issues occur:


vdsm.log
2020-01-28 18:22:51,039+0000 INFO  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') CPU stopped: onSuspend (vm:6062)
2020-01-28 18:22:51,042+0000 WARN  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') device vda out of space (vm:3997)
2020-01-28 18:22:51,043+0000 INFO  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') abnormal vm stop device ua-624caba8-2d6d-4596-bf7d-e5ba50b4632c error enospc (vm:5075)
2020-01-28 18:22:51,043+0000 INFO  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') CPU stopped: onIOError (vm:6062)
2020-01-28 18:22:51,048+0000 INFO  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') Requesting extension for volume 3f7b06fe-0d5e-4c5e-8c70-b908f1e18bd2 on domain 7ff38788-4421-4029-b23c-6159f155c9f2 (apparent: 1073741824, capacity: 107374182400, allocated: 1078198272, physical: 1073741824 threshold_state: exceeded) (vm:1288)
2020-01-28 18:22:51,048+0000 INFO  (libvirt/events) [vdsm.api] START sendExtendMsg(spUUID='d8bdc838-93f3-4833-9774-e41ad0881f9e', volDict={'newSize': 2147483648, 'domainID': u'7ff38788-4421-4029-b23c-6159f155c9f2', 'name': 'vda', 'poolID': 'd8bdc838-93f3-4833-9774-e41ad0881f9e', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': False, 'volumeID': u'3f7b06fe-0d5e-4c5e-8c70-b908f1e18bd2', 'imageID': u'624caba8-2d6d-4596-bf7d-e5ba50b4632c'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fc333410110>>) from=internal, task_id=507d3975-b6d5-4674-b26f-f77ed96f2d73 (api:48)


Resulting in:

engine.log
2020-01-28 19:37:28,871-05 WARN  [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [e11a6f96-3e61-4566-98b2-f3c280755c5f] unmanaged disk with path '/var/run/vdsm/payload/f13ca8ef-d81b-4aaa-bec0-a31f69ca1421.0f6e773a0c690fd566c4718f0d76cdb2.img' is ignored
2020-01-28 19:37:28,878-05 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Failed to live merge. Top volume 22cd84b2-0809-4365-a629-1ac5f928ac7c is still in qemu chain [dd848b17-889d-449e-980f-80caab017565, 22cd84b2-0809-4365-a629-1ac5f928ac7c, acebaabc-a96b-407f-8408-bb59d193910c]
2020-01-28 19:37:30,834-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command id: '63a21e84-0187-496f-8964-581644da0b2d failed child command status for step 'MERGE_STATUS'
2020-01-28 19:37:30,834-05 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command 'RemoveSnapshotSingleDiskLive' id: '63a21e84-0187-496f-8964-581644da0b2d' child commands '[e0e206bc-eae3-42b8-a39d-665395f5f539, fe5081a9-8f06-4fe0-8ec6-c707636a2a42, 311507c6-541e-405e-8c95-7507db6df3fa]' executions were completed, status 'FAILED'
2020-01-28 19:37:31,885-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Merging of snapshot '643a447d-4837-404d-9772-da28b1a544a4' images 'acebaabc-a96b-407f-8408-bb59d193910c'..'22cd84b2-0809-4365-a629-1ac5f928ac7c' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2020-01-28 19:37:31,897-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure.
2020-01-28 19:37:31,916-05 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command 'RemoveSnapshot' id: '56b00a6c-9260-4d3d-982f-5aab7be4c8be' child commands '[63a21e84-0187-496f-8964-581644da0b2d]' executions were completed, status 'FAILED'
2020-01-28 19:37:33,003-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-68) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
2020-01-28 19:37:33,052-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-68) [e11a6f96-3e61-4566-98b2-f3c280755c5f] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Failed to delete snapshot 'VM_JMeter_Snapshot_42588' for VM 'VM_SnapshotBug_1'.

Logs are ovirt log collector and separate tar of relevant vdsms[1]

[1] https://drive.google.com/open?id=1Pi-EPyPVwVh6r6ya8ikMCDphgJyhNKB6


Version-Release number of selected component (if applicable):
Version 4.3.8.1-0.1.master.el7


How reproducible:
reproduces

Steps to Reproduce:
1. Create VM with an existing snapshot
2. Take VM snapshot
3. IO activity force extend
4. Wait 5 mins
5. Remove Snapshot


Actual results:

Failed to live merge. Top volume 22cd84b2-0809-4365-a629-1ac5f928ac7c is still in qemu chain [dd848b17-889d-449e-980f-80caab017565, 22cd84b2-0809-4365-a629-1ac5f928ac7c, acebaabc-a96b-407f-8408-bb59d193910c]
2020-01-28 19:37:30,834-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command id: '63a21e84-0187-496f-8964-581644da0b2d failed child command status for step 'MERGE_STATUS'

Expected results:
Successful live merge/vm snapshot removal


Additional info:
Logs are ovirt log collector and seperate tar of relevant vdsms[1]

[1] https://drive.google.com/open?id=1Pi-EPyPVwVh6r6ya8ikMCDphgJyhNKB6

Comment 3 Sandro Bonazzola 2020-05-18 14:46:57 UTC
Moved to 4.4.1 not being marked as blocker for 4.4.0 and we are preparing to GA.

Comment 13 mlehrer 2020-11-29 09:29:00 UTC
Moving to Avihai as this isnt scale issue as discussed offline see comment 10

Comment 24 Evelina Shames 2021-03-17 09:36:41 UTC
As bug 1796415 was verified, this bug can be considered as verified too.

Moving to VERIFIED.

Comment 25 Sandro Bonazzola 2021-03-18 15:16:12 UTC
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.