Description of problem: Attempted the following: VM is running and writing to disk, a VM snapshot is performed, disk is extended, then a snapshot is removed, then another snapshot is taken. During snapshot removal following issues occur: vdsm.log 2020-01-28 18:22:51,039+0000 INFO (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') CPU stopped: onSuspend (vm:6062) 2020-01-28 18:22:51,042+0000 WARN (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') device vda out of space (vm:3997) 2020-01-28 18:22:51,043+0000 INFO (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') abnormal vm stop device ua-624caba8-2d6d-4596-bf7d-e5ba50b4632c error enospc (vm:5075) 2020-01-28 18:22:51,043+0000 INFO (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') CPU stopped: onIOError (vm:6062) 2020-01-28 18:22:51,048+0000 INFO (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') Requesting extension for volume 3f7b06fe-0d5e-4c5e-8c70-b908f1e18bd2 on domain 7ff38788-4421-4029-b23c-6159f155c9f2 (apparent: 1073741824, capacity: 107374182400, allocated: 1078198272, physical: 1073741824 threshold_state: exceeded) (vm:1288) 2020-01-28 18:22:51,048+0000 INFO (libvirt/events) [vdsm.api] START sendExtendMsg(spUUID='d8bdc838-93f3-4833-9774-e41ad0881f9e', volDict={'newSize': 2147483648, 'domainID': u'7ff38788-4421-4029-b23c-6159f155c9f2', 'name': 'vda', 'poolID': 'd8bdc838-93f3-4833-9774-e41ad0881f9e', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': False, 'volumeID': u'3f7b06fe-0d5e-4c5e-8c70-b908f1e18bd2', 'imageID': u'624caba8-2d6d-4596-bf7d-e5ba50b4632c'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fc333410110>>) from=internal, task_id=507d3975-b6d5-4674-b26f-f77ed96f2d73 (api:48) Resulting in: engine.log 2020-01-28 19:37:28,871-05 WARN [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [e11a6f96-3e61-4566-98b2-f3c280755c5f] unmanaged disk with path '/var/run/vdsm/payload/f13ca8ef-d81b-4aaa-bec0-a31f69ca1421.0f6e773a0c690fd566c4718f0d76cdb2.img' is ignored 2020-01-28 19:37:28,878-05 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Failed to live merge. Top volume 22cd84b2-0809-4365-a629-1ac5f928ac7c is still in qemu chain [dd848b17-889d-449e-980f-80caab017565, 22cd84b2-0809-4365-a629-1ac5f928ac7c, acebaabc-a96b-407f-8408-bb59d193910c] 2020-01-28 19:37:30,834-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command id: '63a21e84-0187-496f-8964-581644da0b2d failed child command status for step 'MERGE_STATUS' 2020-01-28 19:37:30,834-05 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command 'RemoveSnapshotSingleDiskLive' id: '63a21e84-0187-496f-8964-581644da0b2d' child commands '[e0e206bc-eae3-42b8-a39d-665395f5f539, fe5081a9-8f06-4fe0-8ec6-c707636a2a42, 311507c6-541e-405e-8c95-7507db6df3fa]' executions were completed, status 'FAILED' 2020-01-28 19:37:31,885-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Merging of snapshot '643a447d-4837-404d-9772-da28b1a544a4' images 'acebaabc-a96b-407f-8408-bb59d193910c'..'22cd84b2-0809-4365-a629-1ac5f928ac7c' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation. 2020-01-28 19:37:31,897-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure. 2020-01-28 19:37:31,916-05 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command 'RemoveSnapshot' id: '56b00a6c-9260-4d3d-982f-5aab7be4c8be' child commands '[63a21e84-0187-496f-8964-581644da0b2d]' executions were completed, status 'FAILED' 2020-01-28 19:37:33,003-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-68) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure. 2020-01-28 19:37:33,052-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-68) [e11a6f96-3e61-4566-98b2-f3c280755c5f] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Failed to delete snapshot 'VM_JMeter_Snapshot_42588' for VM 'VM_SnapshotBug_1'. Logs are ovirt log collector and separate tar of relevant vdsms[1] [1] https://drive.google.com/open?id=1Pi-EPyPVwVh6r6ya8ikMCDphgJyhNKB6 Version-Release number of selected component (if applicable): Version 4.3.8.1-0.1.master.el7 How reproducible: reproduces Steps to Reproduce: 1. Create VM with an existing snapshot 2. Take VM snapshot 3. IO activity force extend 4. Wait 5 mins 5. Remove Snapshot Actual results: Failed to live merge. Top volume 22cd84b2-0809-4365-a629-1ac5f928ac7c is still in qemu chain [dd848b17-889d-449e-980f-80caab017565, 22cd84b2-0809-4365-a629-1ac5f928ac7c, acebaabc-a96b-407f-8408-bb59d193910c] 2020-01-28 19:37:30,834-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command id: '63a21e84-0187-496f-8964-581644da0b2d failed child command status for step 'MERGE_STATUS' Expected results: Successful live merge/vm snapshot removal Additional info: Logs are ovirt log collector and seperate tar of relevant vdsms[1] [1] https://drive.google.com/open?id=1Pi-EPyPVwVh6r6ya8ikMCDphgJyhNKB6
Moved to 4.4.1 not being marked as blocker for 4.4.0 and we are preparing to GA.
Moving to Avihai as this isnt scale issue as discussed offline see comment 10
As bug 1796415 was verified, this bug can be considered as verified too. Moving to VERIFIED.
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.