Bug 1796124 - Live Merge and Remove Snapshot fails
Summary: Live Merge and Remove Snapshot fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.30.33
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ovirt-4.4.5
: 4.40.50.7
Assignee: Nir Soffer
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks: 1796415
TreeView+ depends on / blocked
 
Reported: 2020-01-29 16:27 UTC by mlehrer
Modified: 2021-11-04 19:28 UTC (History)
11 users (show)

Fixed In Version: vdsm-4.40.50.7
Clone Of:
: 1796415 (view as bug list)
Environment:
Last Closed: 2021-03-18 15:16:12 UTC
oVirt Team: Storage
Embargoed:
abawer: needinfo+
pm-rhel: ovirt-4.4+
lsvaty: testing_plan_complete?
pm-rhel: devel_ack+
pm-rhel: testing_ack+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 106660 0 ovirt-4.3 MERGED vm: name live merge cleanup thread based on job ID 2021-02-20 03:14:36 UTC
oVirt gerrit 111556 0 master MERGED merge_test: Add active live merge test 2021-02-20 03:14:35 UTC
oVirt gerrit 111567 0 master MERGED API: Add teardownVolume to public api 2021-02-20 03:14:36 UTC
oVirt gerrit 111617 0 master MERGED vm: Move live merge code to livemerge module 2021-02-20 03:14:36 UTC
oVirt gerrit 111744 0 master MERGED vmfakelib: Add IRS methods for live merge testing 2021-02-20 03:14:36 UTC
oVirt gerrit 111825 0 master MERGED merge_test: Add internal live merge test 2021-02-20 03:14:36 UTC
oVirt gerrit 111826 0 master MERGED merge_test: Add live merge errors tests 2021-02-20 03:14:36 UTC
oVirt gerrit 111901 0 master MERGED vm: Remove confLock from blockJobs sections 2021-02-20 03:14:36 UTC
oVirt gerrit 111902 0 master MERGED vm: Make private vm methods public for merge usage 2021-02-20 03:14:36 UTC
oVirt gerrit 112294 0 master MERGED merge_test: Add block commit cancellation test 2021-02-20 03:14:36 UTC
oVirt gerrit 112464 0 master MERGED vmfakelib: Update IRS methods for livemerge tests 2021-02-20 03:14:37 UTC
oVirt gerrit 112465 0 master MERGED merge_test: Explicitly import virt.vm objects 2021-02-20 03:14:37 UTC
oVirt gerrit 112829 0 master ABANDONED vm: Modernize volume extend methods names 2021-07-18 01:58:35 UTC
oVirt gerrit 112833 0 master ABANDONED vm: Add dedicated exception to verify volume extension method 2021-03-29 23:20:09 UTC
oVirt gerrit 112836 0 master ABANDONED livemerge: CleanupThread enhancements 2021-03-29 23:21:54 UTC
oVirt gerrit 112837 0 master ABANDONED livemerge: Add extend retry attempts for base volume extension 2021-03-29 23:19:16 UTC
oVirt gerrit 113630 0 master MERGED livemerge: Extend before commit 2021-02-23 19:54:27 UTC
oVirt gerrit 113631 0 master MERGED livemerge: Abort job if extend timed out 2021-02-23 19:54:34 UTC
oVirt gerrit 113632 0 master MERGED livemerge: Fix job persistence 2021-02-23 19:54:40 UTC

Description mlehrer 2020-01-29 16:27:50 UTC
Description of problem:
Attempted the following: 

VM is running and writing to disk, a VM snapshot is performed, disk is extended, then a snapshot is removed, then another snapshot is taken.

During snapshot removal following issues occur:


vdsm.log
2020-01-28 18:22:51,039+0000 INFO  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') CPU stopped: onSuspend (vm:6062)
2020-01-28 18:22:51,042+0000 WARN  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') device vda out of space (vm:3997)
2020-01-28 18:22:51,043+0000 INFO  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') abnormal vm stop device ua-624caba8-2d6d-4596-bf7d-e5ba50b4632c error enospc (vm:5075)
2020-01-28 18:22:51,043+0000 INFO  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') CPU stopped: onIOError (vm:6062)
2020-01-28 18:22:51,048+0000 INFO  (libvirt/events) [virt.vm] (vmId='de5803b9-d6aa-45a7-bba4-432fe53b7f7e') Requesting extension for volume 3f7b06fe-0d5e-4c5e-8c70-b908f1e18bd2 on domain 7ff38788-4421-4029-b23c-6159f155c9f2 (apparent: 1073741824, capacity: 107374182400, allocated: 1078198272, physical: 1073741824 threshold_state: exceeded) (vm:1288)
2020-01-28 18:22:51,048+0000 INFO  (libvirt/events) [vdsm.api] START sendExtendMsg(spUUID='d8bdc838-93f3-4833-9774-e41ad0881f9e', volDict={'newSize': 2147483648, 'domainID': u'7ff38788-4421-4029-b23c-6159f155c9f2', 'name': 'vda', 'poolID': 'd8bdc838-93f3-4833-9774-e41ad0881f9e', 'clock': <Clock(total=0.00*, extend-volume=0.00*)>, 'internal': False, 'volumeID': u'3f7b06fe-0d5e-4c5e-8c70-b908f1e18bd2', 'imageID': u'624caba8-2d6d-4596-bf7d-e5ba50b4632c'}, newSize=2147483648, callbackFunc=<bound method Vm.__afterVolumeExtension of <vdsm.virt.vm.Vm object at 0x7fc333410110>>) from=internal, task_id=507d3975-b6d5-4674-b26f-f77ed96f2d73 (api:48)


Resulting in:

engine.log
2020-01-28 19:37:28,871-05 WARN  [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [e11a6f96-3e61-4566-98b2-f3c280755c5f] unmanaged disk with path '/var/run/vdsm/payload/f13ca8ef-d81b-4aaa-bec0-a31f69ca1421.0f6e773a0c690fd566c4718f0d76cdb2.img' is ignored
2020-01-28 19:37:28,878-05 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Failed to live merge. Top volume 22cd84b2-0809-4365-a629-1ac5f928ac7c is still in qemu chain [dd848b17-889d-449e-980f-80caab017565, 22cd84b2-0809-4365-a629-1ac5f928ac7c, acebaabc-a96b-407f-8408-bb59d193910c]
2020-01-28 19:37:30,834-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command id: '63a21e84-0187-496f-8964-581644da0b2d failed child command status for step 'MERGE_STATUS'
2020-01-28 19:37:30,834-05 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command 'RemoveSnapshotSingleDiskLive' id: '63a21e84-0187-496f-8964-581644da0b2d' child commands '[e0e206bc-eae3-42b8-a39d-665395f5f539, fe5081a9-8f06-4fe0-8ec6-c707636a2a42, 311507c6-541e-405e-8c95-7507db6df3fa]' executions were completed, status 'FAILED'
2020-01-28 19:37:31,885-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Merging of snapshot '643a447d-4837-404d-9772-da28b1a544a4' images 'acebaabc-a96b-407f-8408-bb59d193910c'..'22cd84b2-0809-4365-a629-1ac5f928ac7c' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2020-01-28 19:37:31,897-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure.
2020-01-28 19:37:31,916-05 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command 'RemoveSnapshot' id: '56b00a6c-9260-4d3d-982f-5aab7be4c8be' child commands '[63a21e84-0187-496f-8964-581644da0b2d]' executions were completed, status 'FAILED'
2020-01-28 19:37:33,003-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-68) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
2020-01-28 19:37:33,052-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-68) [e11a6f96-3e61-4566-98b2-f3c280755c5f] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_FAILURE(357), Failed to delete snapshot 'VM_JMeter_Snapshot_42588' for VM 'VM_SnapshotBug_1'.

Logs are ovirt log collector and separate tar of relevant vdsms[1]

[1] https://drive.google.com/open?id=1Pi-EPyPVwVh6r6ya8ikMCDphgJyhNKB6


Version-Release number of selected component (if applicable):
Version 4.3.8.1-0.1.master.el7


How reproducible:
reproduces

Steps to Reproduce:
1. Create VM with an existing snapshot
2. Take VM snapshot
3. IO activity force extend
4. Wait 5 mins
5. Remove Snapshot


Actual results:

Failed to live merge. Top volume 22cd84b2-0809-4365-a629-1ac5f928ac7c is still in qemu chain [dd848b17-889d-449e-980f-80caab017565, 22cd84b2-0809-4365-a629-1ac5f928ac7c, acebaabc-a96b-407f-8408-bb59d193910c]
2020-01-28 19:37:30,834-05 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [e11a6f96-3e61-4566-98b2-f3c280755c5f] Command id: '63a21e84-0187-496f-8964-581644da0b2d failed child command status for step 'MERGE_STATUS'

Expected results:
Successful live merge/vm snapshot removal


Additional info:
Logs are ovirt log collector and seperate tar of relevant vdsms[1]

[1] https://drive.google.com/open?id=1Pi-EPyPVwVh6r6ya8ikMCDphgJyhNKB6

Comment 3 Sandro Bonazzola 2020-05-18 14:46:57 UTC
Moved to 4.4.1 not being marked as blocker for 4.4.0 and we are preparing to GA.

Comment 13 mlehrer 2020-11-29 09:29:00 UTC
Moving to Avihai as this isnt scale issue as discussed offline see comment 10

Comment 24 Evelina Shames 2021-03-17 09:36:41 UTC
As bug 1796415 was verified, this bug can be considered as verified too.

Moving to VERIFIED.

Comment 25 Sandro Bonazzola 2021-03-18 15:16:12 UTC
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.