Bug 1757782 - Deleting a Single snapshot disk live merge 'REDUCE_IMAGE' step fails with NPE and snapshot disk remains in illegal state
Summary: Deleting a Single snapshot disk live merge 'REDUCE_IMAGE' step fails with NPE...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.3.6.6
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ovirt-4.3.6
: 4.3.6.7
Assignee: Pavel Bar
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-02 12:48 UTC by Avihai
Modified: 2020-04-14 13:09 UTC (History)
4 users (show)

Fixed In Version: ovirt-engine-4.3.6.7
Clone Of:
Environment:
Last Closed: 2019-10-25 13:44:09 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.3+
pm-rhel: blocker?
aefrat: testing_ack+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 103821 0 'None' MERGED core: Fixing NPE when removing snapshot disk via REST API 2020-10-19 04:56:37 UTC
oVirt gerrit 103847 0 'None' MERGED core: Fixing NPE when removing snapshot disk via REST API 2020-10-19 04:56:50 UTC

Description Avihai 2019-10-02 12:48:05 UTC
Description of problem:
Deleting a Single snapshot disk live merge 'REDUCE_IMAGE' step fails with NPE:

2019-09-28 05:26:45,766+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Executing Live Me
rge command step 'REDUCE_IMAGE'
2019-09-28 05:26:45,767+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Command 'RemoveSnapshotSingleDiskLive' id: '1c62ebca-d48b-4425-90ce-9b7c62eb592d' with children [00c8f873-b7fc-463a-a5ca-58284d25d1ae, 8b90a0cc-9986-4876-8548-7feec1519dd4, cc4e8979-da15-43a3-878e-43447815c57f, a8b7e922-b8c3-4201-8687-5608eafedaf8] failed when attempting to perform the next operation, marking as 'ACTIVE'
2019-09-28 05:26:45,767+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Command 'RemoveSnapshotSingleDiskLive' id: '1c62ebca-d48b-4425-90ce-9b7c62eb592d' with children [00c8f873-b7fc-463a-a5ca-58284d25d1ae, 8b90a0cc-9986-4876-8548-7feec1519dd4, cc4e8979-da15-43a3-878e-43447815c57f, a8b7e922-b8c3-4201-8687-5608eafedaf8] failed when attempting to perform the next operation, marking as 'ACTIVE'
2019-09-28 05:26:45,767+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] null: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.shouldSkipReduceImage(RemoveSnapshotSingleDiskLiveCommand.java:191) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.performNextOperation(RemoveSnapshotSingleDiskLiveCommand.java:137) [bll.jar:]
        at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:77) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) [bll.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_222]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_222]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_222]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_222]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]

2019-09-28 05:26:45,770+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Command id: '1c62ebca-d48b-4425-90ce-9b7c62eb592d failed child command status for step 'REDUCE_IMAGE'
2019-09-28 05:26:45,770+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Command 'RemoveSnapshotSingleDiskLive' id: '1c62ebca-d48b-4425-90ce-9b7c62eb592d' child commands '[00c8f873-b7fc-463a-a5ca-58284d25d1ae, 8b90a0cc-9986-4876-8548-7feec1519dd4, cc4e8979-da15-43a3-878e-43447815c57f, a8b7e922-b8c3-4201-8687-5608eafedaf8]' executions were completed, status 'FAILED'
2019-09-28 05:26:46,809+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [disks_delete_2a091e87-5309-442e] Merging of snapshot '3dfcbc70-d4e9-4d91-9a3e-0303c511a32c' images 'd1a18cda-2183-4117-8c72-027e517b9297'..'4f0449d7-b9dd-4554-8911-0d9fc235c5b6' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2019-09-28 05:26:46,813+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [disks_delete_2a091e87-5309-442e] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure.
2019-09-28 05:26:46,813+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [disks_delete_2a091e87-5309-442e] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[49e99667-1652-4089-9216-de1ef0fd7818=TEMPLATE]'}'
2019-09-28 05:26:46,827+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [disks_delete_2a091e87-5309-442e] Command 'RemoveDiskSnapshots' id: '27f999c0-bb5c-4d18-96c6-4a6b99f84fd5' child commands '[1c62ebca-d48b-4425-90ce-9b7c62eb592d]' executions were completed, status 'FAILED'
2019-09-28 05:26:47,847+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveDiskSnapshotsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-54) [disks_delete_2a091e87-5309-442e] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveDiskSnapshotsCommand' with failure.
2019-09-28 05:26:47,944+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-54) [disks_delete_2a091e87-5309-442e] EVENT_ID: USER_REMOVE_DISK_SNAPSHOT_FINISHED_FAILURE(376), Failed to complete deletion of Disk 'vm_TestCase16287_2805163958_Disk_0' from snapshot(s) 'snap_16287_2805215705' of VM 'vm_TestCase16287_2805163958' (User: admin@internal-authz).

Version-Release number of selected component (if applicable):


How reproducible:
100% automation and manually

Steps to Reproduce:
1. Create VM
2. Add disk to the VM
3. Create snapshot
4. Start VM
5. Try to deleted a single disk on the snapshot via rest:

DELETE request content:
https://hosted-engine-07.lab.eng.tlv2.redhat.com/ovirt-engine/api/storagedomains/50a1418d-e9cc-4eaf-b845-a5916737fc82/disksnapshots/0c935bf8-bc6a-43ef-8fa2-06584b50c1e0

Full automation scenario with timestamps:
Scenario:
05:16:39 2019-09-28 05:16:39,588 INFO       Test Setup   1: Creating VM vm_TestCase16287_2805163958
05:16:39 2019-09-28 05:16:39,802 INFO       Test Setup   2: [class] Description: clone vm from a pre-defined template golden_template_default_rhel8.0 with {'vol_sparse': True, 'name': 'vm_TestCase16287_2805163958', 'clone': False, 'cluster': 'golden_env_mixed_1', 'storagedomain': 'fcp_2', 'display_type': 'spice', 'timeout': 900, 'virtio_scsi': True, 'vol_format': 'cow', 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
05:16:52 2019-09-28 05:16:52,600 INFO       Test Setup   3: Starting VM vm_TestCase16287_2805163958
05:16:52 2019-09-28 05:16:52,604 INFO       Test Setup   4: [class] Start VM vm_TestCase16287_2805163958 with {'wait_for_ip': True, 'pause': False, 'use_cloud_init': False, 'timeout': 600, 'wait_for_status': 'up'}
05:20:29 2019-09-28 05:20:28,948 INFO       Test Setup   5: Creating disks with filesystem and attach to VM vm_TestCase16287_2805163958
05:21:57 2019-09-28 05:21:57,055 INFO 129: storage/rhevmtests.storage.storage_remove_snapshots.test_live_merge.TestCase16287.test_basic_snapshot_deletion_of_snapshots_disk[fcp]
05:21:57 2019-09-28 05:21:57,056 INFO STORAGE: FCP
05:21:57 2019-09-28 05:21:57,254 INFO       Test Step   6: Stop vm vm_TestCase16287_2805163958 with {'async': 'true'}
05:22:11 2019-09-28 05:22:11,739 INFO       Test Step   7: Adding new snapshot to vm vm_TestCase16287_2805163958
05:22:11 2019-09-28 05:22:11,742 INFO       Test Step   8: Add snapshot to VM vm_TestCase16287_2805163958 with {'description': 'snap_16287_2805215705', 'wait': True}
05:22:43 2019-09-28 05:22:43,765 INFO       Test Step   9: Start VMs with {'vm_list': ['vm_TestCase16287_2805163958'], 'wait_for_ip': True, 'wait_for_status': 'up', 'max_workers': 2}
05:22:43 2019-09-28 05:25:31,765 INFO       Test Step   10: Disk 'vm_TestCase16287_2805163958_Disk_0' from Snapshot(s) 'snap_16287_2805215705' of VM 'vm_TestCase16287_2805163958' deletion was initiated 


Actual results:
deleting a single snapshot disk fails.

Expected results:
deleting a single snapshot disk should work.


Additional info:
Occurred during automation TestCase16287.test_basic_snapshot_deletion_of_snapshots_disk.
Also reproduced manually.

Comment 1 Avihai 2019-10-02 12:53:32 UTC
Regression is marked as this issue did not occur at ovirt-engine-4.3.6.5-0.1.el7.noarch
but only started at ovirt-engine-4.3.6.6

Comment 2 Avihai 2019-10-02 13:12:57 UTC
Also the snapshot disk that was deleted remains in Illegal status.

So If you try to delete the snapshot afterward it will fail with:
VDSM host_mixed_2 command MergeVDS failed: Drive image file could not be found
and
VDSM host_mixed_1 command HSMGetAllTasksStatusesVDS failed: Volume does not exist

Only removing the VM down and trying to delete the disk will also fail as the disk is in Illegal status.

The only workaround to delete this snapshot disk is to move the VM to 'down' and remove the VM+all disks.

Comment 3 RHEL Program Management 2019-10-03 09:06:53 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Avihai 2019-10-08 04:31:55 UTC
Verified on ovirt-engine-4.3.6.7-0.1.el7.noarch.

Check both manual scenario and automation TestCase16287 - both passed.
Deleting a Single snapshot disk on a live VM works again.

Comment 5 Sandro Bonazzola 2019-10-25 13:44:09 UTC
This bugzilla is included in oVirt 4.3.6 async release, published on October 8th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 7 Pavan Chavva 2020-04-14 13:09:35 UTC
Got it thanks!


Note You need to log in before you can comment on or make changes to this bug.