Bug 1757782

Summary: Deleting a Single snapshot disk live merge 'REDUCE_IMAGE' step fails with NPE and snapshot disk remains in illegal state
Product: [oVirt] ovirt-engine Reporter: Avihai <aefrat>
Component: BLL.StorageAssignee: Pavel Bar <pbar>
Status: CLOSED CURRENTRELEASE QA Contact: Avihai <aefrat>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.3.6.6CC: bugs, pbar, pchavva, tnisan
Target Milestone: ovirt-4.3.6Keywords: Automation, Regression
Target Release: 4.3.6.7Flags: pm-rhel: ovirt-4.3+
pm-rhel: blocker?
aefrat: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.3.6.7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-25 13:44:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Avihai 2019-10-02 12:48:05 UTC
Description of problem:
Deleting a Single snapshot disk live merge 'REDUCE_IMAGE' step fails with NPE:

2019-09-28 05:26:45,766+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Executing Live Me
rge command step 'REDUCE_IMAGE'
2019-09-28 05:26:45,767+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Command 'RemoveSnapshotSingleDiskLive' id: '1c62ebca-d48b-4425-90ce-9b7c62eb592d' with children [00c8f873-b7fc-463a-a5ca-58284d25d1ae, 8b90a0cc-9986-4876-8548-7feec1519dd4, cc4e8979-da15-43a3-878e-43447815c57f, a8b7e922-b8c3-4201-8687-5608eafedaf8] failed when attempting to perform the next operation, marking as 'ACTIVE'
2019-09-28 05:26:45,767+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Command 'RemoveSnapshotSingleDiskLive' id: '1c62ebca-d48b-4425-90ce-9b7c62eb592d' with children [00c8f873-b7fc-463a-a5ca-58284d25d1ae, 8b90a0cc-9986-4876-8548-7feec1519dd4, cc4e8979-da15-43a3-878e-43447815c57f, a8b7e922-b8c3-4201-8687-5608eafedaf8] failed when attempting to perform the next operation, marking as 'ACTIVE'
2019-09-28 05:26:45,767+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] null: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.shouldSkipReduceImage(RemoveSnapshotSingleDiskLiveCommand.java:191) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.performNextOperation(RemoveSnapshotSingleDiskLiveCommand.java:137) [bll.jar:]
        at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:77) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) [bll.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_222]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_222]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_222]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_222]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]

2019-09-28 05:26:45,770+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Command id: '1c62ebca-d48b-4425-90ce-9b7c62eb592d failed child command status for step 'REDUCE_IMAGE'
2019-09-28 05:26:45,770+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [disks_delete_2a091e87-5309-442e] Command 'RemoveSnapshotSingleDiskLive' id: '1c62ebca-d48b-4425-90ce-9b7c62eb592d' child commands '[00c8f873-b7fc-463a-a5ca-58284d25d1ae, 8b90a0cc-9986-4876-8548-7feec1519dd4, cc4e8979-da15-43a3-878e-43447815c57f, a8b7e922-b8c3-4201-8687-5608eafedaf8]' executions were completed, status 'FAILED'
2019-09-28 05:26:46,809+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [disks_delete_2a091e87-5309-442e] Merging of snapshot '3dfcbc70-d4e9-4d91-9a3e-0303c511a32c' images 'd1a18cda-2183-4117-8c72-027e517b9297'..'4f0449d7-b9dd-4554-8911-0d9fc235c5b6' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2019-09-28 05:26:46,813+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [disks_delete_2a091e87-5309-442e] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure.
2019-09-28 05:26:46,813+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [disks_delete_2a091e87-5309-442e] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[49e99667-1652-4089-9216-de1ef0fd7818=TEMPLATE]'}'
2019-09-28 05:26:46,827+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [disks_delete_2a091e87-5309-442e] Command 'RemoveDiskSnapshots' id: '27f999c0-bb5c-4d18-96c6-4a6b99f84fd5' child commands '[1c62ebca-d48b-4425-90ce-9b7c62eb592d]' executions were completed, status 'FAILED'
2019-09-28 05:26:47,847+03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveDiskSnapshotsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-54) [disks_delete_2a091e87-5309-442e] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveDiskSnapshotsCommand' with failure.
2019-09-28 05:26:47,944+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-54) [disks_delete_2a091e87-5309-442e] EVENT_ID: USER_REMOVE_DISK_SNAPSHOT_FINISHED_FAILURE(376), Failed to complete deletion of Disk 'vm_TestCase16287_2805163958_Disk_0' from snapshot(s) 'snap_16287_2805215705' of VM 'vm_TestCase16287_2805163958' (User: admin@internal-authz).

Version-Release number of selected component (if applicable):


How reproducible:
100% automation and manually

Steps to Reproduce:
1. Create VM
2. Add disk to the VM
3. Create snapshot
4. Start VM
5. Try to deleted a single disk on the snapshot via rest:

DELETE request content:
https://hosted-engine-07.lab.eng.tlv2.redhat.com/ovirt-engine/api/storagedomains/50a1418d-e9cc-4eaf-b845-a5916737fc82/disksnapshots/0c935bf8-bc6a-43ef-8fa2-06584b50c1e0

Full automation scenario with timestamps:
Scenario:
05:16:39 2019-09-28 05:16:39,588 INFO       Test Setup   1: Creating VM vm_TestCase16287_2805163958
05:16:39 2019-09-28 05:16:39,802 INFO       Test Setup   2: [class] Description: clone vm from a pre-defined template golden_template_default_rhel8.0 with {'vol_sparse': True, 'name': 'vm_TestCase16287_2805163958', 'clone': False, 'cluster': 'golden_env_mixed_1', 'storagedomain': 'fcp_2', 'display_type': 'spice', 'timeout': 900, 'virtio_scsi': True, 'vol_format': 'cow', 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
05:16:52 2019-09-28 05:16:52,600 INFO       Test Setup   3: Starting VM vm_TestCase16287_2805163958
05:16:52 2019-09-28 05:16:52,604 INFO       Test Setup   4: [class] Start VM vm_TestCase16287_2805163958 with {'wait_for_ip': True, 'pause': False, 'use_cloud_init': False, 'timeout': 600, 'wait_for_status': 'up'}
05:20:29 2019-09-28 05:20:28,948 INFO       Test Setup   5: Creating disks with filesystem and attach to VM vm_TestCase16287_2805163958
05:21:57 2019-09-28 05:21:57,055 INFO 129: storage/rhevmtests.storage.storage_remove_snapshots.test_live_merge.TestCase16287.test_basic_snapshot_deletion_of_snapshots_disk[fcp]
05:21:57 2019-09-28 05:21:57,056 INFO STORAGE: FCP
05:21:57 2019-09-28 05:21:57,254 INFO       Test Step   6: Stop vm vm_TestCase16287_2805163958 with {'async': 'true'}
05:22:11 2019-09-28 05:22:11,739 INFO       Test Step   7: Adding new snapshot to vm vm_TestCase16287_2805163958
05:22:11 2019-09-28 05:22:11,742 INFO       Test Step   8: Add snapshot to VM vm_TestCase16287_2805163958 with {'description': 'snap_16287_2805215705', 'wait': True}
05:22:43 2019-09-28 05:22:43,765 INFO       Test Step   9: Start VMs with {'vm_list': ['vm_TestCase16287_2805163958'], 'wait_for_ip': True, 'wait_for_status': 'up', 'max_workers': 2}
05:22:43 2019-09-28 05:25:31,765 INFO       Test Step   10: Disk 'vm_TestCase16287_2805163958_Disk_0' from Snapshot(s) 'snap_16287_2805215705' of VM 'vm_TestCase16287_2805163958' deletion was initiated 


Actual results:
deleting a single snapshot disk fails.

Expected results:
deleting a single snapshot disk should work.


Additional info:
Occurred during automation TestCase16287.test_basic_snapshot_deletion_of_snapshots_disk.
Also reproduced manually.

Comment 1 Avihai 2019-10-02 12:53:32 UTC
Regression is marked as this issue did not occur at ovirt-engine-4.3.6.5-0.1.el7.noarch
but only started at ovirt-engine-4.3.6.6

Comment 2 Avihai 2019-10-02 13:12:57 UTC
Also the snapshot disk that was deleted remains in Illegal status.

So If you try to delete the snapshot afterward it will fail with:
VDSM host_mixed_2 command MergeVDS failed: Drive image file could not be found
and
VDSM host_mixed_1 command HSMGetAllTasksStatusesVDS failed: Volume does not exist

Only removing the VM down and trying to delete the disk will also fail as the disk is in Illegal status.

The only workaround to delete this snapshot disk is to move the VM to 'down' and remove the VM+all disks.

Comment 3 RHEL Program Management 2019-10-03 09:06:53 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Avihai 2019-10-08 04:31:55 UTC
Verified on ovirt-engine-4.3.6.7-0.1.el7.noarch.

Check both manual scenario and automation TestCase16287 - both passed.
Deleting a Single snapshot disk on a live VM works again.

Comment 5 Sandro Bonazzola 2019-10-25 13:44:09 UTC
This bugzilla is included in oVirt 4.3.6 async release, published on October 8th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 7 Pavan Chavva 2020-04-14 13:09:35 UTC
Got it thanks!