Bug 1727025 - NPE in DestroyImage endAction during live merge leaving a task in DB for hours causing operations depending on host clean tasks to fail as Deactivate host/StopSPM/deactivate SD
Summary: NPE in DestroyImage endAction during live merge leaving a task in DB for hour...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.3.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.4.0
: ---
Assignee: Fedor Gavrilov
QA Contact: Ilan Zuckerman
URL:
Whiteboard:
: 1733931 (view as bug list)
Depends On:
Blocks: 1748387
TreeView+ depends on / blocked
 
Reported: 2019-07-04 09:43 UTC by Fedor Gavrilov
Modified: 2023-03-24 15:02 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1748387 (view as bug list)
Environment:
Last Closed: 2020-08-04 13:19:49 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments (Terms of Use)
engine and vdsm logs 4.3.5-6(ovirt engine 4.3.5.4) (5.90 MB, application/gzip)
2019-07-14 10:01 UTC, Avihai
no flags Details
snapshot_merge_failure logs (1.23 MB, application/gzip)
2019-12-22 13:34 UTC, Yosi Ben Shimon
no flags Details
verification engine + vdsm logs (1.62 MB, application/zip)
2020-03-09 08:41 UTC, Ilan Zuckerman
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1527861 0 unspecified CLOSED Live merge with continuous I/O to the VM fails with 'VolumeImageHasChildren: Cannot delete volume which has children' 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2020:3247 0 None None None 2020-08-04 13:20:17 UTC
oVirt gerrit 102229 0 'None' MERGED core: make RemoveSnapshotSingleDisk wait for DestroyImage execution 2021-02-01 13:02:21 UTC

Description Fedor Gavrilov 2019-07-04 09:43:01 UTC
Description of problem:
Live merge during continuous I/O appears to succeed; however, an NPE can be seen in the logs:
ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-1480) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:358) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103) [bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_191]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_191]
	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_191]
	at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
	at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
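
The NPE at CoCoAsyncTaskHelper.endAction is consistent with the async-task completion thread looking up a parent command that has already been cleaned up. The sketch below is a hypothetical minimal illustration of that race (class and method names are invented, not oVirt's actual code), together with the defensive null check that avoids crashing the thread:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the suspected race: the task-completion thread
// runs endAction for a parent command that another flow already removed from
// the cache, so the lookup returns null and dereferencing it throws the NPE.
class CommandCacheSketch {
    private final Map<String, Runnable> commands = new ConcurrentHashMap<>();

    void register(String commandId, Runnable endAction) {
        commands.put(commandId, endAction);
    }

    void unregister(String commandId) {
        commands.remove(commandId);
    }

    // Mirrors the failing behaviour: throws NullPointerException
    // when the command was already removed.
    void endActionUnsafe(String commandId) {
        commands.get(commandId).run();
    }

    // Defensive variant: skip ending the action when the parent
    // command no longer exists, instead of crashing the thread.
    boolean endActionSafe(String commandId) {
        Runnable endAction = commands.get(commandId);
        if (endAction == null) {
            return false; // parent already cleaned up; nothing to end
        }
        endAction.run();
        return true;
    }
}
```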


Version-Release number of selected component (if applicable):
ovirt-engine: 4.4.0-SNAPSHOT

How reproducible:
NFS storage

Steps to Reproduce:
1. Create VM with disk + OS
2. Add 4 disks to VM
3. Create live snapshot
4. Perform dd from the OS disk to one of the other disks
5. Live merge the snapshot

Actual results:
Live merge succeeds, but there are exceptions in the engine log

Expected results:
No exceptions

Additional info:

Comment 1 Avihai 2019-07-14 09:53:18 UTC
I see the same issue at 4.3.5-6 (ovirt-engine 4.3.5.4/vdsm-4.30.24-2.el7ev.x86_64) in live disk move operations (no I/O is done) on an FCP SD.

Tasks to remove a host/deactivate a host/StopSPM get stuck for hours, and operations that depend on cleared tasks (such as StopSPM, moving a host to maintenance, etc.) fail, breaking many automation TCs -> raising priority.

The tests fail because the SPM is not stopped while a task is still recorded as running (with status 'finished').

Not stopping SPM on vds host_mixed_2, pool id 43238d81-6835-4217-8b81-951b564655d2 as there are uncleared tasks Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished'

The task for the live disk move (TestCase11619) finished but was not removed from the DB for hours, from 2019-07-11 22:55 until 2019-07-12 06:29:44.
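
The SpmStop refusal described above can be sketched as a simple guard (a hypothetical illustration with invented names; the real check lives in SpmStopVDSCommand): any task still recorded in the DB blocks stopping the SPM, even when its status is already 'finished'.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the guard behind the 'Not stopping SPM ... uncleared
// tasks' errors: any task still present in the DB blocks stopping the SPM,
// regardless of its status.
class SpmStopGuardSketch {
    static final class Task {
        final String id;
        final String status;

        Task(String id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // The SPM may only be stopped once no tasks remain recorded for the pool.
    static boolean canStopSpm(List<Task> unclearedTasks) {
        return unclearedTasks.isEmpty();
    }
}
```

This is why a finished-but-uncleared task row left behind by the NPE keeps Deactivate host/StopSPM failing until the row is finally cleared.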

The issue occurred here:
2019-07-11 22:55:28,516+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [disks_syncAction_f770e3d9-8595-41d3] EVENT_ID: USER_MOVED_DISK_FINISHED_SUCCESS(2,010), User admin@internal-authz finished moving disk vm_thin_disk_image_11619_Disk1 to domain fcp_2.
2019-07-11 22:55:31,893+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] Polling and updating Async Tasks: 1 tasks, 1 tasks to poll now
2019-07-11 22:55:31,899+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] SPMAsyncTask::PollTask: Polling task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2019-07-11 22:55:31,899+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] BaseAsyncTask::onTaskEndSuccess: Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-07-11 22:55:31,900+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] CommandAsyncTask::endActionIfNecessary: All tasks of command '4d1745ef-44af-4e7d-abb9-67edd1748a58' has ended -> executing 'endAction'
2019-07-11 22:55:31,900+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '4d1745ef-44af-4e7d-abb9-67edd1748a58'): calling endAction '.
2019-07-11 22:55:31,901+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage',
2019-07-11 22:55:31,901+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:358) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_222-ea]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_222-ea]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]

2019-07-11 22:55:31,902+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared.
2019-07-11 22:55:31,908+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] BaseAsyncTask::removeTaskFromDB: Removed task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' from DataBase


22:55:35 2019-07-11 22:55:34,943 INFO       Test Step   9: Moving disk a1b9bda5-243e-4358-a98a-a398df1d507a to storage domain fcp_2


[root@hosted-engine-06 ovirt-engine]# zgrep '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' engine.log*
engine.log-20190712.gz:2019-07-11 22:55:22,740+03 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [disks_syncAction_f770e3d9-8595-41d3] CommandMultiAsyncTasks::attachTask: Attaching task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' to command '4d1745ef-44af-4e7d-abb9-67edd1748a58'.
engine.log-20190712.gz:2019-07-11 22:55:22,757+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [disks_syncAction_f770e3d9-8595-41d3] Adding task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
engine.log-20190712.gz:2019-07-11 22:55:22,774+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [disks_syncAction_f770e3d9-8595-41d3] BaseAsyncTask::startPollingTask: Starting to poll task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'.
engine.log-20190712.gz:2019-07-11 22:55:22,774+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [disks_syncAction_f770e3d9-8595-41d3] BaseAsyncTask::startPollingTask: Starting to poll task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'.
engine.log-20190712.gz:2019-07-11 22:55:31,899+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] SPMAsyncTask::PollTask: Polling task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
engine.log-20190712.gz:2019-07-11 22:55:31,899+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] BaseAsyncTask::onTaskEndSuccess: Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
engine.log-20190712.gz:2019-07-11 22:55:31,908+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] BaseAsyncTask::removeTaskFromDB: Removed task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' from DataBase


engine.log-20190712.gz:2019-07-12 01:52:40,445+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-21) [hosts_syncAction_81f17357-1567-4103] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '43238d81-6835-4217-8b81-951b564655d2' as there are uncleared tasks 'Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished''
engine.log-20190712.gz:2019-07-12 01:52:40,453+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-21) [hosts_syncAction_81f17357-1567-4103] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_2, pool id 43238d81-6835-4217-8b81-951b564655d2 as there are uncleared tasks Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished'
engine.log-20190712.gz:2019-07-12 01:52:45,859+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-21) [hosts_syncAction_2019249e-285b-459e] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '43238d81-6835-4217-8b81-951b564655d2' as there are uncleared tasks 'Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished''
engine.log-20190712.gz:2019-07-12 01:52:45,867+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-21) [hosts_syncAction_2019249e-285b-459e] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_2, pool id 43238d81-6835-4217-8b81-951b564655d2 as there are uncleared tasks Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished'
engine.log-20190712.gz:2019-07-12 01:53:08,156+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-21) [hosts_syncAction_abb918c3-1807-405a] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '43238d81-6835-4217-8b81-951b564655d2' as there are uncleared tasks 'Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished''
engine.log-20190712.gz:2019-07-12 01:53:08,163+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-21) [hosts_syncAction_abb918c3-1807-405a] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_2, pool id 43238d81-6835-4217-8b81-951b564655d2 as there are uncleared tasks Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished'
engine.log-20190713.gz:2019-07-12 06:28:37,917+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engine-Thread-11) [44c69927] Adding task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
engine.log-20190713.gz:2019-07-12 06:28:38,215+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-11) [44c69927] BaseAsyncTask::startPollingTask: Starting to poll task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'.
engine.log-20190713.gz:2019-07-12 06:28:44,463+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:28:54,466+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-49) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:04,468+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:14,471+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:24,472+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-38) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:34,474+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-97) [44c69927] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:44,476+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:44,476+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:44,863+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:44,863+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] SPMAsyncTask::PollTask: Polling task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
engine.log-20190713.gz:2019-07-12 06:29:44,863+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] BaseAsyncTask::onTaskEndSuccess: Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
engine.log-20190713.gz:2019-07-12 06:29:44,863+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'
engine.log-20190713.gz:2019-07-12 06:29:44,864+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{storagePoolId='43238d81-6835-4217-8b81-951b564655d2', ignoreFailoverLimit='false', taskId='5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'}), log id: 5e495ff3
engine.log-20190713.gz:2019-07-12 06:29:44,865+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] START, HSMClearTaskVDSCommand(HostName = host_mixed_2, HSMTaskGuidBaseVDSCommandParameters:{hostId='f31df623-c3c9-420f-a78d-a3a9a4114b42', taskId='5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'}), log id: c6f0343
engine.log-20190713.gz:2019-07-12 06:29:44,911+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] BaseAsyncTask::removeTaskFromDB: Removed task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' from DataBase
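
The "pre-polling period" lines in the trace above show the recovery path: after the engine reloads the orphaned task (Parent Command 'Unknown'), polling is deferred for 60000 millis, and only then is the task polled and cleared. A minimal sketch of that gate (hypothetical names; the exact boundary condition is an assumption):

```java
// Hypothetical sketch of the pre-polling gate seen in the log: a freshly
// (re)loaded task must not be polled until 60000 millis have elapsed
// since it was loaded.
class PrePollingSketch {
    static final long PRE_POLLING_PERIOD_MILLIS = 60_000;

    static boolean shouldPoll(long taskLoadedAtMillis, long nowMillis) {
        return nowMillis - taskLoadedAtMillis >= PRE_POLLING_PERIOD_MILLIS;
    }
}
```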

Scenario and time stamps:
22:51:03 2019-07-11 22:51:03,714 INFO       Test Setup   1: [class] Create new vm with nic nic1, disk and OS with {'diskActive': True, 'wipe_after_delete': 'false', 'slim': False, 'cluster': 'golden_env_mixed_1', 'linked': 'true', 'network': 'ovirtmgmt', 'provisioned_size': 6442450944, 'cpu_socket': 1, 'nicType': 'virtio', 'cpu_cores': 1, 'plugged': 'true', 'start': 'false', 'vmDescription': '', 'memory': 1073741824, 'vmName': 'vm_thin_disk_image_11619', 'template': 'Blank', 'volumeFormat': 'cow', 'display_type': 'spice', 'installation': False, 'storageDomainName': 'fcp_1', 'attempt': 60, 'diskInterface': 'virtio', 'bootable': True, 'type': 'desktop', 'copy_permissions': False, 'interval': 60, 'volumeType': True, 'diskType': 'data', 'useAgent': True, 'async': False, 'os_type': 'RHEL6x64'}
22:51:03 2019-07-11 22:51:03,718 INFO       Test Setup   2: [class] Description vm_thin_disk_image_11619: add new vm (without starting it) with {'name': 'vm_thin_disk_image_11619', 'cpu_socket': 1, 'copy_permissions': False, 'cpu_cores': 1, 'cluster': 'golden_env_mixed_1', 'display_type': 'spice', 'template': 'Blank', 'memory': 1073741824, 'async': False, 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
22:51:04 2019-07-11 22:51:04,356 INFO       Test Setup   3: [class] Add vNIC name nic1 to VM vm_thin_disk_image_11619 with {'interface': 'virtio', 'vnic_profile': 'ovirtmgmt', 'linked': 'true', 'network': 'ovirtmgmt', 'plugged': 'true'}
22:51:26 2019-07-11 22:51:25,927 INFO       Test Setup   4: [class] Create new vm with nic nic1, disk and OS with {'diskActive': True, 'wipe_after_delete': 'false', 'slim': False, 'cluster': 'golden_env_mixed_1', 'linked': 'true', 'network': 'ovirtmgmt', 'provisioned_size': 6442450944, 'cpu_socket': 1, 'nicType': 'virtio', 'cpu_cores': 1, 'plugged': 'true', 'start': 'false', 'vmDescription': '', 'memory': 1073741824, 'vmName': 'vm_prealloc_disk_image_11619', 'template': 'Blank', 'volumeFormat': 'raw', 'display_type': 'spice', 'installation': False, 'storageDomainName': 'fcp_1', 'attempt': 60, 'diskInterface': 'virtio', 'bootable': True, 'type': 'desktop', 'copy_permissions': False, 'interval': 60, 'volumeType': False, 'diskType': 'data', 'useAgent': True, 'async': False, 'os_type': 'RHEL6x64'}
22:51:26 2019-07-11 22:51:25,931 INFO       Test Setup   5: [class] Description vm_prealloc_disk_image_11619: add new vm (without starting it) with {'name': 'vm_prealloc_disk_image_11619', 'cpu_socket': 1, 'copy_permissions': False, 'cpu_cores': 1, 'cluster': 'golden_env_mixed_1', 'display_type': 'spice', 'template': 'Blank', 'memory': 1073741824, 'async': False, 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
22:51:26 2019-07-11 22:51:26,661 INFO       Test Setup   6: [class] Add vNIC name nic1 to VM vm_prealloc_disk_image_11619 with {'interface': 'virtio', 'vnic_profile': 'ovirtmgmt', 'linked': 'true', 'network': 'ovirtmgmt', 'plugged': 'true'}
22:51:38 2019-07-11 22:51:38,918 INFO 014: storage/rhevmtests.storage.storage_disk_image_format.test_disk_image_format.TestCase11619.test_live_move_disk[fcp]
22:51:38 2019-07-11 22:51:38,918 INFO     Start a live disk migration
22:51:39 2019-07-11 22:51:38,918 INFO     * Thin provisioned disk should remain the same
22:51:39 2019-07-11 22:51:38,918 INFO     * Preallocated disk should change to thin provisioned
22:51:39 2019-07-11 22:51:38,918 INFO STORAGE: FCP
22:51:39 2019-07-11 22:51:38,921 INFO       Test Step   7: Start VMs with {'max_workers': 2, 'wait_for_ip': False, 'wait_for_status': 'up', 'vm_list': ['vm_prealloc_disk_image_11619', 'vm_thin_disk_image_11619']}
22:53:00 2019-07-11 22:53:00,305 INFO       Test Step   8: Moving disk c09c2b16-f47f-4624-abcb-153da8076fbd to storage domain fcp_2
22:55:35 2019-07-11 22:55:34,943 INFO       Test Step   9: Moving disk a1b9bda5-243e-4358-a98a-a398df1d507a to storage domain fcp_2

Comment 2 RHEL Program Management 2019-07-14 09:53:21 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Avihai 2019-07-14 10:01:47 UTC
Created attachment 1590489 [details]
engine and vdsm logs 4.3.5-6(ovirt engine 4.3.5.4)

Comment 4 Avihai 2019-07-14 12:05:26 UTC
Impact:
Cannot deactivate an SD because the host has active tasks.
Cannot move the host to maintenance/StopSPM because the host has active tasks.

Another hit of the same issue on Tier3 runs (a different setup than the previous failure).

The issue was found in the engine one more time; this time an SD could not be deactivated because the host/SPM had uncleared tasks, leaving leftovers that caused creation of an SD on the same LUN to fail.

The task started at 2019-07-13 11:35:11 and was cleaned from the DB at 2019-07-13 14:45:09 (~3 hours later!):


2019-07-13 11:35:17,493+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-34495) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:358) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_222-ea]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_222-ea]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]

Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' flow in the engine log:

[root@hosted-engine-04 ovirt-engine]# zgrep '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' engine.log*

Started at 2019-07-13 11:35:11 and cleaned from the DB at 2019-07-13 14:45:09 (~3 hours later!):
engine.log-20190714.gz:2019-07-13 11:35:11,009+03 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [snapshots_delete_350a15e1-140b-40c8] CommandMultiAsyncTasks::attachTask: Attaching task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' to command '1117605c-a708-4f79-9b7e-a2bc36269a0c'.
engine.log-20190714.gz:2019-07-13 11:35:11,018+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [snapshots_delete_350a15e1-140b-40c8] Adding task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
engine.log-20190714.gz:2019-07-13 11:35:11,025+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [snapshots_delete_350a15e1-140b-40c8] BaseAsyncTask::startPollingTask: Starting to poll task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'.
engine.log-20190714.gz:2019-07-13 11:35:11,025+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [snapshots_delete_350a15e1-140b-40c8] BaseAsyncTask::startPollingTask: Starting to poll task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'.
engine.log-20190714.gz:2019-07-13 11:35:17,491+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [] SPMAsyncTask::PollTask: Polling task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
engine.log-20190714.gz:2019-07-13 11:35:17,491+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [] BaseAsyncTask::onTaskEndSuccess: Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
engine.log-20190714.gz:2019-07-13 11:35:17,515+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-34495) [] BaseAsyncTask::removeTaskFromDB: Removed task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' from DataBase
engine.log-20190714.gz:2019-07-13 11:36:03,592+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-98) [41f555ad] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_1', pool id '449cbed6-2319-4cc4-8c6e-8eea85bfc222' as there are uncleared tasks 'Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9', status 'finished''
engine.log-20190714.gz:2019-07-13 11:36:03,596+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-98) [41f555ad] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_1, pool id 449cbed6-2319-4cc4-8c6e-8eea85bfc222 as there are uncleared tasks Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9', status 'finished'
engine.log-20190714.gz:2019-07-13 11:36:06,428+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-engine-Thread-34526) [e493569] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_1', pool id '449cbed6-2319-4cc4-8c6e-8eea85bfc222' as there are uncleared tasks 'Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9', status 'finished''
engine.log-20190714.gz:2019-07-13 11:36:06,433+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-34526) [e493569] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_1, pool id 449cbed6-2319-4cc4-8c6e-8eea85bfc222 as there are uncleared tasks Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9', status 'finished'
engine.log-20190714.gz:2019-07-13 14:44:01,657+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engine-Thread-10) [6c60c302] Adding task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
engine.log-20190714.gz:2019-07-13 14:44:01,772+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-10) [6c60c302] BaseAsyncTask::startPollingTask: Starting to poll task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'.
engine.log-20190714.gz:2019-07-13 14:44:07,658+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-36) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:17,664+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:27,668+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-72) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:37,673+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-46) [452a4857] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:47,677+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-3) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:57,681+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:45:07,685+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:45:07,689+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:45:09,194+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:45:09,194+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] SPMAsyncTask::PollTask: Polling task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
engine.log-20190714.gz:2019-07-13 14:45:09,194+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] BaseAsyncTask::onTaskEndSuccess: Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
engine.log-20190714.gz:2019-07-13 14:45:09,194+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'
engine.log-20190714.gz:2019-07-13 14:45:09,195+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{storagePoolId='449cbed6-2319-4cc4-8c6e-8eea85bfc222', ignoreFailoverLimit='false', taskId='4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'}), log id: 614994cf
engine.log-20190714.gz:2019-07-13 14:45:09,196+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] START, HSMClearTaskVDSCommand(HostName = host_mixed_1, HSMTaskGuidBaseVDSCommandParameters:{hostId='6b0e3e6a-f290-4fa0-ba07-de8da47f48ae', taskId='4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'}), log id: 649fbebb
engine.log-20190714.gz:2019-07-13 14:45:09,231+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] BaseAsyncTask::removeTaskFromDB: Removed task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' from DataBase

Comment 5 Avihai 2019-07-14 12:08:57 UTC
Adding TC scenario and timestamps:

11:30:30 2019-07-13 11:30:30,116 INFO       Test Setup   1: Create data-center dc_type_1311303006 with cluster cl_type_1311303006 and host host_mixed_1
11:30:30 2019-07-13 11:30:30,116 INFO       Test Step   2: Add data-center dc_type_1311303006
11:30:30 2019-07-13 11:30:30,243 INFO       Test Step   3: Add cluster cl_type_1311303006
11:30:30 2019-07-13 11:30:30,246 INFO       Test Setup   4: [class] Add cluster with {'version': '4.2', 'data_center': 'dc_type_1311303006', 'cpu': 'Intel Westmere Family', 'name': 'cl_type_1311303006'}
11:30:30 2019-07-13 11:30:30,448 INFO       Test Step   5: Move host host_mixed_1 to cluster cl_type_1311303006
11:30:30 2019-07-13 11:30:30,451 INFO       Test Setup   6: [class] Switch host host_mixed_1 to different cluster cl_type_1311303006 with {'activate': True}
11:30:30 2019-07-13 11:30:30,503 INFO       Test Setup   7: [class] Deactivate the host host_mixed_1. with {'expected_status': 'maintenance'}
11:30:41 2019-07-13 11:30:40,943 INFO       Test Setup   8: [class] Update properties of host host_mixed_1 (provided in parameters) with {'cluster': 'cl_type_1311303006'}
11:30:43 2019-07-13 11:30:43,238 INFO       Test Setup   9: [class] Activate host host_mixed_1 (set status to UP) with {'wait': True}
11:30:53 2019-07-13 11:30:53,443 INFO       Test Setup  10: Move host host_mixed_1 into the newly created cluster
11:30:53 2019-07-13 11:30:53,445 INFO       Test Setup  11: [class] Switch host host_mixed_1 to different cluster cl_type_1311303006 with {'activate': True}
11:30:53 2019-07-13 11:30:53,499 INFO       Test Setup  12: [class] Deactivate the host host_mixed_1. with {'expected_status': 'maintenance'}
11:31:04 2019-07-13 11:31:03,958 INFO       Test Setup  13: [class] Update properties of host host_mixed_1 (provided in parameters) with {'cluster': 'cl_type_1311303006'}
11:31:04 2019-07-13 11:31:04,244 INFO       Test Setup  14: [class] Activate host host_mixed_1 (set status to UP) with {'wait': True}
11:31:14 2019-07-13 11:31:14,473 INFO       Test Setup  15: Create new storage domain sd_TestCase26087_1311311447
11:31:14 2019-07-13 11:31:14,476 INFO       Test Setup  16: [class] Add storage domain to given data-center with {'index': 0, 'storage_type': 'iscsi', 'storage_domain': 'sd_TestCase26087_1311311447', 'data_center': 'dc_type_1311303006'}
11:32:32 2019-07-13 11:32:32,753 INFO       Test Setup  17: Creating VM vm_TestCase26087_1311323275
11:32:32 2019-07-13 11:32:32,837 INFO       Test Setup  18: [class] Create new vm with nic nic1, disk and OS with {'diskActive': True, 'wipe_after_delete': 'false', 'slim': False, 'cluster': 'cl_type_1311303006', 'linked': 'true', 'network': 'ovirtmgmt', 'provisioned_size': 6442450944, 'cpu_socket': 1, 'nicType': 'virtio', 'cpu_cores': 1, 'plugged': 'true', 'start': 'false', 'vmDescription': '', 'memory': 1073741824, 'vmName': 'vm_TestCase26087_1311323275', 'template': 'Blank', 'volumeFormat': 'cow', 'display_type': 'spice', 'installation': False, 'storageDomainName': 'sd_TestCase26087_1311311447', 'attempt': 60, 'diskInterface': 'virtio', 'bootable': True, 'type': 'desktop', 'copy_permissions': False, 'interval': 60, 'volumeType': True, 'diskType': 'data', 'useAgent': True, 'async': False, 'os_type': 'RHEL6x64'}
11:32:32 2019-07-13 11:32:32,840 INFO       Test Setup  19: [class] Description vm_TestCase26087_1311323275: add new vm (without starting it) with {'name': 'vm_TestCase26087_1311323275', 'cpu_socket': 1, 'copy_permissions': False, 'cpu_cores': 1, 'cluster': 'cl_type_1311303006', 'display_type': 'spice', 'template': 'Blank', 'memory': 1073741824, 'async': False, 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
11:32:33 2019-07-13 11:32:33,512 INFO       Test Setup  20: [class] Add vNIC name nic1 to VM vm_TestCase26087_1311323275 with {'interface': 'virtio', 'vnic_profile': 'ovirtmgmt', 'linked': 'true', 'network': 'ovirtmgmt', 'plugged': 'true'}
11:32:45 2019-07-13 11:32:45,023 INFO 191: storage/rhevmtests.storage.storage_remove_snapshots.test_live_merge.TestCase26087.test_merge_after_cluster_upgrade[iscsi]
11:32:45 2019-07-13 11:32:45,023 INFO     Live merge after cluster upgrade from 4.2 to latest version
11:32:45 2019-07-13 11:32:45,024 INFO STORAGE: ISCSI
11:32:45 2019-07-13 11:32:45,024 INFO       Test Setup  21: Starting VM vm_TestCase26087_1311323275 with cloud-init as run once
11:32:45 2019-07-13 11:32:45,026 INFO       Test Step  22: Run VM vm_TestCase26087_1311323275 once with {'use_cloud_init': True, 'pause': False, 'wait_for_state': 'up', 'use_sysprep': False}
11:34:13 2019-07-13 11:34:06,098 INFO       Test Setup  23: Upgrading cluster cl_type_1311303006 from version 4.2 to version 4.3
11:34:13 2019-07-13 11:34:06,102 INFO       Test Step  24: Update cluster cl_type_1311303006 with {'version': '4.3'}
11:34:14 2019-07-13 11:34:06,591 INFO       Test Step  25: Add snapshot to VM vm_TestCase26087_1311323275 with {'persist_memory': False, 'description': 'snap_test_merge_after_cluste_1311340658', 'wait': True}
11:34:30 2019-07-13 11:34:30,519 INFO       Test Setup  26: Removing snapshot snap_test_merge_after_cluste_1311340658 for VM vm_TestCase26087_1311323275
11:34:30 2019-07-13 11:34:30,521 INFO       Test Step  27: Remove vm vm_TestCase26087_1311323275 snapshot with {'description': 'snap_test_merge_after_cluste_1311340658', 'timeout': 2400, 'wait': True}
11:35:17 2019-07-13 11:35:17,888 INFO Result: PASSED
11:35:17 2019-07-13 11:35:17,892 INFO       Test Teardown  28: Power off VM vm_TestCase26087_1311323275
11:35:18 2019-07-13 11:35:18,435 INFO       Test Teardown  29: [class] Stop vm vm_TestCase26087_1311323275 with {'async': 'true'}
11:35:21 2019-07-13 11:35:21,526 INFO       Test Teardown  30: Remove VM vm_TestCase26087_1311323275
11:35:21 2019-07-13 11:35:21,780 INFO       Test Teardown  31: [class] Remove VM vm_TestCase26087_1311323275 with {'wait': False}
11:35:27 2019-07-13 11:35:27,838 INFO       Test Teardown  32: Clean data-center dc_type_1311303006 and remove it
11:35:28 2019-07-13 11:35:28,014 INFO       Test Step  33: Deactivate storage-domain sd_TestCase26087_1311311447 
11:40:34 E

Comment 6 Dana 2019-08-05 11:15:49 UTC
*** Bug 1733931 has been marked as a duplicate of this bug. ***

Comment 7 Avihai 2019-08-18 16:04:38 UTC
Please provide a clear reproducer to verify this bug.
Without one I cannot provide qa_ack for this bug.

Is the following still the verification scenario?

How reproducible:
NFS storage

Steps to Reproduce:
1. Create VM with disk + OS
2. Add 4 disks to VM
3. Create live snapshot
4. Perform dd from the OS disk to one of the other disks
=> can you give the CLI of exactly which dd command you run here?

5. Live merge the snapshot
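
For illustration only (the exact dd used in the original run is the open question above), a raw copy between two targets can be sketched with plain files standing in for the guest block devices; the /tmp paths and the 4 MiB size are assumptions made so the sketch is runnable anywhere:

```shell
# Illustrative stand-in for step 4 -- NOT the command from the original run.
# /tmp image files stand in for the guest's OS and data block devices,
# and 4 MiB stands in for a full disk.
dd if=/dev/urandom of=/tmp/os_disk.img bs=1M count=4 iflag=fullblock 2>/dev/null
dd if=/tmp/os_disk.img of=/tmp/data_disk.img bs=1M 2>/dev/null
stat -c %s /tmp/data_disk.img
```

Against real guest devices the same shape would be a dd from the OS disk device to a data disk device with a block size around 1M, kept running while the live merge executes.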

Comment 26 matthias.leopold 2019-11-14 18:23:31 UTC
I'm still seeing this bug regularly in my oVirt 4.3.5 installations (using iSCSI storage) and I would be very happy to see it resolved in the 4.3 series.

Comment 27 RHV bug bot 2019-12-13 13:13:52 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 28 RHV bug bot 2019-12-20 17:43:45 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 29 Yosi Ben Shimon 2019-12-22 13:33:56 UTC
Tested on:
ovirt-engine-4.4.0-0.13.master.el7.noarch
vdsm-4.40.0-164.git38a19bb.el8ev.x86_64

I've followed the steps from the z-stream clone https://bugzilla.redhat.com/show_bug.cgi?id=1748387#c27

When I tried to delete the snapshot, I encountered the same failure as https://bugzilla.redhat.com/show_bug.cgi?id=1349950

Can you shed some light on this one, please?

Attaching logs (snapshot_merge_failure) relevant for this flow (flow_id=31fb6860-0f1b-447e-b540-bb36ec55caed).

Comment 30 Yosi Ben Shimon 2019-12-22 13:34:34 UTC
Created attachment 1647167 [details]
snapshot_merge_failure logs

Comment 31 Yosi Ben Shimon 2019-12-22 15:42:59 UTC
(In reply to Yosi Ben Shimon from comment #29)
> When I tried to delete the snapshot, I encountered the same failure as
> https://bugzilla.redhat.com/show_bug.cgi?id=1349950
> Not the same, but very similar and might be related.

Comment 32 RHV bug bot 2020-01-08 14:46:43 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 33 RHV bug bot 2020-01-08 15:14:34 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 34 RHV bug bot 2020-01-24 19:48:29 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 35 Ilan Zuckerman 2020-03-04 15:21:50 UTC
(In reply to Yosi Ben Shimon from comment #29)
> Tested on:
> ovirt-engine-4.4.0-0.13.master.el7.noarch
> vdsm-4.40.0-164.git38a19bb.el8ev.x86_64
> 
> I've followed the steps from the z-stream clone
> https://bugzilla.redhat.com/show_bug.cgi?id=1748387#c27
> 
> When I tried to delete the snapshot, I encountered the same failure as
> https://bugzilla.redhat.com/show_bug.cgi?id=1349950
> 
> Can you shed some light on this one, please?
> 
> Attaching logs (snapshot_merge_failure) relevant for this flow
> (flow_id=31fb6860-0f1b-447e-b540-bb36ec55caed).

Hi Fedor, please refer to Yosi's comment.

Comment 37 Ilan Zuckerman 2020-03-09 08:39:56 UTC
The BZ was verified following these steps on:
rhv-release-4.4.0-23-001.noarch
vdsm-4.40.5-1.el8ev.x86_64

1. Create a VM with disk (on NFS) and with OS (rhel-8) from template
2. Add more disks (NFS):
    1 disk of 10 GiB (formatted and mounted on /mnt in step 6)
    3 disks of 5 GiB
3. Start the VM
4. Create a bash file on the root of the OS disk containing:

#!/bin/bash
while true
do
        dd if=1GB.bin of=/mnt/1GB_target.bin
done

5. Create a 1 GiB file on the OS disk using:
    # dd if=/dev/urandom of=1GB.bin bs=64M count=16 iflag=fullblock

6. Create a filesystem on the 10 GiB disk (mkfs.ext4) and mount it on /mnt/
7. Take a live snapshot (with memory)
8. Execute the dd bash file
9. Live merge the snapshot
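
The dd workload described above boils down to a sustained copy loop against the mounted disk. A scaled-down, self-contained sketch (sizes cut from 1 GiB to 8 MiB, the infinite loop bounded to three passes, and /tmp/bz1727025_demo standing in for the /mnt mountpoint -- all assumptions made so it runs without a VM):

```shell
#!/bin/bash
# Scaled-down version of the verification workload: sizes reduced from
# 1 GiB to 8 MiB and the "while true" loop bounded to three passes, so the
# sketch runs anywhere; /tmp/bz1727025_demo stands in for the /mnt mount.
set -e
workdir=/tmp/bz1727025_demo
mkdir -p "$workdir"
dd if=/dev/urandom of="$workdir/1GB.bin" bs=1M count=8 iflag=fullblock 2>/dev/null
for pass in 1 2 3; do
    dd if="$workdir/1GB.bin" of="$workdir/1GB_target.bin" bs=1M 2>/dev/null
done
stat -c %s "$workdir/1GB_target.bin"
```

In the actual verification the loop runs unbounded (`while true`) against the mounted 10 GiB disk while the live merge is in progress.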

No errors were seen in engine log besides this one:

2020-03-09 09:26:08,624+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-56) [] VM 'e78caf37-ddba-499d-9290-eb24f8d2ab84' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='9fa051e3-a635-4917-ae03-365d4228a6af', vmId='e78caf37-ddba-499d-9290-eb24f8d2ab84'}', device='usb', type='CONTROLLER', specParams='[index=0, model=qemu-xhci]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='usb', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'


One remark: the live merge process takes a lot of time, approximately 10 minutes; I don't see why it takes so long.

START:
2020-03-09 09:25:59,861+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-57) [] EVENT_ID:
 USER_REMOVE_SNAPSHOT(342), Snapshot 'live_snap' deletion for VM 'bz_1727025' was initiated by admin@internal-authz.

END:
2020-03-09 09:35:24,916+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [6b95d438-6b9e-4b4c-b471-81b02ed4977a] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_SUCCESS(356), Snapshot 'live_snap' deletion for VM 'bz_1727025' has been completed.
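
As a quick sanity check of the ~10 minute figure, the gap between the two audit-log events can be computed directly from their timestamps (GNU date assumed):

```shell
# Elapsed time between the USER_REMOVE_SNAPSHOT start and finish events
# (timestamps taken from the engine log excerpts above; both carry the
# same +02 offset, so plain local-time parsing is sufficient).
start="2020-03-09 09:25:59"
end="2020-03-09 09:35:24"
elapsed=$(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
echo "$(( elapsed / 60 ))m $(( elapsed % 60 ))s"
```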

Attaching vdsm + engine logs (storage-ge-09).

Please NOTE: the vdsm log is flooded with these errors [1]. They relate to another BZ [2]. We assume they don't affect the functionality of what was tested.

[1]
libvirt.libvirtError: Requested operation is not valid: cgroup CPU controller is not mounted

[2]
https://bugzilla.redhat.com/show_bug.cgi?id=1810416

Comment 38 Ilan Zuckerman 2020-03-09 08:41:09 UTC
Created attachment 1668608 [details]
verification engine + vdsm logs

Comment 44 errata-xmlrpc 2020-08-04 13:19:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3247
