Bug 1748387 - [downstream clone - 4.3.7] NPE in DestroyImage endAction during live merge leaving a task in DB for hours causing operations depending on host clean tasks to fail as Deactivate host/StopSPM/deactivate SD
Summary: [downstream clone - 4.3.7] NPE in DestroyImage endAction during live merge le...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.3.5
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.3.6
Target Release: 4.3.6
Assignee: Fedor Gavrilov
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Depends On: 1727025
Blocks:
 
Reported: 2019-09-03 13:39 UTC by RHV bug bot
Modified: 2023-03-24 15:22 UTC
CC: 9 users

Fixed In Version: ovirt-engine-4.3.6.6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1727025
Environment:
Last Closed: 2019-10-10 15:36:58 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments
verification logs engine + vdsm (1.62 MB, application/zip), 2020-03-09 07:56 UTC, Ilan Zuckerman


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1527861 0 unspecified CLOSED Live merge with continuous I/O to the VM fails with 'VolumeImageHasChildren: Cannot delete volume which has children' 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2019:3010 0 None None None 2019-10-10 15:37:11 UTC
oVirt gerrit 102229 0 'None' MERGED core: make RemoveSnapshotSingleDisk wait for DestroyImage execution 2020-11-01 13:56:31 UTC
oVirt gerrit 103068 0 'None' MERGED core: make RemoveSnapshotSingleDisk wait for DestroyImage execution 2020-11-01 13:56:30 UTC

Description RHV bug bot 2019-09-03 13:39:34 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1727025 +++
======================================================================

Description of problem:
Live merge during continuous IO seems to succeed; however, an NPE can be seen in the logs:
ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-1480) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:358) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103) [bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_191]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_191]
	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_191]
	at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
	at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)


Version-Release number of selected component (if applicable):
ovirt-engine: 4.4.0-SNAPSHOT

How reproducible:
NFS storage

Steps to Reproduce:
1. Create VM with disk + OS
2. Add 4 disks to VM
3. Create live snapshot
4. Perform dd from the OS disk to one of the other disks (see the sketch below)
5. Live merge the snapshot
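
The original report does not give the exact dd invocation for step 4; a plausible in-guest sketch (the device names /dev/vda for the OS disk and /dev/vdb for one of the added disks are assumptions) would be:

# Inside the guest: copy the OS disk onto one of the extra disks to keep IO running.
# /dev/vda (OS disk) and /dev/vdb (one of the added disks) are assumed names.
dd if=/dev/vda of=/dev/vdb bs=1M status=progress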

Actual results:
Live merge succeeds, but there are exceptions in the engine log.

Expected results:
No exceptions

Additional info:

(Originally by Fedor Gavrilov)

Comment 1 RHV bug bot 2019-09-03 13:39:37 UTC
I see the same issue on 4.3.5-6 (ovirt-engine 4.3.5.4 / vdsm-4.30.24-2.el7ev.x86_64) in live disk move operations (no IO involved) on an FCP SD.

Tasks to remove a host/deactivate a host/StopSPM get stuck for hours, and operations that depend on cleared tasks (such as StopSPM or moving a host to maintenance) fail, breaking many automation TCs -> raising priority.

The tests fail because the SPM is not stopped while a task is still present (with status 'finished'):

Not stopping SPM on vds host_mixed_2, pool id 43238d81-6835-4217-8b81-951b564655d2 as there are uncleared tasks Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished'

The task for the live disk move (TestCase11619) finished but was not removed from the DB for hours, from 2019-07-11 22:55 until 2019-07-12 06:29:44.
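
Such a leftover can be confirmed directly in the engine database (a sketch; async_tasks is the engine task table, and the database name and credentials depend on the local setup):

# On the engine host: list the tasks the engine still tracks.
sudo -u postgres psql engine -c "SELECT * FROM async_tasks;"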

The issue occurred here:
2019-07-11 22:55:28,516+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [disks_syncAction_f770e3d9-8595-41d3] EVENT_ID: USER_MOVED_DISK_FINISHED_SUCCESS(2,010), User admin@internal-authz finished moving disk vm_thin_disk_image_11619_Disk1 to domain fcp_2.
2019-07-11 22:55:31,893+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] Polling and updating Async Tasks: 1 tasks, 1 tasks to poll now
2019-07-11 22:55:31,899+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] SPMAsyncTask::PollTask: Polling task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2019-07-11 22:55:31,899+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] BaseAsyncTask::onTaskEndSuccess: Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2019-07-11 22:55:31,900+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] CommandAsyncTask::endActionIfNecessary: All tasks of command '4d1745ef-44af-4e7d-abb9-67edd1748a58' has ended -> executing 'endAction'
2019-07-11 22:55:31,900+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '4d1745ef-44af-4e7d-abb9-67edd1748a58'): calling endAction '.
2019-07-11 22:55:31,901+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'DestroyImage',
2019-07-11 22:55:31,901+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:358) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_222-ea]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_222-ea]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]

2019-07-11 22:55:31,902+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] CommandAsyncTask::HandleEndActionResult: endAction for action type 'DestroyImage' threw an unrecoverable RuntimeException the task will be cleared.
2019-07-11 22:55:31,908+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] BaseAsyncTask::removeTaskFromDB: Removed task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' from DataBase


22:55:35 2019-07-11 22:55:34,943 INFO       Test Step   9: Moving disk a1b9bda5-243e-4358-a98a-a398df1d507a to storage domain fcp_2


[root@hosted-engine-06 ovirt-engine]# zgrep '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' engine.log*
engine.log-20190712.gz:2019-07-11 22:55:22,740+03 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [disks_syncAction_f770e3d9-8595-41d3] CommandMultiAsyncTasks::attachTask: Attaching task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' to command '4d1745ef-44af-4e7d-abb9-67edd1748a58'.
engine.log-20190712.gz:2019-07-11 22:55:22,757+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [disks_syncAction_f770e3d9-8595-41d3] Adding task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
engine.log-20190712.gz:2019-07-11 22:55:22,774+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [disks_syncAction_f770e3d9-8595-41d3] BaseAsyncTask::startPollingTask: Starting to poll task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'.
engine.log-20190712.gz:2019-07-11 22:55:22,774+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [disks_syncAction_f770e3d9-8595-41d3] BaseAsyncTask::startPollingTask: Starting to poll task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'.
engine.log-20190712.gz:2019-07-11 22:55:31,899+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] SPMAsyncTask::PollTask: Polling task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
engine.log-20190712.gz:2019-07-11 22:55:31,899+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] BaseAsyncTask::onTaskEndSuccess: Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
engine.log-20190712.gz:2019-07-11 22:55:31,908+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-3678) [] BaseAsyncTask::removeTaskFromDB: Removed task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' from DataBase


engine.log-20190712.gz:2019-07-12 01:52:40,445+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-21) [hosts_syncAction_81f17357-1567-4103] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '43238d81-6835-4217-8b81-951b564655d2' as there are uncleared tasks 'Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished''
engine.log-20190712.gz:2019-07-12 01:52:40,453+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-21) [hosts_syncAction_81f17357-1567-4103] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_2, pool id 43238d81-6835-4217-8b81-951b564655d2 as there are uncleared tasks Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished'
engine.log-20190712.gz:2019-07-12 01:52:45,859+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-21) [hosts_syncAction_2019249e-285b-459e] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '43238d81-6835-4217-8b81-951b564655d2' as there are uncleared tasks 'Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished''
engine.log-20190712.gz:2019-07-12 01:52:45,867+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-21) [hosts_syncAction_2019249e-285b-459e] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_2, pool id 43238d81-6835-4217-8b81-951b564655d2 as there are uncleared tasks Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished'
engine.log-20190712.gz:2019-07-12 01:53:08,156+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-21) [hosts_syncAction_abb918c3-1807-405a] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '43238d81-6835-4217-8b81-951b564655d2' as there are uncleared tasks 'Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished''
engine.log-20190712.gz:2019-07-12 01:53:08,163+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-21) [hosts_syncAction_abb918c3-1807-405a] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_2, pool id 43238d81-6835-4217-8b81-951b564655d2 as there are uncleared tasks Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7', status 'finished'
engine.log-20190713.gz:2019-07-12 06:28:37,917+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engine-Thread-11) [44c69927] Adding task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
engine.log-20190713.gz:2019-07-12 06:28:38,215+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-11) [44c69927] BaseAsyncTask::startPollingTask: Starting to poll task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'.
engine.log-20190713.gz:2019-07-12 06:28:44,463+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:28:54,466+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-49) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:04,468+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:14,471+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:24,472+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-38) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:34,474+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-97) [44c69927] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:44,476+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:44,476+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:44,863+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Task id '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190713.gz:2019-07-12 06:29:44,863+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] SPMAsyncTask::PollTask: Polling task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
engine.log-20190713.gz:2019-07-12 06:29:44,863+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] BaseAsyncTask::onTaskEndSuccess: Task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
engine.log-20190713.gz:2019-07-12 06:29:44,863+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'
engine.log-20190713.gz:2019-07-12 06:29:44,864+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{storagePoolId='43238d81-6835-4217-8b81-951b564655d2', ignoreFailoverLimit='false', taskId='5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'}), log id: 5e495ff3
engine.log-20190713.gz:2019-07-12 06:29:44,865+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] START, HSMClearTaskVDSCommand(HostName = host_mixed_2, HSMTaskGuidBaseVDSCommandParameters:{hostId='f31df623-c3c9-420f-a78d-a3a9a4114b42', taskId='5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7'}), log id: c6f0343
engine.log-20190713.gz:2019-07-12 06:29:44,911+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] BaseAsyncTask::removeTaskFromDB: Removed task '5684fb07-4ff8-4d1f-b12f-8c2dc3afe0c7' from DataBase

Scenario and time stamps:
22:51:03 2019-07-11 22:51:03,714 INFO       Test Setup   1: [class] Create new vm with nic nic1, disk and OS with {'diskActive': True, 'wipe_after_delete': 'false', 'slim': False, 'cluster': 'golden_env_mixed_1', 'linked': 'true', 'network': 'ovirtmgmt', 'provisioned_size': 6442450944, 'cpu_socket': 1, 'nicType': 'virtio', 'cpu_cores': 1, 'plugged': 'true', 'start': 'false', 'vmDescription': '', 'memory': 1073741824, 'vmName': 'vm_thin_disk_image_11619', 'template': 'Blank', 'volumeFormat': 'cow', 'display_type': 'spice', 'installation': False, 'storageDomainName': 'fcp_1', 'attempt': 60, 'diskInterface': 'virtio', 'bootable': True, 'type': 'desktop', 'copy_permissions': False, 'interval': 60, 'volumeType': True, 'diskType': 'data', 'useAgent': True, 'async': False, 'os_type': 'RHEL6x64'}
22:51:03 2019-07-11 22:51:03,718 INFO       Test Setup   2: [class] Description vm_thin_disk_image_11619: add new vm (without starting it) with {'name': 'vm_thin_disk_image_11619', 'cpu_socket': 1, 'copy_permissions': False, 'cpu_cores': 1, 'cluster': 'golden_env_mixed_1', 'display_type': 'spice', 'template': 'Blank', 'memory': 1073741824, 'async': False, 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
22:51:04 2019-07-11 22:51:04,356 INFO       Test Setup   3: [class] Add vNIC name nic1 to VM vm_thin_disk_image_11619 with {'interface': 'virtio', 'vnic_profile': 'ovirtmgmt', 'linked': 'true', 'network': 'ovirtmgmt', 'plugged': 'true'}
22:51:26 2019-07-11 22:51:25,927 INFO       Test Setup   4: [class] Create new vm with nic nic1, disk and OS with {'diskActive': True, 'wipe_after_delete': 'false', 'slim': False, 'cluster': 'golden_env_mixed_1', 'linked': 'true', 'network': 'ovirtmgmt', 'provisioned_size': 6442450944, 'cpu_socket': 1, 'nicType': 'virtio', 'cpu_cores': 1, 'plugged': 'true', 'start': 'false', 'vmDescription': '', 'memory': 1073741824, 'vmName': 'vm_prealloc_disk_image_11619', 'template': 'Blank', 'volumeFormat': 'raw', 'display_type': 'spice', 'installation': False, 'storageDomainName': 'fcp_1', 'attempt': 60, 'diskInterface': 'virtio', 'bootable': True, 'type': 'desktop', 'copy_permissions': False, 'interval': 60, 'volumeType': False, 'diskType': 'data', 'useAgent': True, 'async': False, 'os_type': 'RHEL6x64'}
22:51:26 2019-07-11 22:51:25,931 INFO       Test Setup   5: [class] Description vm_prealloc_disk_image_11619: add new vm (without starting it) with {'name': 'vm_prealloc_disk_image_11619', 'cpu_socket': 1, 'copy_permissions': False, 'cpu_cores': 1, 'cluster': 'golden_env_mixed_1', 'display_type': 'spice', 'template': 'Blank', 'memory': 1073741824, 'async': False, 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
22:51:26 2019-07-11 22:51:26,661 INFO       Test Setup   6: [class] Add vNIC name nic1 to VM vm_prealloc_disk_image_11619 with {'interface': 'virtio', 'vnic_profile': 'ovirtmgmt', 'linked': 'true', 'network': 'ovirtmgmt', 'plugged': 'true'}
22:51:38 2019-07-11 22:51:38,918 INFO 014: storage/rhevmtests.storage.storage_disk_image_format.test_disk_image_format.TestCase11619.test_live_move_disk[fcp]
22:51:38 2019-07-11 22:51:38,918 INFO     Start a live disk migration
22:51:39 2019-07-11 22:51:38,918 INFO     * Thin provisioned disk should remain the same
22:51:39 2019-07-11 22:51:38,918 INFO     * Preallocated disk should change to thin provisioned
22:51:39 2019-07-11 22:51:38,918 INFO STORAGE: FCP
22:51:39 2019-07-11 22:51:38,921 INFO       Test Step   7: Start VMs with {'max_workers': 2, 'wait_for_ip': False, 'wait_for_status': 'up', 'vm_list': ['vm_prealloc_disk_image_11619', 'vm_thin_disk_image_11619']}
22:53:00 2019-07-11 22:53:00,305 INFO       Test Step   8: Moving disk c09c2b16-f47f-4624-abcb-153da8076fbd to storage domain fcp_2
22:55:35 2019-07-11 22:55:34,943 INFO       Test Step   9: Moving disk a1b9bda5-243e-4358-a98a-a398df1d507a to storage domain fcp_2

(Originally by Avihai Efrat)

Comment 2 RHV bug bot 2019-09-03 13:39:39 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

(Originally by pm-rhel)

Comment 5 RHV bug bot 2019-09-03 13:39:46 UTC
Created attachment 1590489 [details]
engine and vdsm logs 4.3.5-6(ovirt engine 4.3.5.4)

(Originally by Avihai Efrat)

Comment 6 RHV bug bot 2019-09-03 13:39:48 UTC
Impact:
Cannot deactivate the SD because the host has active tasks.
Cannot move the host to maintenance/StopSPM because the host has active tasks.

Another hit of the same issue on Tier3 runs (a different setup than the previous failure).

The issue was found in the engine one more time; this time an SD could not be deactivated because the host/SPM had uncleared tasks, leaving leftovers that caused creation of an SD on the same LUN to fail.

The task started at 2019-07-13 11:35:11 and was cleaned from the DB at 2019-07-13 14:45:09 (~3h later!):


2019-07-13 11:35:17,493+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-34495) [] [within thread]: endAction for action type DestroyImage threw an exception.: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:358) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:348) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:150) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:103) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_222-ea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_222-ea]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_222-ea]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]

Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' flow in the engine log:

[root@hosted-engine-04 ovirt-engine]# zgrep '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' engine.log*

Started at 2019-07-13 11:35:11 and cleaned from the DB at 2019-07-13 14:45:09 (~3h later!):
engine.log-20190714.gz:2019-07-13 11:35:11,009+03 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [snapshots_delete_350a15e1-140b-40c8] CommandMultiAsyncTasks::attachTask: Attaching task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' to command '1117605c-a708-4f79-9b7e-a2bc36269a0c'.
engine.log-20190714.gz:2019-07-13 11:35:11,018+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [snapshots_delete_350a15e1-140b-40c8] Adding task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
engine.log-20190714.gz:2019-07-13 11:35:11,025+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [snapshots_delete_350a15e1-140b-40c8] BaseAsyncTask::startPollingTask: Starting to poll task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'.
engine.log-20190714.gz:2019-07-13 11:35:11,025+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [snapshots_delete_350a15e1-140b-40c8] BaseAsyncTask::startPollingTask: Starting to poll task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'.
engine.log-20190714.gz:2019-07-13 11:35:17,491+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [] SPMAsyncTask::PollTask: Polling task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
engine.log-20190714.gz:2019-07-13 11:35:17,491+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [] BaseAsyncTask::onTaskEndSuccess: Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
engine.log-20190714.gz:2019-07-13 11:35:17,515+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-34495) [] BaseAsyncTask::removeTaskFromDB: Removed task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' from DataBase
engine.log-20190714.gz:2019-07-13 11:36:03,592+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-98) [41f555ad] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_1', pool id '449cbed6-2319-4cc4-8c6e-8eea85bfc222' as there are uncleared tasks 'Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9', status 'finished''
engine.log-20190714.gz:2019-07-13 11:36:03,596+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-98) [41f555ad] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_1, pool id 449cbed6-2319-4cc4-8c6e-8eea85bfc222 as there are uncleared tasks Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9', status 'finished'
engine.log-20190714.gz:2019-07-13 11:36:06,428+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-engine-Thread-34526) [e493569] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_1', pool id '449cbed6-2319-4cc4-8c6e-8eea85bfc222' as there are uncleared tasks 'Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9', status 'finished''
engine.log-20190714.gz:2019-07-13 11:36:06,433+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-34526) [e493569] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_1, pool id 449cbed6-2319-4cc4-8c6e-8eea85bfc222 as there are uncleared tasks Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9', status 'finished'
engine.log-20190714.gz:2019-07-13 14:44:01,657+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engine-Thread-10) [6c60c302] Adding task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
engine.log-20190714.gz:2019-07-13 14:44:01,772+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-10) [6c60c302] BaseAsyncTask::startPollingTask: Starting to poll task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'.
engine.log-20190714.gz:2019-07-13 14:44:07,658+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-36) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:17,664+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:27,668+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-72) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:37,673+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-46) [452a4857] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:47,677+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-3) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:44:57,681+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' is in pre-polling  period and should not be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:45:07,685+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:45:07,689+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:45:09,194+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] Task id '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
engine.log-20190714.gz:2019-07-13 14:45:09,194+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] SPMAsyncTask::PollTask: Polling task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
engine.log-20190714.gz:2019-07-13 14:45:09,194+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] BaseAsyncTask::onTaskEndSuccess: Task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
engine.log-20190714.gz:2019-07-13 14:45:09,194+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'
engine.log-20190714.gz:2019-07-13 14:45:09,195+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{storagePoolId='449cbed6-2319-4cc4-8c6e-8eea85bfc222', ignoreFailoverLimit='false', taskId='4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'}), log id: 614994cf
engine.log-20190714.gz:2019-07-13 14:45:09,196+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] START, HSMClearTaskVDSCommand(HostName = host_mixed_1, HSMTaskGuidBaseVDSCommandParameters:{hostId='6b0e3e6a-f290-4fa0-ba07-de8da47f48ae', taskId='4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9'}), log id: 649fbebb
engine.log-20190714.gz:2019-07-13 14:45:09,231+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [] BaseAsyncTask::removeTaskFromDB: Removed task '4bcfc7c3-f8a4-444e-8468-6f51c6c1b4b9' from DataBase

(Originally by Avihai Efrat)

Comment 7 RHV bug bot 2019-09-03 13:39:50 UTC
Adding TC scenario and timestamps:

11:30:30 2019-07-13 11:30:30,116 INFO       Test Setup   1: Create data-center dc_type_1311303006 with cluster cl_type_1311303006 and host host_mixed_1
11:30:30 2019-07-13 11:30:30,116 INFO       Test Step   2: Add data-center dc_type_1311303006
11:30:30 2019-07-13 11:30:30,243 INFO       Test Step   3: Add cluster cl_type_1311303006
11:30:30 2019-07-13 11:30:30,246 INFO       Test Setup   4: [class] Add cluster with {'version': '4.2', 'data_center': 'dc_type_1311303006', 'cpu': 'Intel Westmere Family', 'name': 'cl_type_1311303006'}
11:30:30 2019-07-13 11:30:30,448 INFO       Test Step   5: Move host host_mixed_1 to cluster cl_type_1311303006
11:30:30 2019-07-13 11:30:30,451 INFO       Test Setup   6: [class] Switch host host_mixed_1 to different cluster cl_type_1311303006 with {'activate': True}
11:30:30 2019-07-13 11:30:30,503 INFO       Test Setup   7: [class] Deactivate the host host_mixed_1. with {'expected_status': 'maintenance'}
11:30:41 2019-07-13 11:30:40,943 INFO       Test Setup   8: [class] Update properties of host host_mixed_1 (provided in parameters) with {'cluster': 'cl_type_1311303006'}
11:30:43 2019-07-13 11:30:43,238 INFO       Test Setup   9: [class] Activate host host_mixed_1 (set status to UP) with {'wait': True}
11:30:53 2019-07-13 11:30:53,443 INFO       Test Setup  10: Move host host_mixed_1 into the newly created cluster
11:30:53 2019-07-13 11:30:53,445 INFO       Test Setup  11: [class] Switch host host_mixed_1 to different cluster cl_type_1311303006 with {'activate': True}
11:30:53 2019-07-13 11:30:53,499 INFO       Test Setup  12: [class] Deactivate the host host_mixed_1. with {'expected_status': 'maintenance'}
11:31:04 2019-07-13 11:31:03,958 INFO       Test Setup  13: [class] Update properties of host host_mixed_1 (provided in parameters) with {'cluster': 'cl_type_1311303006'}
11:31:04 2019-07-13 11:31:04,244 INFO       Test Setup  14: [class] Activate host host_mixed_1 (set status to UP) with {'wait': True}
11:31:14 2019-07-13 11:31:14,473 INFO       Test Setup  15: Create new storage domain sd_TestCase26087_1311311447
11:31:14 2019-07-13 11:31:14,476 INFO       Test Setup  16: [class] Add storage domain to given data-center with {'index': 0, 'storage_type': 'iscsi', 'storage_domain': 'sd_TestCase26087_1311311447', 'data_center': 'dc_type_1311303006'}
11:32:32 2019-07-13 11:32:32,753 INFO       Test Setup  17: Creating VM vm_TestCase26087_1311323275
11:32:32 2019-07-13 11:32:32,837 INFO       Test Setup  18: [class] Create new vm with nic nic1, disk and OS with {'diskActive': True, 'wipe_after_delete': 'false', 'slim': False, 'cluster': 'cl_type_1311303006', 'linked': 'true', 'network': 'ovirtmgmt', 'provisioned_size': 6442450944, 'cpu_socket': 1, 'nicType': 'virtio', 'cpu_cores': 1, 'plugged': 'true', 'start': 'false', 'vmDescription': '', 'memory': 1073741824, 'vmName': 'vm_TestCase26087_1311323275', 'template': 'Blank', 'volumeFormat': 'cow', 'display_type': 'spice', 'installation': False, 'storageDomainName': 'sd_TestCase26087_1311311447', 'attempt': 60, 'diskInterface': 'virtio', 'bootable': True, 'type': 'desktop', 'copy_permissions': False, 'interval': 60, 'volumeType': True, 'diskType': 'data', 'useAgent': True, 'async': False, 'os_type': 'RHEL6x64'}
11:32:32 2019-07-13 11:32:32,840 INFO       Test Setup  19: [class] Description vm_TestCase26087_1311323275: add new vm (without starting it) with {'name': 'vm_TestCase26087_1311323275', 'cpu_socket': 1, 'copy_permissions': False, 'cpu_cores': 1, 'cluster': 'cl_type_1311303006', 'display_type': 'spice', 'template': 'Blank', 'memory': 1073741824, 'async': False, 'os_type': 'RHEL6x64', 'type': 'desktop', 'wait': True}
11:32:33 2019-07-13 11:32:33,512 INFO       Test Setup  20: [class] Add vNIC name nic1 to VM vm_TestCase26087_1311323275 with {'interface': 'virtio', 'vnic_profile': 'ovirtmgmt', 'linked': 'true', 'network': 'ovirtmgmt', 'plugged': 'true'}
11:32:45 2019-07-13 11:32:45,023 INFO 191: storage/rhevmtests.storage.storage_remove_snapshots.test_live_merge.TestCase26087.test_merge_after_cluster_upgrade[iscsi]
11:32:45 2019-07-13 11:32:45,023 INFO     Live merge after cluster upgrade from 4.2 to latest version
11:32:45 2019-07-13 11:32:45,024 INFO STORAGE: ISCSI
11:32:45 2019-07-13 11:32:45,024 INFO       Test Setup  21: Starting VM vm_TestCase26087_1311323275 with cloud-init as run once
11:32:45 2019-07-13 11:32:45,026 INFO       Test Step  22: Run VM vm_TestCase26087_1311323275 once with {'use_cloud_init': True, 'pause': False, 'wait_for_state': 'up', 'use_sysprep': False}
11:34:13 2019-07-13 11:34:06,098 INFO       Test Setup  23: Upgrading cluster cl_type_1311303006 from version 4.2 to version 4.3
11:34:13 2019-07-13 11:34:06,102 INFO       Test Step  24: Update cluster cl_type_1311303006 with {'version': '4.3'}
11:34:14 2019-07-13 11:34:06,591 INFO       Test Step  25: Add snapshot to VM vm_TestCase26087_1311323275 with {'persist_memory': False, 'description': 'snap_test_merge_after_cluste_1311340658', 'wait': True}
11:34:30 2019-07-13 11:34:30,519 INFO       Test Setup  26: Removing snapshot snap_test_merge_after_cluste_1311340658 for VM vm_TestCase26087_1311323275
11:34:30 2019-07-13 11:34:30,521 INFO       Test Step  27: Remove vm vm_TestCase26087_1311323275 snapshot with {'description': 'snap_test_merge_after_cluste_1311340658', 'timeout': 2400, 'wait': True}
11:35:17 2019-07-13 11:35:17,888 INFO Result: PASSED
11:35:17 .2019-07-13 11:35:17,892 INFO       Test Teardown  28: Power off VM vm_TestCase26087_1311323275
11:35:18 2019-07-13 11:35:18,435 INFO       Test Teardown  29: [class] Stop vm vm_TestCase26087_1311323275 with {'async': 'true'}
11:35:21 2019-07-13 11:35:21,526 INFO       Test Teardown  30: Remove VM vm_TestCase26087_1311323275
11:35:21 2019-07-13 11:35:21,780 INFO       Test Teardown  31: [class] Remove VM vm_TestCase26087_1311323275 with {'wait': False}
11:35:27 2019-07-13 11:35:27,838 INFO       Test Teardown  32: Clean data-center dc_type_1311303006 and remove it
11:35:28 2019-07-13 11:35:28,014 INFO       Test Step  33: Deactivate storage-domain sd_TestCase26087_1311311447 
11:40:34 E
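
When the teardown hangs like this, the SPM host can be queried directly for leftover tasks (a sketch; the vdsm-client tool ships with vdsm 4.x):

# On the SPM host: list the tasks vdsm still tracks and their statuses.
vdsm-client Host getAllTasksInfo
vdsm-client Host getAllTasksStatuses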

(Originally by Avihai Efrat)

Comment 8 RHV bug bot 2019-09-03 13:39:51 UTC
*** Bug 1733931 has been marked as a duplicate of this bug. ***

(Originally by Dana Elfassy)

Comment 9 RHV bug bot 2019-09-03 13:39:53 UTC
Please provide a clear reproducer to verify this bug.
Without one I cannot provide qa_ack for this bug.

Are the following still the verification scenario?

How reproducible:
NFS storage

Steps to Reproduce:
1. Create VM with disk + OS
2. Add 4 disks to VM
3. Create live snapshot
4. Perform dd from the OS disk to one of the other disks 
=> can you give the exact dd command line you run here?

5. Live merge the snapshot

(Originally by Avihai Efrat)

Comment 27 Yosi Ben Shimon 2019-09-09 19:30:56 UTC
Tested on:
ovirt-engine-4.3.6.5-0.1.el7.noarch
vdsm-4.30.30-1.el7ev.x86_64

Verification steps:
1. Create a VM with disk (on NFS) and with OS (rhel-8) from template
2. Add 3 more disks (NFS):
    1 disk of 10 GiB
    3 disks of 5 GiB 
3. Start the VM
4. Create a bash file on the root of the OS disk containing:
#!/bin/bash
# Loop forever, copying the 1 GiB source file onto the mounted data disk
# to generate continuous IO during snapshot creation and live merge.
while true
do
        dd if=1GB.bin of=/mnt_dir/1GB_target.bin
done
5. Create a 1 GiB file on the OS disk using:
    # dd if=/dev/urandom of=1GB.bin bs=64M count=16 iflag=fullblock
6. Create an FS on the 10 GiB disk + mount (see the sketch after this list)
7. Execute the dd-bash-file
8. Take live snapshot (with memory)
9. Live merge the snapshot
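
A minimal sketch for step 6 (assuming the 10 GiB disk shows up as /dev/vdb inside the guest, and using the /mnt_dir mount point referenced by the script above):

# Inside the guest: format the 10 GiB data disk and mount it.
mkfs.ext4 /dev/vdb   # device name is an assumption
mkdir -p /mnt_dir
mount /dev/vdb /mnt_dir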

* Tried steps 8 & 9 multiple times during the dd operation - all went fine without any errors or exceptions.
Also tried steps 8 & 9 while interrupting the bash script and restarting it during the same snapshot creation and/or merge.

Live migrated the target disk (as in comment #1) to another NFS domain while the dd was running - worked without errors.

Checked that all the tasks were removed from the DB after some time - all tasks were deleted successfully.

After stopping and removing the VM, I tried to move both the SD and the host to maintenance (as in comment #1) - all worked as expected.

Moving to VERIFIED

Comment 28 RHV bug bot 2019-09-25 08:46:33 UTC
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Tag 'ovirt-engine-4.3.5.6' doesn't contain patch 'https://gerrit.ovirt.org/103068']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.3.5.6

For more info please contact: rhv-devops

Comment 30 errata-xmlrpc 2019-10-10 15:36:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3010

Comment 31 Ilan Zuckerman 2020-03-09 07:54:51 UTC
The BZ was verified following these steps:
rhv-release-4.4.0-23-001.noarch
vdsm-4.40.5-1.el8ev.x86_64

1. Create a VM with disk (on NFS) and with OS (rhel-8) from template
2. Add 3 more disks (NFS):
    1 disk of 10 GiB (OS)
    3 disks of 5 GiB 
3. Start the VM
4. Create a bash file on the root of the OS disk containing:

#!/bin/bash
# Loop forever, copying the 1 GiB source file onto the mounted data disk
# to generate continuous IO during the live merge.
while true
do
        dd if=1GB.bin of=/mnt/1GB_target.bin
done

5. Create a 1 GiB file on the OS disk using:
    # dd if=/dev/urandom of=1GB.bin bs=64M count=16 iflag=fullblock

6. Create an FS on the 10 GiB disk (mkfs.ext4) + mount on /mnt/
7. Take live snapshot (with memory)
8. Execute the dd-bash-file
9. Live merge the snapshot (see the log-watch sketch below)
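
While the merge runs, the engine log can be watched for this bug's NPE signature (a sketch; the standard engine log path is assumed):

# On the engine host: watch for the DestroyImage endAction NPE during the merge.
tail -f /var/log/ovirt-engine/engine.log | grep 'endAction for action type DestroyImage'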

No errors were seen in the engine log besides this one:

2020-03-09 09:26:08,624+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-56) [] VM 'e78caf37-ddba-499d-9290-eb24f8d2ab84' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='9fa051e3-a635-4917-ae03-365d4228a6af', vmId='e78caf37-ddba-499d-9290-eb24f8d2ab84'}', device='usb', type='CONTROLLER', specParams='[index=0, model=qemu-xhci]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='usb', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'


One remark: the live merge process takes a lot of time, approx. ~10 minutes; I don't see why it takes so long.

START:
2020-03-09 09:25:59,861+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-57) [] EVENT_ID: USER_REMOVE_SNAPSHOT(342), Snapshot 'live_snap' deletion for VM 'bz_1727025' was initiated by admin@internal-authz.

END:
2020-03-09 09:35:24,916+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [6b95d438-6b9e-4b4c-b471-81b02ed4977a] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_SUCCESS(356), Snapshot 'live_snap' deletion for VM 'bz_1727025' has been completed.

Attaching vdsm + engine logs (setup: storage-ge-09).

Please NOTE: the vdsm log is flooded with these errors [1]. They are related to another BZ [2]. We assume they don't affect the functionality of what was tested.

[1]
libvirt.libvirtError: Requested operation is not valid: cgroup CPU controller is not mounted

[2]
https://bugzilla.redhat.com/show_bug.cgi?id=1810416

Comment 32 Ilan Zuckerman 2020-03-09 07:56:18 UTC
Created attachment 1668604 [details]
verification logs engine + vdsm

