Description of problem:
After a failure to live migrate a disk due to a VM power-off during LSM's live snapshot creation, the deleteImage task is not cleared on vdsm.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.4.5-0.1.el7_3.noarch
vdsm-4.20.33-1.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Live migrate a VM disk and, during LSM's snapshot creation, power off the VM

Actual results:

LSM starts:
2018-07-07 06:01:30,483+03 INFO [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-63) [disks_syncAction_2e3f3f46-2d97-479f] Lock Acquired to object 'EngineLock:{exclusiveLocks='[4ea04b6e-9ed0-466f-afd5-728e053f9ea2=DISK]', sharedLocks=''}'
2018-07-07 06:01:30,485+03 INFO [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-63) [disks_syncAction_2e3f3f46-2d97-479f] Running command: MoveDiskCommand internal: false. Entities affected : ID: 4ea04b6e-9ed0-466f-afd5-728e053f9ea2 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER, ID: 9fc25352-4296-4e11-8aaa-c28bff1cd711 Type: StorageAction group CREATE_DISK with role type USER

Volume creation for LSM's snapshot:
2018-07-07 06:01:30,892+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-63) [disks_syncAction_2e3f3f46-2d97-479f] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='a43f9eca-8147-11e8-b6d6-00163e7bd001', ignoreFailoverLimit='false', storageDomainId='f008e138-6e5b-4cc5-9493-966060017522', imageGroupId='4ea04b6e-9ed0-466f-afd5-728e053f9ea2', imageSizeInBytes='1073741824', volumeFormat='COW', newImageId='464058e8-1196-4890-a003-bad0d967cb94', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='0', imageId='623f40d0-ab1f-4b77-87d4-7e7745767134', sourceImageGroupId='4ea04b6e-9ed0-466f-afd5-728e053f9ea2'}), log id: 1f251a13

In vdsm:
2018-07-07 06:01:30,896+0300 INFO (jsonrpc/0) [vdsm.api] START createVolume(sdUUID='f008e138-6e5b-4cc5-9493-966060017522', spUUID='a43f9eca-8147-11e8-b6d6-00163e7bd001', imgUUID='4ea04b6e-9ed0-466f-afd5-728e053f9ea2', size='1073741824', volFormat=4, preallocate=2, diskType='DATA', volUUID='464058e8-1196-4890-a003-bad0d967cb94', desc=u'', srcImgUUID='4ea04b6e-9ed0-466f-afd5-728e053f9ea2', srcVolUUID='623f40d0-ab1f-4b77-87d4-7e7745767134', initialSize=None) from=::ffff:10.35.162.71,46396, flow_id=disks_syncAction_2e3f3f46-2d97-479f, task_id=d28bfc2b-f49d-49d1-b3dc-9700f37a6ed9 (api:46)

The VM is stopped:
2018-07-07 06:01:33,670+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] START, DestroyVDSCommand(HostName = host_mixed_2, DestroyVmVDSCommandParameters:{hostId='6e8926ea-7859-47e1-8d9b-4e59238914de', vmId='5824e6f0-3337-4cec-a5ba-0853281788ac', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 539a8312
2018-07-07 06:01:33,672+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] Failed to destroy VM '5824e6f0-3337-4cec-a5ba-0853281788ac' because VM does not exist, ignoring
2018-07-07 06:01:33,672+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-2) [] FINISH, DestroyVDSCommand, log id: 539a8312
2018-07-07 06:01:33,672+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM '5824e6f0-3337-4cec-a5ba-0853281788ac'(vm_TestCase5967_0705593609) moved from 'Up' --> 'Down'
2018-07-07 06:01:33,677+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-44) [vms_syncAction_cfcd8adc-f69a-45e3] EVENT_ID: USER_STOP_VM(33), VM vm_TestCase5967_0705593609 powered off by admin@internal-authz (Host: host_mixed_2).

The deleteVolume task for the LSM rollback starts with task id 259861b9-bea1-4898-a3a2-49f6b0b151f5:
2018-07-07 06:01:35,122+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='259861b9-bea1-4898-a3a2-49f6b0b151f5') moving from state preparing -> state preparing (task:602)
2018-07-07 06:01:35,122+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='259861b9-bea1-4898-a3a2-49f6b0b151f5') moving from state preparing -> state preparing (task:602)
2018-07-07 06:01:35,122+0300 INFO (jsonrpc/7) [vdsm.api] START deleteVolume(sdUUID='f008e138-6e5b-4cc5-9493-966060017522', spUUID='a43f9eca-8147-11e8-b6d6-00163e7bd001', imgUUID='4ea04b6e-9ed0-466f-afd5-728e053f9ea2', volumes=['464058e8-1196-4890-a003-bad0d967cb94'], postZero='true', force='true', discard=True) from=::ffff:10.35.162.71,46396, flow_id=disks_syncAction_2e3f3f46-2d97-479f, task_id=259861b9-bea1-4898-a3a2-49f6b0b151f5 (api:46)

The task then remains in 'finished' state for ~10 hours:
2018-07-07 16:49:00,185+0300 INFO (jsonrpc/5) [vdsm.api] FINISH getAllTasksStatuses return={'allTasksStatus': {'259861b9-bea1-4898-a3a2-49f6b0b151f5': {'code': 0, 'message': u'1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '259861b9-bea1-4898-a3a2-49f6b0b151f5'}}} from=::ffff:10.35.162.71,53814, flow_id=29222a1e, task_id=101c7515-b8df-4690-adda-15f5eac50784 (api:52)

Only then is it cleared:
2018-07-07 16:49:00,207+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [29222a1e] FINISH, HSMClearTaskVDSCommand, log id: 1b5dd4c8
2018-07-07 16:49:00,208+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [29222a1e] FINISH, SPMClearTaskVDSCommand, log id: 1ed14821
2018-07-07 16:49:00,212+03 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [29222a1e] BaseAsyncTask::removeTaskFromDB: Removed task '259861b9-bea1-4898-a3a2-49f6b0b151f5' from DataBase
2018-07-07 16:49:00,189+0300 INFO (jsonrpc/1) [vdsm.api] START clearTask(taskID='259861b9-bea1-4898-a3a2-49f6b0b151f5', spUUID=None, options=None) from=::ffff:10.35.162.71,53814, flow_id=29222a1e, task_id=21650601-2709-41e3-ab74-759534f39e33 (api:46)

Expected results:
The task should be cleared shortly after the rollback finishes, not left around for hours.
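For reference, the leftover task can be inspected directly on the SPM host with vdsm-client. This is only a sketch: Host getAllTasks is the call used later in this bug, and the Task getStatus verb is assumed to map to the per-task status API behind getAllTasksStatuses; the task ID is the one from the deleteVolume entry above:

# Run on the SPM host; task ID taken from the deleteVolume log above
vdsm-client Host getAllTasks
vdsm-client Task getStatus taskID=259861b9-bea1-4898-a3a2-49f6b0b151f5

In this run the task stays 'finished' until the engine finally clears it ~10 hours later, as shown in the getAllTasksStatuses/clearTask entries above.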
Do we have full logs for this?
Created attachment 1459190 [details] logs
Verified.

Versions:
ovirt-engine-4.2.6.3
vdsm-4.20.37-2
Moving back to ASSIGNED because this bug seems to still exist.

Tested using:
ovirt-engine-4.2.6.3-1.el7.noarch
vdsm-4.20.37-3.git924eec4.el7.x86_64

While running the specified test case (5967) I found that there is a "deleteVolume" task that is not being cleared:

[root@storage-ge16-vdsm3 ~]# vdsm-client Host getAllTasks
{
    "48a1e000-9eff-48a0-9bf5-9152978854a2": {
        "verb": "deleteVolume",
        "code": 0,
        "state": "finished",
        "tag": "spm",
        "result": "",
        "message": "1 jobs completed successfully",
        "id": "48a1e000-9eff-48a0-9bf5-9152978854a2"
    }
}

According to the VDSM log, this task started at 2018-08-22 10:38:28:
2018-08-22 10:38:28,982+0300 DEBUG (jsonrpc/1) [storage.TaskManager.Task] (Task='48a1e000-9eff-48a0-9bf5-9152978854a2') moving from state preparing -> state preparing (task:602)
2018-08-22 10:38:28,983+0300 INFO (jsonrpc/1) [vdsm.api] START deleteVolume(sdUUID='affb03c6-52c8-4355-be7d-aa8d91452601', spUUID='493b429f-d032-476a-9dad-bb18a4ef4151', imgUUID='8e617cf0-0693-412e-bcc5-9eb5fe2037cb', volumes=['91ceae38-9a63-45c3-9091-9cf2ba819e3b'], postZero='true', force='false', discard=True) from=::ffff:10.35.163.251,41464, flow_id=disks_syncAction_77f49a65-13b4-465c, task_id=48a1e000-9eff-48a0-9bf5-9152978854a2 (api:46)

and it still appears in the logs at 2018-08-22 11:10:45:
2018-08-22 11:10:45,270+0300 DEBUG (jsonrpc/3) [storage.TaskManager.Task] (Task='3e035f0d-d425-4dfb-ba1f-d9fefede9b4a') finished: {'allTasksStatus': {'48a1e000-9eff-48a0-9bf5-9152978854a2': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '48a1e000-9eff-48a0-9bf5-9152978854a2'}}} (task:1201)

Additional info:
When I tried to deactivate this host (the SPM), I got this error in the engine log (which makes sense):
2018-08-22 11:10:45,291+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-36) [49f1eb8e-2a86-4371-bf85-c64974dc5a69] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_3', pool id '493b429f-d032-476a-9dad-bb18a4ef4151' as there are uncleared tasks 'Task '48a1e000-9eff-48a0-9bf5-9152978854a2', status 'finished''
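As a manual workaround sketch (not the fix itself): once the engine no longer tracks the task, the leftover finished task can be cleared from the SPM host. This assumes the vdsm-client Task clear verb maps to the clearTask API seen in the vdsm logs; the task ID is the one reported above:

# Assumption: Task clear maps to the clearTask API seen in the logs; run on the SPM host
vdsm-client Task clear taskID=48a1e000-9eff-48a0-9bf5-9152978854a2
vdsm-client Host getAllTasks   # the task should no longer be listed

Clearing a task that the engine still tracks can confuse the engine's async-task polling, so this should only be done after confirming the engine side has given up on it.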
Target release should be set once a package build is known to fix the issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Created attachment 1477831 [details] logs
Yosi, does this happen often? Were there runs in which the task was cleared successfully?

Looks like the culprit is this NPE:
2018-08-22 10:38:44,139+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-198283) [] [within thread]: endAction for action type CreateSnapshot threw an exception.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.endAction(CoCoAsyncTaskHelper.java:351) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.endAction(CommandCoordinatorImpl.java:347) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.endCommandAction(CommandAsyncTask.java:160) [bll.jar:]
	at org.ovirt.engine.core.bll.tasks.CommandAsyncTask.lambda$endActionIfNecessary$0(CommandAsyncTask.java:112) [bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) [utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_181]
	at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
	at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
(In reply to Benny Zlotnik from comment #8)
> Yosi, does this happen often? Were there runs in which the task was cleared
> successfully?

It happens in most cases. I haven't seen an execution of this test case in which it didn't occur.
The deleteImage task is now cleared when LSM fails due to VM power-off.

deleteImage task starts:
2018-10-10 17:34:32,992+0300 DEBUG (jsonrpc/6) [storage.TaskManager.Task] (Task='97a1bac7-b21f-49ed-98f2-687257b60f6f') moving from state preparing -> state preparing (task:602)
2018-10-10 17:34:32,993+0300 INFO (jsonrpc/6) [vdsm.api] START deleteImage(sdUUID=u'2d633c7e-e417-4b4d-8b30-e973f1d6d870', spUUID=u'fb339a72-b298-451b-9a13-6554de8ef545', imgUUID=u'edc9bd45-2014-45a8-ab8d-d7c8d83b2f2b', postZero=u'false', force=u'false', discard=False) from=::ffff:10.35.161.127,51832, flow_id=vms_delete_fe902a07-01e2-4ee9, task_id=97a1bac7-b21f-49ed-98f2-687257b60f6f (api:46)

And it is cleared right after:
2018-10-10 17:34:39,476+0300 INFO (jsonrpc/1) [vdsm.api] START clearTask(taskID=u'97a1bac7-b21f-49ed-98f2-687257b60f6f', spUUID=None, options=None) from=::ffff:10.35.161.127,51820, flow_id=vms_delete_fe902a07-01e2-4ee9, task_id=22a4ce37-b806-4def-b7fe-4a44eb4f9f3a (api:46)
2018-10-10 17:34:39,476+0300 DEBUG (jsonrpc/1) [storage.TaskManager] Entry. taskID: 97a1bac7-b21f-49ed-98f2-687257b60f6f (taskManager:172)

=============
Used:
rhvm-4.2.7.2-0.1.el7ev.noarch
vdsm-4.20.42-1.el7ev.x86_64
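For completeness, a quick way to confirm the fix on the SPM host is to check that no finished task lingers once the power-off/rollback flow completes, using the same vdsm-client call shown earlier in this bug; an empty result is the expected outcome here:

vdsm-client Host getAllTasks
# expected after the fix: no leftover deleteImage/deleteVolume entry, i.e. an empty result ({})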
This bugzilla is included in the oVirt 4.2.7 release, published on November 2nd 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.7, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.