Bug 854308 - engine: task manager cache is not cleared after getting error that task does not exist from vdsm on stopTask
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
x86_64 Linux
medium Severity medium
Assigned To: Eli Mesika
Dafna Ron
: 761050 854527
Depends On:
Reported: 2012-09-04 11:22 EDT by Dafna Ron
Modified: 2016-02-10 14:08 EST (History)

See Also:
Fixed In Version: si20
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-12-04 15:04:22 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
logs (2.56 MB, application/x-gzip)
2012-09-04 11:22 EDT, Dafna Ron

Description Dafna Ron 2012-09-04 11:22:45 EDT
Created attachment 609728

Description of problem:

I had some tasks that were still running on the vds during a vdsm restart.
The tasks were cleared from the vds, and I manually cleaned the async_task table.
But a few hours later, when my SPM re-contended, I found that all the tasks were still being sent to vdsm through SPMStopTaskVDSCommand as part of SpmStart.
We are still reading from the cache, and even after stopTask gets an error from vdsm we do not clear the cache.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a task and restart vdsm.
2. Clear the task from the async task table.
3. Put the SPM in maintenance so that the second host will contend.
Actual results:

We keep reading from the async task cache without refreshing it, even though vdsm returns a failure on stopTask.

Expected results:

When vdsm returns an error saying that the task is unknown, we should refresh the AsyncTaskManager's cache.
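
The expected behaviour can be illustrated with a minimal Java sketch. This is not the actual ovirt-engine code or the eventual fix: the class `StopTaskCacheSketch`, the interface `StopTaskCall`, and all method names are hypothetical. The only values taken from this report are the status code 401 that accompanies "Task id unknown" in the backend log, and the task id and parent command from the same log. The sketch assumes the cache is a simple in-memory map and evicts an entry when the stop call reports that the task is unknown.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the ovirt-engine source.
class StopTaskCacheSketch {
    // Status code logged together with "Task id unknown" in this report.
    static final int TASK_ID_UNKNOWN = 401;

    // Stand-in for the stopTask call to vdsm; returns the status code.
    interface StopTaskCall {
        int stop(UUID taskId);
    }

    // Assumed shape of the cache: task id -> parent command name.
    private final Map<UUID, String> tasks = new ConcurrentHashMap<>();

    void track(UUID taskId, String parentCommand) {
        tasks.put(taskId, parentCommand);
    }

    boolean isTracked(UUID taskId) {
        return tasks.containsKey(taskId);
    }

    int size() {
        return tasks.size();
    }

    // Asks vdsm to stop the task. If vdsm answers that the task id is
    // unknown, the stale entry is dropped from the cache.
    // Returns true only when an entry was actually evicted.
    boolean stopTask(UUID taskId, StopTaskCall vdsm) {
        int code = vdsm.stop(taskId);
        if (code == TASK_ID_UNKNOWN) {
            return tasks.remove(taskId) != null;
        }
        return false;
    }

    public static void main(String[] args) {
        StopTaskCacheSketch cache = new StopTaskCacheSketch();
        UUID taskId = UUID.fromString("7db04903-19f4-4c77-a5a5-5d3c8d9a8e34");
        cache.track(taskId, "AddVmFromTemplate");

        // Simulate vdsm answering "Task id unknown" for the stop request.
        boolean evicted = cache.stopTask(taskId, id -> TASK_ID_UNKNOWN);
        System.out.println("evicted=" + evicted + ", cached tasks=" + cache.size());
    }
}
```

With this shape, a later SpmStart would no longer find the stale task in the cache and would not resend the stop request for it.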

Additional info: vdsm and backend logs

2012-09-04 18:17:03,602 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, HSMStopTaskVDSCommand, log id: 28a2b4a
2012-09-04 18:17:03,602 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, SPMStopTaskVDSCommand, log id: 184c4c10
2012-09-04 18:17:03,602 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-88) [a4ade3f] SPMAsyncTask::StopTask: Attempting to stop task 7db04903-19f4-4c77-a5a5-5d3c8d9a8e34 (Parent Command AddVmFromTemplate, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters).
2012-09-04 18:17:03,602 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] START, SPMStopTaskVDSCommand(storagePoolId = f570527f-004a-4cab-8bee-129fa589bec5, ignoreFailoverLimit = false, compatabilityVersion = null, taskId = 7db04903-19f4-4c77-a5a5-5d3c8d9a8e34), log id: 3d200893
2012-09-04 18:17:03,614 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] START, HSMStopTaskVDSCommand(vdsId = 8c289d3a-f4d7-11e1-8cda-001a4a169741, taskId=7db04903-19f4-4c77-a5a5-5d3c8d9a8e34), log id: 1a0451c
2012-09-04 18:17:03,651 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         401
mMessage                      Task id unknown: ('7db04903-19f4-4c77-a5a5-5d3c8d9a8e34',)

2012-09-04 18:17:03,651 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Vds: gold-vdsd
2012-09-04 18:17:03,651 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Command HSMStopTaskVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMStopTaskVDS, error = Task id unknown: ('7db04903-19f4-4c77-a5a5-5d3c8d9a8e34',)
2012-09-04 18:17:03,651 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, HSMStopTaskVDSCommand, log id: 1a0451c
2012-09-04 18:17:03,651 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, SPMStopTaskVDSCommand, log id: 3d200893
Comment 1 Itamar Heim 2012-09-04 14:49:59 EDT
I'm not sure we need to fix this.
If you are manipulating the DB, you should "know what you are doing".
In this case, maybe restart the engine.
Comment 2 Barak 2012-09-05 08:12:14 EDT

Did you ask someone whether this manual deletion is allowed?
Comment 3 Dafna Ron 2012-09-05 08:26:48 EDT
1. The customer will not remove the tasks; only support will.
2. You can close the bug if you like, but the fact that I manually deleted from the async_tasks table does not mean the same thing cannot happen by accident in a customer environment. I am also not sure why we should keep code that does not clear the cache when a task's status comes back as unknown from the vds.
Comment 4 Barak 2012-09-09 07:36:48 EDT
Andrew, Miki,

Please advise on the desired behaviour.
Do we allow customers to clear tasks from DB directly?

Anyway, if the above procedure is not acceptable, then there must be a different way to clear the tasks, or one simply waits for the zombie task cleanup in the engine (5 hours).

If it is acceptable, we can clear the task in this specific scenario.
Comment 7 Eli Mesika 2012-09-24 10:09:21 EDT
Comment 8 Eli Mesika 2012-09-24 11:53:36 EDT
Fixed in commit 909da9e
Comment 12 Dafna Ron 2012-10-15 12:41:35 EDT
Verified on si20.
The map contained 1 task, and once the new SPM started the cache was cleared:

2012-10-15 18:35:56,699 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-18) Setting new tasks map. The map contains now 1 tasks

2012-10-15 18:36:07,885 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (QuartzScheduler_Worker-28) -- SPMGetAllTasksInfoVDSCommand::ExecuteIrsBrokerCommand: Attempting on storage pool 11d18980-5c97-40ca-b

2012-10-15 18:36:56,699 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-90) Setting new tasks map. The map contains now 0 tasks
Comment 13 mkublin 2012-10-21 05:53:11 EDT
*** Bug 854527 has been marked as a duplicate of this bug. ***
