Created attachment 609728: logs

Description of problem:
I had some tasks that were still running in vds during a vdsm restart. The tasks were cleared from the vds and I manually cleaned the async_task table, but a few hours later, when my SPM recontended, I found that all the tasks were still sent to vdsm for SPMStopTaskVDSCommand as part of SpmStart. We are still reading from the cache, and even after stopTask gets an error from vdsm we do not clear the cache.

Version-Release number of selected component (if applicable):
si16

How reproducible:
100%

Steps to Reproduce:
1. Create a task and restart vdsm
2. Clear the task from the async task table
3. Put the SPM in maintenance so that the second host will contend

Actual results:
We keep reading from the async cache and do not refresh it, even though we get a failure from vdsm on stopTask.

Expected results:
When we get an error from vdsm saying the task is unknown, we should refresh the AsyncTaskManager's cache.

Additional info:
vdsm and backend logs

2012-09-04 18:17:03,602 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, HSMStopTaskVDSCommand, log id: 28a2b4a
2012-09-04 18:17:03,602 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, SPMStopTaskVDSCommand, log id: 184c4c10
2012-09-04 18:17:03,602 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-88) [a4ade3f] SPMAsyncTask::StopTask: Attempting to stop task 7db04903-19f4-4c77-a5a5-5d3c8d9a8e34 (Parent Command AddVmFromTemplate, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters).
2012-09-04 18:17:03,602 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] START, SPMStopTaskVDSCommand(storagePoolId = f570527f-004a-4cab-8bee-129fa589bec5, ignoreFailoverLimit = false, compatabilityVersion = null, taskId = 7db04903-19f4-4c77-a5a5-5d3c8d9a8e34), log id: 3d200893
2012-09-04 18:17:03,614 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] START, HSMStopTaskVDSCommand(vdsId = 8c289d3a-f4d7-11e1-8cda-001a4a169741, taskId=7db04903-19f4-4c77-a5a5-5d3c8d9a8e34), log id: 1a0451c
2012-09-04 18:17:03,651 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 401 mMessage Task id unknown: ('7db04903-19f4-4c77-a5a5-5d3c8d9a8e34',)
2012-09-04 18:17:03,651 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Vds: gold-vdsd
2012-09-04 18:17:03,651 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Command HSMStopTaskVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMStopTaskVDS, error = Task id unknown: ('7db04903-19f4-4c77-a5a5-5d3c8d9a8e34',)
2012-09-04 18:17:03,651 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, HSMStopTaskVDSCommand, log id: 1a0451c
2012-09-04 18:17:03,651 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, SPMStopTaskVDSCommand, log id: 3d200893
I'm not sure we need to fix this. If you are manipulating the DB, you should "know what you are doing". In this case, maybe restart the engine.
Dafna, did you ask someone whether this manual deletion is allowed?
1. The customer will not remove the tasks; only support will.
2. You can close the bug if you like, but just because I manually cleared the async_tasks table does not mean that the same thing cannot happen by accident in a customer environment. I am also not sure why the code should keep the cache uncleared when a task status comes back as unknown from vds.
Andrew, Miki, please advise on the desired behaviour. Do we allow customers to clear tasks from the DB directly? If the above procedure is not acceptable, then there must be a different way to clear the tasks, or one simply waits for the zombie task cleanup in the engine (5 hours). If it is acceptable, we can clear the task in this specific scenario.
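To make the "clear the task in this specific scenario" option concrete, here is a minimal sketch of the idea. It is not the engine code: TaskCacheSketch, StopTaskCall and stopAndEvictIfUnknown are hypothetical names made up for illustration, and the only value taken from this report is the 401 code that vdsm returned with "Task id unknown" in the log above. The assumption is that an "unknown task" answer from the host is enough reason to drop the cached entry, while any other result leaves the entry alone.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the engine's in-memory task cache; not the real AsyncTaskManager. */
final class TaskCacheSketch {
    /** Status code logged by vdsm together with "Task id unknown" in this report. */
    static final int TASK_ID_UNKNOWN = 401;

    /** Hypothetical stand-in for the stop-task call to the host; returns the host's status code. */
    interface StopTaskCall {
        int stop(UUID taskId);
    }

    // Cached tasks, keyed by task id; the value is the parent command name.
    private final Map<UUID, String> tasks = new ConcurrentHashMap<>();

    void add(UUID taskId, String parentCommand) {
        tasks.put(taskId, parentCommand);
    }

    int size() {
        return tasks.size();
    }

    boolean contains(UUID taskId) {
        return tasks.containsKey(taskId);
    }

    /**
     * Asks the host to stop the task. If the host answers that the task id is
     * unknown, the stale entry is evicted from the cache. Any other status
     * leaves the cache untouched. Returns true only if an entry was removed.
     */
    boolean stopAndEvictIfUnknown(UUID taskId, StopTaskCall host) {
        int code = host.stop(taskId);
        if (code == TASK_ID_UNKNOWN) {
            return tasks.remove(taskId) != null;
        }
        return false;
    }
}
```

With something along these lines, the next SPM start would no longer resend a stop request for a task the host has already forgotten. Whether the real fix should evict a single entry or reload the whole map from the DB is a separate design choice that this sketch does not settle.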
http://gerrit.ovirt.org/#/c/8161/
Fixed in commit 909da9e
Verified on si20: the map contained 1 task, and once the new SPM started the cache was cleared:

2012-10-15 18:35:56,699 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-18) Setting new tasks map. The map contains now 1 tasks
2012-10-15 18:36:07,885 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (QuartzScheduler_Worker-28) -- SPMGetAllTasksInfoVDSCommand::ExecuteIrsBrokerCommand: Attempting on storage pool 11d18980-5c97-40ca-b
2012-10-15 18:36:56,699 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-90) Setting new tasks map. The map contains now 0 tasks
*** Bug 854527 has been marked as a duplicate of this bug. ***