Bug 854308 - engine: task manager cache is not cleared after getting error that task does not exist from vdsm on stopTask
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Eli Mesika
QA Contact: Dafna Ron
URL:
Whiteboard: infra
Duplicates: 761050 854527
Depends On:
Blocks:
Reported: 2012-09-04 15:22 UTC by Dafna Ron
Modified: 2016-02-10 19:08 UTC (History)

Fixed In Version: si20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-04 20:04:22 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
logs (2.56 MB, application/x-gzip)
2012-09-04 15:22 UTC, Dafna Ron

Description Dafna Ron 2012-09-04 15:22:45 UTC
Created attachment 609728
logs

Description of problem:

Some tasks were still running on the VDS during a vdsm restart.
The tasks were cleared from the VDS, and I manually cleaned the async_tasks table.
A few hours later, when the hosts contended for SPM again, I found that all of those tasks were still being sent to vdsm in SPMStopTaskVDSCommand as part of SpmStart.
The engine is still reading from its cache, and even after stopTask gets an error from vdsm the cache is not cleared.

Version-Release number of selected component (if applicable):

si16

How reproducible:

100%

Steps to Reproduce:
1. Create a task and restart vdsm.
2. Clear the task from the async_tasks table.
3. Put the SPM into maintenance so that the second host contends for SPM.
  
Actual results:

The engine keeps reading from the async task cache and does not refresh it, even though stopTask fails on vdsm.

Expected results:

When vdsm returns an error saying the task is unknown, the engine should refresh the AsyncTaskManager's cache.


Additional info: vdsm and backend logs


2012-09-04 18:17:03,602 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, HSMStopTaskVDSCommand, log id: 28a2b4a
2012-09-04 18:17:03,602 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, SPMStopTaskVDSCommand, log id: 184c4c10
2012-09-04 18:17:03,602 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-88) [a4ade3f] SPMAsyncTask::StopTask: Attempting to stop task 7db04903-19f4-4c77-a5a5-5d3c8d9a8e34 (Parent Command AddVmFromTemplate, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters).
2012-09-04 18:17:03,602 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] START, SPMStopTaskVDSCommand(storagePoolId = f570527f-004a-4cab-8bee-129fa589bec5, ignoreFailoverLimit = false, compatabilityVersion = null, taskId = 7db04903-19f4-4c77-a5a5-5d3c8d9a8e34), log id: 3d200893
2012-09-04 18:17:03,614 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] START, HSMStopTaskVDSCommand(vdsId = 8c289d3a-f4d7-11e1-8cda-001a4a169741, taskId=7db04903-19f4-4c77-a5a5-5d3c8d9a8e34), log id: 1a0451c
2012-09-04 18:17:03,651 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         401
mMessage                      Task id unknown: ('7db04903-19f4-4c77-a5a5-5d3c8d9a8e34',)


2012-09-04 18:17:03,651 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Vds: gold-vdsd
2012-09-04 18:17:03,651 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-88) [a4ade3f] Command HSMStopTaskVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMStopTaskVDS, error = Task id unknown: ('7db04903-19f4-4c77-a5a5-5d3c8d9a8e34',)
2012-09-04 18:17:03,651 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, HSMStopTaskVDSCommand, log id: 1a0451c
2012-09-04 18:17:03,651 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMStopTaskVDSCommand] (QuartzScheduler_Worker-88) [a4ade3f] FINISH, SPMStopTaskVDSCommand, log id: 3d200893
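
A minimal Java sketch of the expected behaviour, not the engine's actual code: when a stop request comes back with status 401 (the "Task id unknown" code in the log above), the entry is dropped from the in-memory map instead of being retried on the next SPM start. TaskCacheSketch, StopTaskCall, track and stopAll are hypothetical names invented for illustration; the real AsyncTaskManager API is not shown in this report.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the engine's in-memory task cache (not the real AsyncTaskManager). */
class TaskCacheSketch {
    /** Status code that accompanied "Task id unknown" in the log above. */
    static final int TASK_ID_UNKNOWN = 401;

    /** Hypothetical abstraction over the stopTask call; returns the status code from vdsm. */
    interface StopTaskCall {
        int stop(UUID taskId);
    }

    // Task id -> parent command name (e.g. "AddVmFromTemplate").
    private final Map<UUID, String> tasks = new ConcurrentHashMap<>();

    void track(UUID taskId, String parentCommand) {
        tasks.put(taskId, parentCommand);
    }

    boolean isTracked(UUID taskId) {
        return tasks.containsKey(taskId);
    }

    int size() {
        return tasks.size();
    }

    /** Asks vdsm to stop every cached task and evicts the ones vdsm no longer knows. */
    void stopAll(StopTaskCall vdsm) {
        // ConcurrentHashMap iterators are weakly consistent, so removing while iterating is safe.
        for (UUID taskId : tasks.keySet()) {
            int code = vdsm.stop(taskId);
            if (code == TASK_ID_UNKNOWN) {
                tasks.remove(taskId);
            }
        }
    }
}
```

With this shape, a task that vdsm has already forgotten is sent for stopTask at most once more, and the cache then agrees with vdsm without an engine restart.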

Comment 1 Itamar Heim 2012-09-04 18:49:59 UTC
I'm not sure we need to fix this.
If you are manipulating the DB, you should "know what you are doing".
In this case, maybe restart the engine.

Comment 2 Barak 2012-09-05 12:12:14 UTC
Dafna,

Did you ask anyone whether this manual deletion is allowed?

Comment 3 Dafna Ron 2012-09-05 12:26:48 UTC
1. The customer will not remove the tasks; only support will.
2. You can close the bug if you like, but the fact that I manually cleaned the async_tasks table does not mean the same thing cannot happen by accident in a customer environment. I also don't see why the code should keep the cache uncleared when vds reports the task status as unknown.

Comment 4 Barak 2012-09-09 11:36:48 UTC
Andrew, Miki,

Please advise on the desired behaviour.
Do we allow customers to clear tasks from the DB directly?

Anyway, if the above procedure is not acceptable, then there must be a different way to clear the tasks, or one simply waits for the zombie task cleanup in the engine (5 hours).

If it is acceptable, we can clear the task in this specific scenario.
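
For illustration only, an age-based cleanup like the zombie task cleanup mentioned above could look roughly like the following Java sketch. The 5-hour lifetime is the figure quoted in this comment; ZombieTaskSweepSketch, track and sweep are hypothetical names, and the engine's real cleanup logic is not shown in this report.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical age-based cleanup; the engine's real zombie task handling may differ. */
class ZombieTaskSweepSketch {
    /** Lifetime quoted in the comment above. */
    static final Duration ZOMBIE_LIFETIME = Duration.ofHours(5);

    // Task id -> time the task was started.
    private final Map<UUID, Instant> startedAt = new ConcurrentHashMap<>();

    void track(UUID taskId, Instant started) {
        startedAt.put(taskId, started);
    }

    int size() {
        return startedAt.size();
    }

    /** Drops every task older than ZOMBIE_LIFETIME and returns how many were dropped. */
    int sweep(Instant now) {
        int before = startedAt.size();
        startedAt.values().removeIf(
                started -> Duration.between(started, now).compareTo(ZOMBIE_LIFETIME) > 0);
        return before - startedAt.size();
    }
}
```

The drawback raised in this bug is visible here: a task that vdsm has already forgotten stays cached until its age passes the lifetime, so it keeps being sent for stopTask in the meantime.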

Comment 7 Eli Mesika 2012-09-24 14:09:21 UTC
http://gerrit.ovirt.org/#/c/8161/

Comment 8 Eli Mesika 2012-09-24 15:53:36 UTC
Fixed in commit 909da9e

Comment 12 Dafna Ron 2012-10-15 16:41:35 UTC
Verified on si20.
The map contained 1 task, and once the new SPM started the cache was cleared:

2012-10-15 18:35:56,699 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-18) Setting new tasks map. The map contains now 1 tasks

2012-10-15 18:36:07,885 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (QuartzScheduler_Worker-28) -- SPMGetAllTasksInfoVDSCommand::ExecuteIrsBrokerCommand: Attempting on storage pool 11d18980-5c97-40ca-b

2012-10-15 18:36:56,699 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-90) Setting new tasks map. The map contains now 0 tasks

Comment 13 mkublin 2012-10-21 09:53:11 UTC
*** Bug 854527 has been marked as a duplicate of this bug. ***

