Bug 971779

Summary: Async tasks sometimes stay until restart of ovirt-engine service
Product: Red Hat Enterprise Virtualization Manager Reporter: Katarzyna Jachim <kjachim>
Component: ovirt-engineAssignee: Nobody's working on this, feel free to take it <nobody>
Status: CLOSED NOTABUG QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.2.0CC: acathrow, amureini, bazulay, dyasny, iheim, kjachim, lpeer, ncredi, Rhev-m-bugs, yeylon, ykaul, yzaslavs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-06-15 17:13:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm.log + engine.log + boot.log + server.log none

Description Katarzyna Jachim 2013-06-07 09:16:19 UTC
Created attachment 758070 [details]
vdsm.log + engine.log + boot.log + server.log

Description of problem:

I played with importing and exporting VMs (create a VM, export it, remove the original one, import again, remove it, remove exported, remove export domain). After that I tried to remove the whole DC, but apparently there were not finished async tasks, so I couldn't put into maintenance neither master storage domain, nor SPM host. I left it in this state for the whole night, hoping that eventually the task would finish (I suspected it to be formatting removed export domain), but it was the same on the next day. Then I checked if there was any task in RHEVM database - there was none (async_tasks table was empty). I also examined the logs but I couldn't find any information about not finished tasks. Then I just restarted ovirt-engine service (service ovirt-engine restart). After that I got the error messages about not finished async tasks again, but after a few minutes the tasks finished and I was able to remove the DC.

As you can see in the attached logs, the task was not reported by engine since 2013-06-06 16:00:38,166 until 2013-06-07 10:17:47,991, ovirt-engine was restarted at 10:17:35,816

Snippet from the engine.log (grep '368b5a3e-52cd-4c9c-9985-24d28316d6d9' engine.log):
2013-06-06 15:56:11,975 INFO  [org.ovirt.engine.core.bll.EntityMultiAsyncTasks] (ajp-/127.0.0.1:8702-10) [5558b718] EntityMultiAsyncTasks::AttachTask: Attaching task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 to entity 24541ce3-673c-4000-940f-dedbabacb60e.
2013-06-06 15:56:12,000 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (ajp-/127.0.0.1:8702-10) [5558b718] Adding task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..
2013-06-06 15:56:12,103 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (ajp-/127.0.0.1:8702-10) [5558b718] BaseAsyncTask::StartPollingTask: Starting to poll task 368b5a3e-52cd-4c9c-9985-24d28316d6d9.
2013-06-06 15:56:20,066 INFO  [org.ovirt.engine.core.bll.EntityMultiAsyncTasks] (pool-4-thread-48) EntityMultiAsyncTasks::StartPollingNextTask: Starting to poll next task (task ID: 368b5a3e-52cd-4c9c-9985-24d28316d6d9, action type ImportVm)
2013-06-06 15:56:20,066 INFO  [org.ovirt.engine.core.bll.EntityMultiAsyncTasks] (pool-4-thread-48) EntityMultiAsyncTasks::StartPollingTask: Current Action Type for entity 24541ce3-673c-4000-940f-dedbabacb60e is ImportVm (determined by task 368b5a3e-52cd-4c9c-9985-24d28316d6d9)
2013-06-06 15:56:27,150 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-74) SPMAsyncTask::PollTask: Polling task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status running.
2013-06-06 15:57:17,301 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-82) SPMAsyncTask::PollTask: Polling task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'success'.
2013-06-06 15:57:17,305 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-82) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:57:27,339 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-91) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:57:37,374 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-92) [201b8278] BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:57:47,411 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-31) [5c289a97] BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:57:57,447 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-83) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:58:07,492 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) [6f84067c] BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:58:17,526 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-36) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:58:27,560 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-56) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:58:37,594 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-85) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:58:47,626 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-3) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:58:57,661 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-32) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:59:07,704 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-52) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:59:17,736 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-79) [41e4f10b] BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:59:27,769 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-58) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:59:37,802 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-20) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 15:59:47,836 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-40) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 16:00:37,893 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-62) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command ImportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-06 16:00:38,125 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (pool-4-thread-47) SPMAsyncTask::ClearAsyncTask: Attempting to clear task 368b5a3e-52cd-4c9c-9985-24d28316d6d9
2013-06-06 16:00:38,125 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (pool-4-thread-47) START, SPMClearTaskVDSCommand( storagePoolId = 69e98746-c9f3-48e7-be15-46bc33e32147, ignoreFailoverLimit = false, compatabilityVersion = null, taskId = 368b5a3e-52cd-4c9c-9985-24d28316d6d9), log id: 22ce8a41
2013-06-06 16:00:38,126 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (pool-4-thread-47) START, HSMClearTaskVDSCommand(HostName = 10.34.63.216, HostId = 3ce01bc9-ff5a-4b1e-a398-ffa19b1bccae, taskId=368b5a3e-52cd-4c9c-9985-24d28316d6d9), log id: 743c6832
2013-06-06 16:00:38,163 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (pool-4-thread-47) Trying to remove unknown task: 368b5a3e-52cd-4c9c-9985-24d28316d6d9
2013-06-06 16:00:38,166 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (pool-4-thread-47) BaseAsyncTask::RemoveTaskFromDB: Removed task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 from DataBase
2013-06-07 10:17:47,991 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-9) Adding task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command Unknown, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..
2013-06-07 10:17:48,182 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-9) BaseAsyncTask::StartPollingTask: Starting to poll task 368b5a3e-52cd-4c9c-9985-24d28316d6d9.
2013-06-07 10:17:56,951 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-14) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 is in pre-polling  period and should not be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:06,953 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-24) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 is in pre-polling  period and should not be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:16,955 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-32) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 is in pre-polling  period and should not be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:26,957 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-40) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 is in pre-polling  period and should not be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:36,958 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-52) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 is in pre-polling  period and should not be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:46,960 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-60) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 is in pre-polling  period and should not be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:56,962 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 has passed pre-polling period time and should be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:56,962 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 has passed pre-polling period time and should be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:56,963 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 has passed pre-polling period time and should be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:57,001 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) task id 368b5a3e-52cd-4c9c-9985-24d28316d6d9 has passed pre-polling period time and should be polled. Pre-polling period is 60,000 millis. 
2013-06-07 10:18:57,001 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) SPMAsyncTask::PollTask: Polling task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command Unknown, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'success'.
2013-06-07 10:18:57,002 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) BaseAsyncTask::OnTaskEndSuccess: Task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 (Parent Command Unknown, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended successfully.
2013-06-07 10:18:57,002 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) SPMAsyncTask::ClearAsyncTask: Attempting to clear task 368b5a3e-52cd-4c9c-9985-24d28316d6d9
2013-06-07 10:18:57,004 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (QuartzScheduler_Worker-68) START, SPMClearTaskVDSCommand( storagePoolId = 69e98746-c9f3-48e7-be15-46bc33e32147, ignoreFailoverLimit = false, compatabilityVersion = null, taskId = 368b5a3e-52cd-4c9c-9985-24d28316d6d9), log id: 3330322e
2013-06-07 10:18:57,007 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (QuartzScheduler_Worker-68) START, HSMClearTaskVDSCommand(HostName = 10.34.63.216, HostId = 3ce01bc9-ff5a-4b1e-a398-ffa19b1bccae, taskId=368b5a3e-52cd-4c9c-9985-24d28316d6d9), log id: 89de3f6
2013-06-07 10:18:57,094 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-68) BaseAsyncTask::RemoveTaskFromDB: Removed task 368b5a3e-52cd-4c9c-9985-24d28316d6d9 from DataBase


Version-Release number of selected component (if applicable): sf17.3


How reproducible: happened once


Steps to Reproduce:
1. create a VM
2. export it
3. remove the original VM
4. import VM
5. remove imported VM
6. remove exported VM from export storage domain
7. remove export storage domain (with formatting)
8. try to put into maintenance the master storage domain

Actual results:
deactivating master storage domain failed because of not finished async tasks, even though there was apparently no such tasks

Expected results:
deactivating master storage domain should pass

Additional info:

Comment 1 Barak 2013-06-09 09:07:15 UTC
This looks like a storage issue.

The table is empty but the canDoAction still fails.

Allon can you please take a look ?

Comment 2 Allon Mureinik 2013-06-09 14:50:28 UTC
Barak, I don't quite understand what I'm looking for.
No bll code in oVirt Engine accesses the database directly, only via DAOs.

The condition to fail this CDA is as follows (see DeactivateStorageDomainCommand, circa line 125):
getAsyncTaskDao().getAsyncTaskIdsByStoragePoolId(getStorageDomain().getStoragePoolId().getValue()).size() > 0

This DAO call is implemented in quite a trivial way:
RETURN QUERY SELECT async_tasks.task_id
FROM  async_tasks
WHERE storage_pool_id = v_storage_pool_id;

(Note: There is no other patch that produces this CDA).

If the code shown above is wrong - please advise how to fix it. 
If not, it seems like the AsyncTaskDao is acting up.

Comment 4 Yair Zaslavsky 2013-06-10 08:29:36 UTC
I can see in the log several places where the canDoAction fails on

ERROR_CANNOT_DEACTIVATE_MASTER_DOMAIN_WITH_TASKS_ON_POOL

Comment 5 Yair Zaslavsky 2013-06-10 08:33:47 UTC
Katarzyna,
When you're saying you want to remove the data center, you mean you're deactivating the storage domains first, right? (or at least, trying to).

Comment 6 Katarzyna Jachim 2013-06-10 08:49:13 UTC
Yes, I tried to deactive the storage domain.

Versions: 
rhevm-3.2.0-11.29.el6ev.noarch (sf17.3)
vdsm-4.10.2-13.0.el6ev.x86_64 <- really old one, it is my local setup and I upgrade it rather rarely

Comment 7 Yair Zaslavsky 2013-06-13 12:39:09 UTC
Can you please advise on the specific dates in the log , when you say "the next day" ? it is composed from several days, and I saw several restarts.
In addition, did you take a look at the db after the restart before calling deactivate storage domain, which I assume returned you that you have tasks running on the domain, so the action failed (am I correct about the error code?)

Comment 8 Katarzyna Jachim 2013-06-13 12:44:59 UTC
Task started on 2013-06-06, "the next day" is 2013-06-07 (ovirt-engine was restarted at 10:17 - see description). No, I haven't checked the db after the restart.

Comment 9 Yair Zaslavsky 2013-06-15 17:13:05 UTC
The task you mentioned has ended (see vdsm.log )around 2013-06-07 10:06

Thread-39185::DEBUG::2013-06-07 10:06:58,988::taskManager::93::TaskManager::(getTaskStatus) Entry. taskID: 368b5a3e-52cd-4c9c-9985-24d28316d6d9
Thread-39185::DEBUG::2013-06-07 10:06:58,988::taskManager::96::TaskManager::(getTaskStatus) Return. Response: {'code': 0, 'message': u'1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '368b5a3e-52cd-4c9c-9985-24d28316d6d9'}

So you can see the task was removed from db after that.
IMHO, this is not a bug at the async tasks mechanism.