Bug 1300757 - Create a VM from Template and restart the engine while the tasks are running might cause the VM to stay in lock status for ever
Summary: Create a VM from Template and restart the engine while the tasks are running ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.3
Target Release: 3.6.3.1
Assignee: Liron Aravot
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks: 1297190
 
Reported: 2016-01-21 16:07 UTC by Maor
Modified: 2016-03-10 12:47 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-10 12:47:49 UTC
oVirt Team: Storage
rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
amureini: devel_ack+
rule-engine: testing_ack+




Links
System       | ID    | Priority           | Status | Summary                                    | Last Updated
oVirt gerrit | 52573 | None               | None   | None                                       | 2016-01-24 10:05:08 UTC
oVirt gerrit | 52724 | ovirt-engine-3.6   | MERGED | core: adding missing ctors                 | 2016-01-26 09:50:58 UTC
oVirt gerrit | 52842 | master             | MERGED | core: CreateSnapshot - add parameters ctor | 2016-01-28 10:12:00 UTC
oVirt gerrit | 52849 | ovirt-engine-3.6   | MERGED | core: CreateSnapshot - add parameters ctor | 2016-01-28 11:14:44 UTC
oVirt gerrit | 52854 | ovirt-engine-3.6.3 | MERGED | core: CreateSnapshot - add parameters ctor | 2016-01-28 13:10:29 UTC

Description Maor 2016-01-21 16:07:06 UTC
Description of problem:
If the engine is restarted after it has already started the copy-disk tasks, the VM remains locked forever.

Engine:
Adding the task and the command:
2016-01-21 09:30:26,715 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (ajp-/127.0.0.1:8702-6) [4093f7f3] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'bb60b530-7060-4080-b9fa-5e06d3ab23a1'
...
2016-01-21 09:30:26,922 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (ajp-/127.0.0.1:8702-6) [4093f7f3] Adding task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..

Polling the task:
 SPMAsyncTask::PollTask: Polling task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.

The task has now finished successfully, and the engine tries to remove it:
 BaseAsyncTask::onTaskEndSuccess: Task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.

At this point the engine was restarted, just before the task's status was updated in the DB:
2016-01-21 09:30:31,463 ERROR [org.ovirt.engine.core.dal.dbbroker.DbFacade] (DefaultQuartzScheduler_Worker-64) [] Can't find dao for interface org.ovirt.engine.core.dao.CommandEntityDao

After the restart, the engine cannot find the required constructor for the command:
2016-01-21 09:32:01,648 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (org.ovirt.thread.pool-7-thread-6) [563a4d4] Can't find constructor for type org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand with parameter types: [class org.ovirt.engine.core.common.action.CreateSnapshotFromTemplateParameters]

This is why the task never gets cleared.
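The error above comes from a reflective constructor lookup. According to the log, `CommandsFactory` searched `CreateSnapshotFromTemplateCommand` for a constructor that takes exactly `[CreateSnapshotFromTemplateParameters]`. The linked gerrit changes are titled "adding missing ctors" and "add parameters ctor". The sketch below reproduces that failure mode with plain `java.lang.reflect`. It assumes the factory looks up constructors by their exact parameter types. `CtorLookupSketch`, `ExampleParameters`, `ExampleContext` and both command classes are hypothetical stand-ins, not oVirt engine code.

```java
import java.lang.reflect.Constructor;

public class CtorLookupSketch {
    // Hypothetical stand-ins for a parameters class and a command context.
    public static class ExampleParameters { }
    public static class ExampleContext { }

    // Command that only declares a (parameters, context) constructor.
    public static class CommandWithoutParamsCtor {
        public CommandWithoutParamsCtor(ExampleParameters params, ExampleContext context) { }
    }

    // Same command with an added constructor that takes only the parameters.
    public static class CommandWithParamsCtor {
        public CommandWithParamsCtor(ExampleParameters params, ExampleContext context) { }

        public CommandWithParamsCtor(ExampleParameters params) {
            this(params, null);
        }
    }

    /** Looks up a public constructor by its exact parameter types; returns null if none matches. */
    public static Object create(Class<?> type, ExampleParameters params) {
        try {
            Constructor<?> ctor = type.getConstructor(ExampleParameters.class);
            return ctor.newInstance(params);
        } catch (NoSuchMethodException e) {
            // Analogous to "Can't find constructor for type ... with parameter types: [...]"
            System.out.println("Can't find constructor for type " + type.getName()
                    + " with parameter types: [" + ExampleParameters.class + "]");
            return null;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("constructor found but could not be invoked", e);
        }
    }

    public static void main(String[] args) {
        ExampleParameters params = new ExampleParameters();
        System.out.println(create(CommandWithoutParamsCtor.class, params) != null); // false
        System.out.println(create(CommandWithParamsCtor.class, params) != null);    // true
    }
}
```

Running `main` prints the lookup error followed by `false`, then prints `true`. The two classes differ only in the single-argument constructor. That constructor is exactly what `getConstructor(ExampleParameters.class)` needs to find.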

DB:

engine=# SELECT * FROM async_tasks;
               task_id                | action_type | status | result |               step_id                |              command_id              |        started_at         |           storage_pool_id            | task_type |             vdsm_task_id             |           root_command_id            |               user_id                
--------------------------------------+-------------+--------+--------+--------------------------------------+--------------------------------------+---------------------------+--------------------------------------+-----------+--------------------------------------+--------------------------------------+--------------------------------------
 ec7b9e9e-ef02-4988-929d-b599c8d6a9ed |         208 |      2 |      0 | abd7f4ec-6ff0-4dc8-88af-8fe720a43263 | bb60b530-7060-4080-b9fa-5e06d3ab23a1 | 2016-01-21 09:30:25.47+02 | 00000001-0001-0001-0001-0000000000a8 |         3 | 80599970-da31-43c7-ae2d-533b651c8b21 | bb60b530-7060-4080-b9fa-5e06d3ab23a1 | 00000019-0019-0019-0019-0000000001f4
(1 row)


engine=# SELECT command_id, command_type, root_command_id, status, parent_command_id, job_id FROM command_entities where command_type = 208;
              command_id              | command_type |           root_command_id            | status |          parent_command_id           |                job_id                
--------------------------------------+--------------+--------------------------------------+--------+--------------------------------------+--------------------------------------
 bb60b530-7060-4080-b9fa-5e06d3ab23a1 |          208 | f197726c-e378-443f-acb8-af9006de3481 | ACTIVE | f197726c-e378-443f-acb8-af9006de3481 | 88696c0c-31ac-4793-b7e0-2dbc137b0ea7


Host:

[root@camel-vdsc ~]# vdsClient -s 0 getAllTasksStatuses 
{'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {'80599970-da31-43c7-ae2d-533b651c8b21': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '80599970-da31-43c7-ae2d-533b651c8b21', 'taskResult': 'success', 'taskState': 'finished'}}}
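The evidence above fits together as follows. The command row is still `ACTIVE`, VDSM reports the task as `finished`, and after the restart the command object cannot be rebuilt. That outcome can be modelled as a small decision function. This is a hypothetical, heavily simplified sketch under those assumptions. `RestartRecoverySketch`, `VdsmTaskState`, `Outcome` and the `rebuildCommand` supplier are invented names. They do not describe the engine's actual `AsyncTaskManager` or `CommandsFactory` logic.

```java
import java.util.function.Supplier;

public class RestartRecoverySketch {
    // Hypothetical subset of VDSM task states (cf. 'taskState': 'finished' above).
    public enum VdsmTaskState { RUNNING, FINISHED }

    // Hypothetical outcomes for a command whose DB row is still ACTIVE after a restart.
    public enum Outcome { KEEP_POLLING, ENDED_LOCK_RELEASED, STUCK_LOCKED }

    /**
     * Assumes the command entity is still ACTIVE in the DB.
     * rebuildCommand stands in for re-creating the command object after the
     * restart and returns its end action, or null if it cannot be created
     * (for example, when no matching constructor exists).
     */
    public static Outcome recover(VdsmTaskState vdsmState, Supplier<Runnable> rebuildCommand) {
        if (vdsmState == VdsmTaskState.RUNNING) {
            return Outcome.KEEP_POLLING;      // task is still running; keep polling it
        }
        Runnable endAction = rebuildCommand.get();
        if (endAction == null) {
            return Outcome.STUCK_LOCKED;      // nothing can end the command: task and VM lock stay
        }
        endAction.run();                      // e.g. clear the task and release the VM lock
        return Outcome.ENDED_LOCK_RELEASED;
    }

    public static void main(String[] args) {
        // State captured in this report: finished task, command cannot be rebuilt.
        System.out.println(recover(VdsmTaskState.FINISHED, () -> null));       // STUCK_LOCKED
        // Same state once the command can be rebuilt.
        System.out.println(recover(VdsmTaskState.FINISHED, () -> () -> { })); // ENDED_LOCK_RELEASED
    }
}
```

The first call models the state captured in this report. The second models the same state once the command can be re-created, which corresponds to the expected result of the unlocked VM and the ended operation.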

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a VM from a template with several image disks.
2. Restart the engine while the copy tasks are still running on the SPM.

Actual results:
The VM stays locked and the tasks are left hanging in VDSM.

Expected results:
The operation should end and the VM should be unlocked.

Additional info:

Comment 1 Red Hat Bugzilla Rules Engine 2016-01-28 10:14:45 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 2 Red Hat Bugzilla Rules Engine 2016-01-28 12:24:38 UTC
Bug tickets that are moved to testing must have target release set to make sure the tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 3 Red Hat Bugzilla Rules Engine 2016-01-28 12:28:29 UTC
Bug tickets that are moved to testing must have target release set to make sure the tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 4 Elad 2016-02-22 15:07:46 UTC
Restarted ovirt-engine during template creation with multiple disks while copyImage tasks were running on the SPM. Once the engine came back up, the operation ended and the VM and the images were unlocked.


2016-02-22 15:00:32,717 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-6-thread-6) [1b361de4] Discovered 3 tasks on Storage Pool 'dc1', 3 added to manager.



2016-02-22 15:00:46,534 ERROR [org.ovirt.engine.core.bll.AddVmTemplateCommand] (org.ovirt.thread.pool-6-thread-12) [1b361de4] Ending command 'org.ovirt.engine.core.bll.AddVmTemplateCommand' with failure.



Checked using:
rhevm-3.6.3.2-0.1.el6.noarch
vdsm-4.17.21-0.el7ev.noarch

