Bug 1300757 - Create a VM from Template and restart the engine while the tasks are running might cause the VM to stay in lock status for ever
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.3
Target Release: 3.6.3.1
Assigned To: Liron Aravot
QA Contact: Elad
Depends On:
Blocks: 1297190
Reported: 2016-01-21 11:07 EST by Maor
Modified: 2016-03-10 07:47 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-10 07:47:49 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
ylavi: planning_ack+
amureini: devel_ack+
rule-engine: testing_ack+




External Trackers
Tracker      | ID    | Priority           | Status | Summary                               | Last Updated
-------------+-------+--------------------+--------+---------------------------------------+---------------------
oVirt gerrit | 52573 | None               | None   | None                                  | 2016-01-24 05:05 EST
oVirt gerrit | 52724 | ovirt-engine-3.6   | MERGED | core: adding missing ctors            | 2016-01-26 04:50 EST
oVirt gerrit | 52842 | master             | MERGED | core: CreateSnapshot - add parameters ctor | 2016-01-28 05:12 EST
oVirt gerrit | 52849 | ovirt-engine-3.6   | MERGED | core: CreateSnapshot - add parameters ctor | 2016-01-28 06:14 EST
oVirt gerrit | 52854 | ovirt-engine-3.6.3 | MERGED | core: CreateSnapshot - add parameters ctor | 2016-01-28 08:10 EST

Description Maor 2016-01-21 11:07:06 EST
Description of problem:
If the engine is restarted after it has already started the copy-disk tasks, the VM stays locked forever.

Engine:
Adding the task and the command:
2016-01-21 09:30:26,715 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (ajp-/127.0.0.1:8702-6) [4093f7f3] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'bb60b530-7060-4080-b9fa-5e06d3ab23a1'
...
2016-01-21 09:30:26,922 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (ajp-/127.0.0.1:8702-6) [4093f7f3] Adding task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..

Polling the task:
 SPMAsyncTask::PollTask: Polling task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.

Now the task has finished successfully and the engine tries to remove it:
 BaseAsyncTask::onTaskEndSuccess: Task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.

At this point the engine was restarted, just before the task's status was updated in the DB:
2016-01-21 09:30:31,463 ERROR [org.ovirt.engine.core.dal.dbbroker.DbFacade] (DefaultQuartzScheduler_Worker-64) [] Can't find dao for interface org.ovirt.engine.core.dao.CommandEntityDao

After the restart, the engine cannot find a matching constructor for the command:
2016-01-21 09:32:01,648 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (org.ovirt.thread.pool-7-thread-6) [563a4d4] Can't find constructor for type org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand with parameter types: [class org.ovirt.engine.core.common.action.CreateSnapshotFromTemplateParameters]

and that is why the task doesn't get cleared.
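
The failure mode described above can be modeled in a few lines. This is only a Python sketch of the idea, not the engine's Java code: the registry, `build_registry`, `recreate_command`, the `CommandContext` stand-in, and the assumption that only a two-argument constructor existed before the fix are all hypothetical. It shows how a factory that looks up a constructor by the types of the persisted arguments returns nothing when the one-argument (parameters-only) constructor is missing, so the end-of-command logic that would unlock the VM never runs.

```python
class CreateSnapshotFromTemplateParameters:
    """Stand-in for the command parameters persisted before the restart."""


class CommandContext:
    """Hypothetical second constructor argument, assumed for illustration."""


class CreateSnapshotFromTemplateCommand:
    def __init__(self, parameters, context=None):
        self.parameters = parameters
        self.context = context

    def end_action(self):
        # In this model, ending the command is what releases the lock.
        return "VM unlocked"


def build_registry(with_fix):
    """Map tuples of argument types to constructors (a model of reflection lookup)."""
    ctors = {
        (CreateSnapshotFromTemplateParameters, CommandContext):
            lambda p, c: CreateSnapshotFromTemplateCommand(p, c),
    }
    if with_fix:
        # Models the "add parameters ctor" patches listed under External Trackers.
        ctors[(CreateSnapshotFromTemplateParameters,)] = (
            lambda p: CreateSnapshotFromTemplateCommand(p)
        )
    return ctors


def recreate_command(ctors, *args):
    """Rebuild a command after restart; return None if no constructor matches."""
    factory = ctors.get(tuple(type(a) for a in args))
    if factory is None:
        return None  # corresponds to the "Can't find constructor" error
    return factory(*args)


params = CreateSnapshotFromTemplateParameters()
before = recreate_command(build_registry(with_fix=False), params)
after = recreate_command(build_registry(with_fix=True), params)
print(before)
print(after.end_action())
```

Without the parameters-only entry the lookup yields no command object, which matches the symptom in this report: the task stays in async_tasks and the VM stays locked.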

DB:

engine=# SELECT * FROM async_tasks;
               task_id                | action_type | status | result |               step_id                |              command_id              |        started_at         |           storage_pool_id            | task_type |             vdsm_task_id             |           root_command_id            |               user_id                
--------------------------------------+-------------+--------+--------+--------------------------------------+--------------------------------------+---------------------------+--------------------------------------+-----------+--------------------------------------+--------------------------------------+--------------------------------------
 ec7b9e9e-ef02-4988-929d-b599c8d6a9ed |         208 |      2 |      0 | abd7f4ec-6ff0-4dc8-88af-8fe720a43263 | bb60b530-7060-4080-b9fa-5e06d3ab23a1 | 2016-01-21 09:30:25.47+02 | 00000001-0001-0001-0001-0000000000a8 |         3 | 80599970-da31-43c7-ae2d-533b651c8b21 | bb60b530-7060-4080-b9fa-5e06d3ab23a1 | 00000019-0019-0019-0019-0000000001f4
(1 row)


engine=# SELECT command_id, command_type, root_command_id, status, parent_command_id, job_id FROM command_entities where command_type = 208;
              command_id              | command_type |           root_command_id            | status |          parent_command_id           |                job_id                
--------------------------------------+--------------+--------------------------------------+--------+--------------------------------------+--------------------------------------
 bb60b530-7060-4080-b9fa-5e06d3ab23a1 |          208 | f197726c-e378-443f-acb8-af9006de3481 | ACTIVE | f197726c-e378-443f-acb8-af9006de3481 | 88696c0c-31ac-4793-b7e0-2dbc137b0ea7


Host:

[root@camel-vdsc ~]# vdsClient -s 0 getAllTasksStatuses 
{'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {'80599970-da31-43c7-ae2d-533b651c8b21': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '80599970-da31-43c7-ae2d-533b651c8b21', 'taskResult': 'success', 'taskState': 'finished'}}}
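
The mismatch between the host and the engine can be checked mechanically. The sketch below is only an illustration: it parses the vdsClient output shown above (a Python dict literal) and compares it with the vdsm_task_id values still present in async_tasks. The `engine_tracked` set is copied by hand from the query result in this report, and the function name `finished_but_tracked` is made up for this example.

```python
import ast

# Output of `vdsClient -s 0 getAllTasksStatuses`, copied from this report.
raw = (
    "{'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': "
    "{'80599970-da31-43c7-ae2d-533b651c8b21': "
    "{'message': '1 jobs completed successfully', 'code': 0, "
    "'taskID': '80599970-da31-43c7-ae2d-533b651c8b21', "
    "'taskResult': 'success', 'taskState': 'finished'}}}"
)

# vdsm_task_id values still listed in async_tasks (copied by hand from the DB output).
engine_tracked = {"80599970-da31-43c7-ae2d-533b651c8b21"}


def finished_but_tracked(raw_output, tracked_ids):
    """Return IDs of tasks the host reports as finished but the engine still tracks."""
    statuses = ast.literal_eval(raw_output)["allTasksStatus"]
    return sorted(
        task_id
        for task_id, info in statuses.items()
        if info["taskState"] == "finished" and task_id in tracked_ids
    )


stuck = finished_but_tracked(raw, engine_tracked)
print(stuck)
```

Any ID this returns is a task that completed on the host but was never cleared by the engine, which is the state described in this bug.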

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a VM from a Template with several image disks
2. Restart the engine once there are tasks running on 
3.

Actual results:
The VM is locked and tasks are still hanging in VDSM

Expected results:
The VM should be unlocked and the operation should be ended

Additional info:
Comment 1 Red Hat Bugzilla Rules Engine 2016-01-28 05:14:45 EST
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Comment 2 Red Hat Bugzilla Rules Engine 2016-01-28 07:24:38 EST
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
Comment 3 Red Hat Bugzilla Rules Engine 2016-01-28 07:28:29 EST
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
Comment 4 Elad 2016-02-22 10:07:46 EST
Restarted ovirt-engine during template creation with multiple disks, while copyImage tasks were running on the SPM. Once the engine came back, the operation ended and the VM and the images were unlocked.


2016-02-22 15:00:32,717 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-6-thread-6) [1b361de4] Discovered 3 tasks on Storage Pool 'dc1', 3 added to manager.



2016-02-22 15:00:46,534 ERROR [org.ovirt.engine.core.bll.AddVmTemplateCommand] (org.ovirt.thread.pool-6-thread-12) [1b361de4] Ending command 'org.ovirt.engine.core.bll.AddVmTemplateCommand' with failure.



Checked using:
rhevm-3.6.3.2-0.1.el6.noarch
vdsm-4.17.21-0.el7ev.noarch
