Bug 1297190 - Creating a VM from a template and restarting the engine once the VM is locked (before any tasks are running) might cause the VM to stay in locked status forever
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 3.6.2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.3
Target Release: 3.6.3.1
Assignee: Maor
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: 1300757
Blocks:
 
Reported: 2016-01-10 12:03 UTC by Maor
Modified: 2016-03-10 12:48 UTC (History)
13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-10 12:48:35 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-3.6.z+
rule-engine: exception+
ylavi: planning_ack+
tnisan: devel_ack+
rule-engine: testing_ack+


Attachments
Logs01 (819.58 KB, application/x-gzip)
2016-01-21 12:54 UTC, Aharon Canan
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 51592 0 master MERGED core: Add force compensation when creating a VM from Template 2020-08-19 03:10:48 UTC
oVirt gerrit 51621 0 ovirt-engine-3.6 MERGED core: Add force compensation when creating a VM from Template 2020-08-19 03:10:48 UTC
oVirt gerrit 51652 0 ovirt-engine-3.6.2 MERGED core: Add force compensation when creating a VM from Template 2020-08-19 03:10:48 UTC

Description Maor 2016-01-10 12:03:14 UTC
Description of problem:
Creating a VM from a Template and restarting the engine while the VM is being created might cause the VM to stay in locked status forever

Version-Release number of selected component (if applicable):
engine 4.0

How reproducible:
40% (2/5 restarts)

Steps to Reproduce:
1. Add a Template with disks
2. Create a VM from this Template
3. Restart the engine while the VM is being created

Actual results:
The VM stays in locked status forever

Expected results:
The VM should be removed from the DB

Additional info:
It looks like compensation is not forced for the AddVmTemplate command, so when the engine starts up the VM stays in the DB in locked status.
This is part of the BZ1288852 verification scenarios; BZ1288852 indicates the VM was removed, but it might also stay in locked status.
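The compensation behavior described above can be illustrated with a minimal, self-contained sketch (class and method names here are hypothetical, not the actual oVirt code): the execute phase persists a compensation entry before locking the VM, and on engine startup leftover entries are replayed to release stale locks. When compensation is not forced, as in this report, the startup pass skips the entry and the VM stays locked.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of engine-style compensation (hypothetical names, not oVirt code).
public class CompensationSketch {
    enum VmStatus { DOWN, IMAGE_LOCKED }

    // Stands in for a row persisted to the compensation table.
    static class CompensationEntry {
        final String vmId;
        final boolean forced; // whether startup replays the entry even without a failure flag
        CompensationEntry(String vmId, boolean forced) {
            this.vmId = vmId;
            this.forced = forced;
        }
    }

    static final List<CompensationEntry> compensationTable = new ArrayList<>();
    static VmStatus vmStatus = VmStatus.DOWN;

    // Execute phase: record compensation data first, then lock the VM.
    static void startCreateVmFromTemplate(String vmId, boolean forceCompensation) {
        compensationTable.add(new CompensationEntry(vmId, forceCompensation));
        vmStatus = VmStatus.IMAGE_LOCKED;
        // ... engine is restarted here, before any async task has started ...
    }

    // Startup pass: replay leftover compensation entries.
    static void engineStartup() {
        for (CompensationEntry e : new ArrayList<>(compensationTable)) {
            if (e.forced) {               // without the fix, this entry is skipped
                vmStatus = VmStatus.DOWN; // revert the lock taken in the execute phase
                compensationTable.remove(e);
            }
        }
    }

    public static void main(String[] args) {
        // Without forced compensation the VM stays IMAGE_LOCKED after restart.
        startCreateVmFromTemplate("vm-1", false);
        engineStartup();
        System.out.println("no force: " + vmStatus);

        // With forced compensation (the fix) the stale lock is reverted.
        compensationTable.clear();
        startCreateVmFromTemplate("vm-1", true);
        engineStartup();
        System.out.println("forced:   " + vmStatus);
    }
}
```

This is only a model of the failure mode, not the patch itself; the actual fix (gerrit 51592/51621/51652) forces compensation for the create-VM-from-template flow.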

Comment 1 Aharon Canan 2016-01-21 12:54:33 UTC
Created attachment 1116924 [details]
Logs01

Reproduced 
==========
rhevm-3.6.2.6-0.1.el6.noarch
vdsm-4.17.18-0.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.2.x86_64

Steps to Reproduce:
===================
1. Add a Template with disks
2. Create a VM from this Template
3. Restart the engine while the VM is being created

Comment 2 Maor 2016-01-21 13:03:40 UTC
What is the use case? Which disks was the VM created from? Can you please add a screenshot of the locked VM?

Comment 3 Aharon Canan 2016-01-21 13:20:29 UTC
The setup is available for you to check (already provided you the details offline)

Comment 4 Maor 2016-01-21 15:41:57 UTC
The fix for this bug is related to the compensation that runs when the engine gets restarted in the middle of the execute phase, before any tasks have started.

It looks like you encountered another bug, which is related to the async task manager in the infrastructure.

From the logs it looks like the reboot happened at the point where the task had already finished and was about to be cleaned.

Engine:
Adding the task and the command:
2016-01-21 09:30:26,715 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (ajp-/127.0.0.1:8702-6) [4093f7f3] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'bb60b530-7060-4080-b9fa-5e06d3ab23a1'
...
2016-01-21 09:30:26,922 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (ajp-/127.0.0.1:8702-6) [4093f7f3] Adding task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..

Polling the task:
 SPMAsyncTask::PollTask: Polling task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.

Now the task has finished with success and the engine tries to remove it:
 BaseAsyncTask::onTaskEndSuccess: Task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.

At this point the engine was restarted, just before the task changed its status in the DB:
2016-01-21 09:30:31,463 ERROR [org.ovirt.engine.core.dal.dbbroker.DbFacade] (DefaultQuartzScheduler_Worker-64) [] Can't find dao for interface org.ovirt.engine.core.dao.CommandEntityDao

Once the engine was restarted, it could not find the right constructor:
2016-01-21 09:32:01,648 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (org.ovirt.thread.pool-7-thread-6) [563a4d4] Can't find constructor for type org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand with parameter types: [class org.ovirt.engine.core.common.action.CreateSnapshotFromTemplateParameters]

and that is why the task does not get cleared.
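The constructor-lookup failure above can be reproduced in miniature with plain reflection (hypothetical classes, not the actual CommandsFactory): the factory asks the command class for a public constructor taking exactly the persisted parameters type; if the class does not declare that signature, `getConstructor` throws `NoSuchMethodException` and the command cannot be rebuilt after restart.

```java
// Miniature of a reflection-based commands factory (hypothetical, not oVirt's CommandsFactory).
public class ConstructorLookup {
    static class Parameters {}

    // Declares only a no-arg constructor, so a lookup for a (Parameters)
    // constructor fails, as in the "Can't find constructor" log line above.
    static class BrokenCommand {
        BrokenCommand() {}
    }

    // Declares the public constructor the factory needs.
    static class FixedCommand {
        public FixedCommand(Parameters p) {}
    }

    // Try to rebuild a command from its persisted parameters type.
    static boolean canRebuild(Class<?> commandType) {
        try {
            commandType.getConstructor(Parameters.class);
            return true;
        } catch (NoSuchMethodException e) {
            // Mirrors: "Can't find constructor for type ... with parameter types: [...]"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("BrokenCommand: " + canRebuild(BrokenCommand.class)); // false
        System.out.println("FixedCommand:  " + canRebuild(FixedCommand.class));  // true
    }
}
```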

DB:

engine=# SELECT * FROM async_tasks;
-[ RECORD 1 ]---+-------------------------------------
task_id         | ec7b9e9e-ef02-4988-929d-b599c8d6a9ed
action_type     | 208
status          | 2
result          | 0
step_id         | abd7f4ec-6ff0-4dc8-88af-8fe720a43263
command_id      | bb60b530-7060-4080-b9fa-5e06d3ab23a1
started_at      | 2016-01-21 09:30:25.47+02
storage_pool_id | 00000001-0001-0001-0001-0000000000a8
task_type       | 3
vdsm_task_id    | 80599970-da31-43c7-ae2d-533b651c8b21
root_command_id | bb60b530-7060-4080-b9fa-5e06d3ab23a1
user_id         | 00000019-0019-0019-0019-0000000001f4


engine=# SELECT command_id, command_type, root_command_id, status, parent_command_id, job_id FROM command_entities where command_type = 208;
              command_id              | command_type |           root_command_id            | status |          parent_command_id           |                job_id                
--------------------------------------+--------------+--------------------------------------+--------+--------------------------------------+--------------------------------------
 bb60b530-7060-4080-b9fa-5e06d3ab23a1 |          208 | f197726c-e378-443f-acb8-af9006de3481 | ACTIVE | f197726c-e378-443f-acb8-af9006de3481 | 88696c0c-31ac-4793-b7e0-2dbc137b0ea7


Host:

[root@camel-vdsc ~]# vdsClient -s 0 getAllTasksStatuses 
{'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {'80599970-da31-43c7-ae2d-533b651c8b21': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '80599970-da31-43c7-ae2d-533b651c8b21', 'taskResult': 'success', 'taskState': 'finished'}}}


I will open a new bug for this scenario and make this bug depend on it.

Comment 5 Red Hat Bugzilla Rules Engine 2016-01-21 23:05:04 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 6 Maor 2016-01-25 13:46:32 UTC
Basically, this specific bug is a failure scenario that occurs when the engine gets restarted; the issue has always been there, though it was never reproduced before.
Patch 51652 already fixes this issue, though BZ1300757 currently blocks it from being verified.
Moving this to high to correlate it with BZ1300757, which it depends on.

Comment 7 Tal Nisan 2016-02-08 12:32:58 UTC
Maor, I see that all patches are merged and the dependent bug is MODIFIED; should this bug be moved to MODIFIED as well?

Comment 8 Tal Nisan 2016-02-08 22:49:23 UTC
Maor, the patch backports went down to 3.6.2 on the 11th of January. Are those the only patches, or was something missing that prevented you from moving this bug to MODIFIED?
Because if nothing else is missing, it means this bug is already fixed in 3.6.2 and can be tested in that version instead of waiting for 3.6.5.

Comment 9 Maor 2016-02-09 08:53:11 UTC
The bug is about a scenario in which a VM is locked due to a compensation issue (before any tasks were running).
The patches already fixed it in 3.6.2, and basically it should not fail at all,
but since there was another bug where a VM stayed in locked status because of running tasks (a different issue), this one was re-opened by mistake,
so I opened a new bug and made this one dependent on it.
This bug (compensation only, without running tasks) can be verified on 3.6.2, although verifying it might be difficult.

Comment 10 Maor 2016-02-09 08:54:20 UTC
To make the verification easier, I suggest verifying both bugs on 3.6.5.

Comment 11 Yaniv Lavi 2016-02-21 13:32:34 UTC
Bug 1300757 was moved to ON_QA, moving this one as well.

Comment 12 Kevin Alon Goldblatt 2016-02-23 17:23:39 UTC
Verified with the following code:
---------------------------------------------
rhevm-3.6.3.2-0.1.el6.noarch
vdsm-4.17.21-0.el7ev.noarch

Verified with the following scenario:
---------------------------------------------
Steps to Reproduce:
===================
1. Add a Template with disks
2. Create a VM from this Template
3. Restart the engine during the VM creation operation
a) When restarting the engine immediately during disk creation, before all the disks have been created >>>>> as expected, the VM is not created
b) When restarting the engine after the VM disk creation has completed but while the VM is still locked >>>> the VM is created successfully

Moving to VERIFIED!

