Bug 1297190 - Creating a VM from a template and restarting the engine once the VM is locked (before any tasks are running) might cause the VM to stay locked forever
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.3
Assigned To: Maor
QA Contact: Kevin Alon Goldblatt
Depends On: 1300757
Reported: 2016-01-10 07:03 EST by Maor
Modified: 2016-03-10 07:48 EST (History)
13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-03-10 07:48:35 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
rule-engine: exception+
ylavi: planning_ack+
tnisan: devel_ack+
rule-engine: testing_ack+

Attachments (Terms of Use)
Logs01 (819.58 KB, application/x-gzip)
2016-01-21 07:54 EST, Aharon Canan
no flags

External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 51592 master MERGED core: Add force compensation when creating a VM from Template 2016-01-11 04:03 EST
oVirt gerrit 51621 ovirt-engine-3.6 MERGED core: Add force compensation when creating a VM from Template 2016-01-11 09:44 EST
oVirt gerrit 51652 ovirt-engine-3.6.2 MERGED core: Add force compensation when creating a VM from Template 2016-01-11 10:32 EST

Description Maor 2016-01-10 07:03:14 EST
Description of problem:
Creating a VM from a template and restarting the engine while the VM is being created might cause the VM to stay locked forever.

Version-Release number of selected component (if applicable):
engine 4.0

How reproducible:
40% (2/5 restarts)

Steps to Reproduce:
1. Add a Template with disks
2. Create a VM from this Template
3. Restart the engine while the VM is being created

Actual results:
The VM stays locked forever.

Expected results:
The VM should be removed from the DB

Additional info:
It looks like compensation is not forced for the AddVmTemplate command, so when the engine starts up the VM stays in the DB in locked status.
This is part of the BZ1288852 verification scenarios; BZ1288852 indicates that the VM was removed, but it might also stay locked.
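The role of forced compensation can be shown with a minimal sketch (Python pseudocode of the Java engine's startup behavior; all class and attribute names here are hypothetical, not the actual oVirt code): on startup, the engine rolls back interrupted commands that saved compensation data, but a command that crashed before persisting any compensation data and is not flagged for forced compensation is skipped, leaving its VM locked.

```python
# Illustrative sketch only -- hypothetical names, not actual oVirt code.
# Models why a VM stays locked when compensation is not forced for a command.

class Command:
    def __init__(self, name, force_compensation=False):
        self.name = name
        self.force_compensation = force_compensation
        self.compensation_data = []   # snapshots saved during execute()

class Engine:
    def __init__(self):
        self.locked_vms = set()
        self.pending_commands = []

    def start_command(self, cmd, vm_id):
        self.locked_vms.add(vm_id)              # VM is locked at command start
        self.pending_commands.append((cmd, vm_id))

    def restart(self):
        """On startup, roll back interrupted commands."""
        for cmd, vm_id in self.pending_commands:
            # Without force_compensation, a command that crashed before
            # writing any compensation data is simply skipped -- and the
            # VM it locked stays locked forever (the bug).
            if cmd.compensation_data or cmd.force_compensation:
                self.locked_vms.discard(vm_id)  # rollback releases the lock
        self.pending_commands.clear()

engine = Engine()
buggy = Command("AddVmFromTemplate")                       # no forced compensation
fixed = Command("AddVmFromTemplate", force_compensation=True)

engine.start_command(buggy, "vm-1")
engine.start_command(fixed, "vm-2")
engine.restart()
print(sorted(engine.locked_vms))   # -> ['vm-1']: vm-1 stays locked, vm-2 is released
```

The merged patches ("Add force compensation when creating a VM from Template") correspond to setting the equivalent of the `force_compensation` flag above, so the restart rollback always runs for this command.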
Comment 1 Aharon Canan 2016-01-21 07:54 EST
Created attachment 1116924 [details]


Steps to Reproduce:
1. Add a Template with disks
2. Create a VM from this Template
3. Restart the engine while the VM is being created
Comment 2 Maor 2016-01-21 08:03:40 EST
What is the use case? Which disks was the VM created from? Can you please add a screenshot of the locked VM?
Comment 3 Aharon Canan 2016-01-21 08:20:29 EST
The setup is available for you to check (already provided you the details offline)
Comment 4 Maor 2016-01-21 10:41:57 EST
The fix for this bug addresses compensation when the engine is restarted in the middle of the execute phase, before any tasks have started.

It looks like you encountered another bug, which is related to the async task manager in the infrastructure.

From the logs it looks like the reboot happened at the point where the task had already finished and was about to be cleaned.

Adding the task and the command:
2016-01-21 09:30:26,715 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (ajp-/ [4093f7f3] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'bb60b530-7060-4080-b9fa-5e06d3ab23a1'
2016-01-21 09:30:26,922 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (ajp-/ [4093f7f3] Adding task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..

Polling the task:
 SPMAsyncTask::PollTask: Polling task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.

Now the task has finished successfully and the engine tries to remove it:
 BaseAsyncTask::onTaskEndSuccess: Task '80599970-da31-43c7-ae2d-533b651c8b21' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.

At this point the engine was restarted, just before the task has changed its status in the DB:
2016-01-21 09:30:31,463 ERROR [org.ovirt.engine.core.dal.dbbroker.DbFacade] (DefaultQuartzScheduler_Worker-64) [] Can't find dao for interface org.ovirt.engine.core.dao.CommandEntityDao

Once the engine was restarted, it could not find the right constructor:
2016-01-21 09:32:01,648 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (org.ovirt.thread.pool-7-thread-6) [563a4d4] Can't find constructor for type org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand with parameter types: [class org.ovirt.engine.core.common.action.CreateSnapshotFromTemplateParameters]

and that is why the task doesn't get cleared.
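The "Can't find constructor" failure can be sketched generically (Python stand-in for the Java reflection lookup in CommandsFactory; all names below are hypothetical, not the real oVirt classes): after a restart, each persisted command must be reconstructed from its stored command class and parameter type, and if no matching constructor exists, the end-task callback never runs and the task is never cleared.

```python
# Illustrative sketch, not actual oVirt code: reconstructing persisted
# commands after an engine restart. All names here are hypothetical.

class CreateSnapshotParameters: pass

class CreateSnapshotCommand:
    # Only accepts its own parameters type; a mismatch means the factory
    # cannot rebuild the command after restart.
    def __init__(self, params):
        self.params = params

# Registry of constructors keyed by (command class, parameter class),
# standing in for Java's reflection-based constructor lookup.
CONSTRUCTORS = {
    (CreateSnapshotCommand, CreateSnapshotParameters): CreateSnapshotCommand,
}

def rebuild_command(command_cls, params):
    ctor = CONSTRUCTORS.get((command_cls, type(params)))
    if ctor is None:
        # Mirrors: "Can't find constructor for type ... with parameter types ..."
        raise TypeError(f"Can't find constructor for {command_cls.__name__} "
                        f"with parameter type {type(params).__name__}")
    return ctor(params)

class WrongParameters: pass   # e.g. a parameters class with no matching ctor

rebuild_command(CreateSnapshotCommand, CreateSnapshotParameters())  # works
try:
    rebuild_command(CreateSnapshotCommand, WrongParameters())
except TypeError as e:
    print(e)   # the task's end-callback never runs, so it is never cleared
```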


engine=# SELECT * FROM async_tasks;
 task_id         | ec7b9e9e-ef02-4988-929d-b599c8d6a9ed
 action_type     | 208
 status          | 2
 result          | 0
 step_id         | abd7f4ec-6ff0-4dc8-88af-8fe720a43263
 command_id      | bb60b530-7060-4080-b9fa-5e06d3ab23a1
 started_at      | 2016-01-21 09:30:25.47+02
 storage_pool_id | 00000001-0001-0001-0001-0000000000a8
 task_type       | 3
 vdsm_task_id    | 80599970-da31-43c7-ae2d-533b651c8b21
 root_command_id | bb60b530-7060-4080-b9fa-5e06d3ab23a1
 user_id         | 00000019-0019-0019-0019-0000000001f4
(1 row)

engine=# SELECT command_id, command_type, root_command_id, status, parent_command_id, job_id FROM command_entities where command_type = 208;
 command_id        | bb60b530-7060-4080-b9fa-5e06d3ab23a1
 command_type      | 208
 root_command_id   | f197726c-e378-443f-acb8-af9006de3481
 status            | ACTIVE
 parent_command_id | f197726c-e378-443f-acb8-af9006de3481
 job_id            | 88696c0c-31ac-4793-b7e0-2dbc137b0ea7


[root@camel-vdsc ~]# vdsClient -s 0 getAllTasksStatuses 
{'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {'80599970-da31-43c7-ae2d-533b651c8b21': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '80599970-da31-43c7-ae2d-533b651c8b21', 'taskResult': 'success', 'taskState': 'finished'}}}
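The mismatch shown above (the engine DB still holds the async task and an ACTIVE command, while vdsm already reports the task as finished) can be cross-checked with a small sketch; the helper below is hypothetical, not a vdsm or engine API:

```python
# Sketch of the inconsistency above (hypothetical helper, not vdsm API):
# the engine DB still holds the task while vdsm already reports it finished.

engine_async_tasks = {"80599970-da31-43c7-ae2d-533b651c8b21": "running"}  # stale row
vdsm_statuses = {"80599970-da31-43c7-ae2d-533b651c8b21":
                 {"taskState": "finished", "taskResult": "success"}}

def stale_tasks(engine_tasks, vdsm_tasks):
    """Tasks vdsm finished that the engine never cleared."""
    return [tid for tid in engine_tasks
            if vdsm_tasks.get(tid, {}).get("taskState") == "finished"]

print(stale_tasks(engine_async_tasks, vdsm_statuses))
# -> ['80599970-da31-43c7-ae2d-533b651c8b21']
```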

I will open a new bug for that scenario and make this bug depend on it.
Comment 5 Red Hat Bugzilla Rules Engine 2016-01-21 18:05:04 EST
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Comment 6 Maor 2016-01-25 08:46:32 EST
Basically this specific bug is a failure scenario when the engine gets restarted; the issue has always been there, though it was never reproduced before.
Patch 51652 already fixed this issue, though BZ1300757 currently blocks it from being verified.
Moving this to high priority to correlate with BZ1300757, which it depends on.
Comment 7 Tal Nisan 2016-02-08 07:32:58 EST
Maor, I see that all patches are merged and the dependent bug is MODIFIED, should this bug be moved to MODIFIED as well?
Comment 8 Tal Nisan 2016-02-08 17:49:23 EST
Maor, the patch backports go down to 3.6.2 on the 11th of January. Are those the only patches, or was something missing that prevented you from moving this bug to MODIFIED?
Because if nothing else is missing, it means this bug is already fixed in 3.6.2 and can be tested in that version instead of waiting for 3.6.5.
Comment 9 Maor 2016-02-09 03:53:11 EST
The bug is about a scenario where a VM is locked due to a compensation issue (before any tasks were running).
The patches already fixed it in 3.6.2, and basically it should not fail at all.
But since there was another bug where a VM stayed locked because of running tasks (a different issue), this one was re-opened by mistake.
So I opened a new bug and made this one dependent on it.
The verification for this bug (compensation only, without running tasks) can be done on 3.6.2, although it might be a difficult one.
Comment 10 Maor 2016-02-09 03:54:20 EST
To make the verification easier, I suggest verifying both bugs on 3.6.5.
Comment 11 Yaniv Lavi 2016-02-21 08:32:34 EST
Bug 1300757 was moved to ON_QA, moving this one as well.
Comment 12 Kevin Alon Goldblatt 2016-02-23 12:23:39 EST
Verified with the following scenario:
Steps to Reproduce:
1. Add a Template with disks
2. Create a VM from this Template
3. Restart the engine during the VM creation operation
a) When restarting the engine immediately during disk creation, before all the disks have been created >>>>> as expected, the VM is not created
b) When restarting the engine after the VM disk creation has completed but while the VM is still locked >>>> the VM creation completes successfully

Moving to VERIFIED!
