Bug 1657764 - [downstream clone - 4.2.8] Updating template of VM Pool leaves tasks stuck after VMs shutdown
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ovirt-4.2.8
Target Release: ---
Assignee: Ravi Nori
QA Contact: Pavel Novotny
URL:
Whiteboard:
Depends On: 1643826
Blocks:
Reported: 2018-12-10 11:59 UTC by RHV bug bot
Modified: 2021-12-10 18:31 UTC (History)

Fixed In Version: ovirt-engine-4.2.8.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1643826
Environment:
Last Closed: 2019-01-22 12:44:51 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links

        System         |       ID       | Private |     Priority     | Status |                                   Summary                                   |      Last Updated       
-----------------------+----------------+---------+------------------+--------+-----------------------------------------------------------------------------+-------------------------
 Red Hat Issue Tracker | RHV-44348      | 0       | None             | None   | None                                                                        | 2021-12-10 18:31:58 UTC
 Red Hat Product Errata | RHBA-2019:0121 | 0       | None             | None   | None                                                                        | 2019-01-22 12:44:57 UTC
 oVirt gerrit          | 95222          | 0       | master           | MERGED | engine : Updating template of VM Pool leaves tasks stuck after VMs shutdown | 2020-10-13 11:37:08 UTC
 oVirt gerrit          | 95379          | 0       | ovirt-engine-4.2 | MERGED | engine : Updating template of VM Pool leaves tasks stuck after VMs shutdown | 2020-10-13 11:37:18 UTC

Description RHV bug bot 2018-12-10 11:59:24 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1643826 +++
======================================================================

Description of problem:

A job is left in state STARTED after the VMs of a Pool are shut down following a bump of the pool template version.

It seems that the DeleteImage task to remove the image of the previous template version for the VM is not monitored by the engine and is stuck there forever. The VDSM command is sent and finishes, but the engine does not seem to check it.

Nothing is left locked and it seems everything worked well, except for these tasks that were never set to finished.

In more detail:
1. VM from Pool is running
2. Pool template is updated, VMs will be updated on shutdown
3. VM is shut down

2018-10-29 15:48:24,050+10 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] EVENT_ID: VM_DOWN(61), VM Pool-1 is down. Exit message: Admin shut down from the engine

4. VM is being updated

2018-10-29 15:48:24,359+10 INFO  [org.ovirt.engine.core.bll.UpdateVmVersionCommand] (EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] Running command: UpdateVmVersionCommand internal: true. Entities affected :  ID: e494cd31-87d7-4225-973d-2b09c7b6a212 Type: VM

5. VM is removed from pool

2018-10-29 15:48:24,487+10 INFO  [org.ovirt.engine.core.bll.RemoveVmCommand] (EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] Running command: RemoveVmCommand internal: true. Entities affected :  ID: e494cd31-87d7-4225-973d-2b09c7b6a212 Type: VMAction group DELETE_VM with role type USER

6. VM image is deleted (old template version)

2018-10-29 15:48:24,656+10 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] START, DeleteImageGroupVDSCommand(...), log id: 1f5d75d6

2018-10-29 15:48:24,814+10 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] CommandMultiAsyncTasks::attachTask: Attaching task 'b9faca31-489c-42c4-a27d-96003392c02f' to command '971a27bc-4cdf-482b-9327-fb74295a849b'.

  [ NEVER POLLED ?? ]

7. VM image is created (new template version)

2018-10-29 15:48:25,510+10 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1799) [41755eda] START, CreateVolumeVDSCommand(....), log id: 65f8cb2e

2018-10-29 15:48:25,635+10 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedThreadFactory-engine-Thread-1799) [41755eda] CommandMultiAsyncTasks::attachTask: Attaching task 'a2151cc2-bcd7-4175-938d-77db3c8d2c54' to command '76d3399d-cb94-4e9a-a848-964c43519c17'.

8. Everything finishes, the VM is unlocked, etc. But the DeleteImage task was never polled by the engine, and the engine thinks it is still running.

So it looks like the problem is with the DeleteImage task.
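
The task-to-command attachments in the log excerpts above can be pulled out mechanically when triaging a longer engine.log. The following is only a sketch: the regular expression is derived from the two attachTask lines quoted in this report, and the function name `attached_tasks` is hypothetical, not part of any oVirt tool.

```python
import re

# Pattern inferred from the two attachTask lines quoted in this report;
# other engine versions may format the message differently (assumption).
ATTACH_RE = re.compile(
    r"\[(?P<correlation_id>[0-9a-f]+)\] CommandMultiAsyncTasks::attachTask: "
    r"Attaching task '(?P<task_id>[0-9a-f-]+)' "
    r"to command '(?P<command_id>[0-9a-f-]+)'"
)


def attached_tasks(log_lines):
    """Return (correlation_id, vdsm_task_id, command_id) for each attachTask line."""
    found = []
    for line in log_lines:
        match = ATTACH_RE.search(line)
        if match:
            found.append(
                (match["correlation_id"], match["task_id"], match["command_id"])
            )
    return found


# The two attachTask lines from the description, copied verbatim.
sample = [
    "2018-10-29 15:48:24,814+10 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] "
    "(EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] CommandMultiAsyncTasks::attachTask: "
    "Attaching task 'b9faca31-489c-42c4-a27d-96003392c02f' to command "
    "'971a27bc-4cdf-482b-9327-fb74295a849b'.",
    "2018-10-29 15:48:25,635+10 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] "
    "(EE-ManagedThreadFactory-engine-Thread-1799) [41755eda] CommandMultiAsyncTasks::attachTask: "
    "Attaching task 'a2151cc2-bcd7-4175-938d-77db3c8d2c54' to command "
    "'76d3399d-cb94-4e9a-a848-964c43519c17'.",
]

for correlation_id, task_id, command_id in attached_tasks(sample):
    print(correlation_id, task_id, command_id)
```

The first tuple is the DeleteImage task attached under correlation ID 6c56bcde, which is the one that never gets polled.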

The task is finished on VDSM:

# vdsm-client Host getAllTasks
{
    "b9faca31-489c-42c4-a27d-96003392c02f": {
        "verb": "deleteImage", 
        "code": 0, 
        "state": "finished", 
        "tag": "spm", 
        "result": "", 
        "message": "1 jobs completed successfully", 
        "id": "b9faca31-489c-42c4-a27d-96003392c02f"
    }, 

               task_id                | status |             vdsm_task_id             |           root_command_id            
--------------------------------------+--------+--------------------------------------+--------------------------------------
 15a40485-3140-4130-aa17-abac4a283ee1 |      2 | b9faca31-489c-42c4-a27d-96003392c02f | 971a27bc-4cdf-482b-9327-fb74295a849b
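
The mismatch between the two outputs above can be checked with a small cross-reference. This is a sketch under assumptions: the JSON is trimmed to the single task entry shown in this report (the original output is truncated), the row is typed in by hand from the table above rather than queried from a database, and `finished_but_still_tracked` is a hypothetical helper name.

```python
import json

# Only the entry visible in this report; the real getAllTasks output
# continues past this point and is not reproduced here.
vdsm_output = """
{
    "b9faca31-489c-42c4-a27d-96003392c02f": {
        "verb": "deleteImage",
        "code": 0,
        "state": "finished",
        "tag": "spm",
        "result": "",
        "message": "1 jobs completed successfully",
        "id": "b9faca31-489c-42c4-a27d-96003392c02f"
    }
}
"""

# The single row from the table above, keyed by its column names.
engine_rows = [
    {
        "task_id": "15a40485-3140-4130-aa17-abac4a283ee1",
        "status": 2,
        "vdsm_task_id": "b9faca31-489c-42c4-a27d-96003392c02f",
        "root_command_id": "971a27bc-4cdf-482b-9327-fb74295a849b",
    }
]


def finished_but_still_tracked(vdsm_tasks, rows):
    """Return engine task IDs whose VDSM task already reports state 'finished'."""
    stale = []
    for row in rows:
        task = vdsm_tasks.get(row["vdsm_task_id"])
        if task is not None and task["state"] == "finished":
            stale.append(row["task_id"])
    return stale


stale_tasks = finished_but_still_tracked(json.loads(vdsm_output), engine_rows)
print(stale_tasks)
```

The check deliberately does not interpret the numeric `status` column; it only reports rows the engine still holds for tasks that VDSM considers done.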


Here we see the RemoveImage command from above still ACTIVE, while its parent UpdateVmVersion command has already ENDED_SUCCESSFULLY.


              command_id              | command_type |           root_command_id            |       status       |          parent_command_id           
--------------------------------------+--------------+--------------------------------------+--------------------+--------------------------------------
 0fd0ed4f-6d93-46eb-b4ae-4ba8ad0accf6 |          211 | 971a27bc-4cdf-482b-9327-fb74295a849b | ACTIVE             | 971a27bc-4cdf-482b-9327-fb74295a849b
 971a27bc-4cdf-482b-9327-fb74295a849b |           44 | 00000000-0000-0000-0000-000000000000 | ENDED_SUCCESSFULLY | 00000000-0000-0000-0000-000000000000

    RemoveImage=211
    UpdateVmVersion=44

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.6.4-1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a template
2. Create another template as a subversion of the first
3. Create a Pool with 2 VMs, using the base version of the template
4. Start both VMs
5. Update the Pool, set the template to the latest version
6. Shut down the VMs and wait for the update.

Actual results:
2 tasks stuck, one for each VM.

Expected results:
Tasks cleared after the update finishes.

(Originally by Germano Veit Michel)

Comment 5 Pavel Novotny 2019-01-07 19:58:47 UTC
Verified in ovirt-engine-4.2.8.1-0.1.el7ev.noarch

Verification steps (according to reproducer from comment 0):
1. Create a template
2. Create another template as a subversion of the first
3. Create a Pool with 2 VMs, using the base version of the template
4. Start both VMs
5. Update the Pool, set the template to the latest version
6. Shut down the VMs and wait for the update.

Result: All tasks related to the VMs and to the pool have been completed successfully. No stuck tasks left.

Comment 7 errata-xmlrpc 2019-01-22 12:44:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0121

