+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1643826 +++
======================================================================

Description of problem:

A job is left in state STARTED after VMs from a pool are shut down following a bump of the pool's template version. The DeleteImage task that removes the image of the previous template version for the VM is not monitored by the engine and is stuck there forever. The VDSM command is sent and finishes, but the engine never checks it. Nothing is left locked and everything else appears to work, except that these tasks are never marked as finished.

In more detail:

1. VM from the pool is running

2. Pool template is updated; VMs will be updated on shutdown

3. VM is shut down

2018-10-29 15:48:24,050+10 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] EVENT_ID: VM_DOWN(61), VM Pool-1 is down. Exit message: Admin shut down from the engine

4. VM is being updated

2018-10-29 15:48:24,359+10 INFO [org.ovirt.engine.core.bll.UpdateVmVersionCommand] (EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] Running command: UpdateVmVersionCommand internal: true. Entities affected : ID: e494cd31-87d7-4225-973d-2b09c7b6a212 Type: VM

5. VM is removed from the pool

2018-10-29 15:48:24,487+10 INFO [org.ovirt.engine.core.bll.RemoveVmCommand] (EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] Running command: RemoveVmCommand internal: true. Entities affected : ID: e494cd31-87d7-4225-973d-2b09c7b6a212 Type: VMAction group DELETE_VM with role type USER

6. VM image is deleted (old template version)

2018-10-29 15:48:24,656+10 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] START, DeleteImageGroupVDSCommand(...), log id: 1f5d75d6
2018-10-29 15:48:24,814+10 INFO [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedThreadFactory-engine-Thread-1799) [6c56bcde] CommandMultiAsyncTasks::attachTask: Attaching task 'b9faca31-489c-42c4-a27d-96003392c02f' to command '971a27bc-4cdf-482b-9327-fb74295a849b'. [ NEVER POLLED ?? ]

7. VM image is created (new template version)

2018-10-29 15:48:25,510+10 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1799) [41755eda] START, CreateVolumeVDSCommand(....), log id: 65f8cb2e
2018-10-29 15:48:25,635+10 INFO [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedThreadFactory-engine-Thread-1799) [41755eda] CommandMultiAsyncTasks::attachTask: Attaching task 'a2151cc2-bcd7-4175-938d-77db3c8d2c54' to command '76d3399d-cb94-4e9a-a848-964c43519c17'.

8. Everything finishes, the VM is unlocked, etc. But the DeleteImage task was never polled by the engine, so the engine thinks it is still running.

So the problem appears to be with DeleteImage. The task is finished on VDSM:

# vdsm-client Host getAllTasks
{
    "b9faca31-489c-42c4-a27d-96003392c02f": {
        "verb": "deleteImage",
        "code": 0,
        "state": "finished",
        "tag": "spm",
        "result": "",
        "message": "1 jobs completed successfully",
        "id": "b9faca31-489c-42c4-a27d-96003392c02f"
    }
}

But on the engine side the task is still tracked:

 task_id                              | status | vdsm_task_id                         | root_command_id
--------------------------------------+--------+--------------------------------------+--------------------------------------
 15a40485-3140-4130-aa17-abac4a283ee1 |      2 | b9faca31-489c-42c4-a27d-96003392c02f | 971a27bc-4cdf-482b-9327-fb74295a849b

Here we see RemoveImage from above as ACTIVE, while UpdateVmVersion has already ENDED:
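Since `vdsm-client Host getAllTasks` prints plain JSON, correlating VDSM's view with the engine's can be scripted. A minimal sketch, assuming a payload shaped like the output above (trimmed here to the single deleteImage task from this report):

```python
import json

# Sample payload shaped like `vdsm-client Host getAllTasks` output,
# trimmed to the deleteImage task shown in this report.
raw = """
{
    "b9faca31-489c-42c4-a27d-96003392c02f": {
        "verb": "deleteImage",
        "code": 0,
        "state": "finished",
        "tag": "spm",
        "result": "",
        "message": "1 jobs completed successfully",
        "id": "b9faca31-489c-42c4-a27d-96003392c02f"
    }
}
"""

tasks = json.loads(raw)

# Tasks VDSM reports as finished; any of these still shown as running
# on the engine side is a candidate for this bug.
finished = [t["id"] for t in tasks.values() if t["state"] == "finished"]
print(finished)  # ['b9faca31-489c-42c4-a27d-96003392c02f']
```

Any ID printed here that also appears as an active task in the engine database (as in the tables below) is one of the stuck tasks described above.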
 command_id                           | command_type | root_command_id                      | status             | parent_command_id
--------------------------------------+--------------+--------------------------------------+--------------------+--------------------------------------
 0fd0ed4f-6d93-46eb-b4ae-4ba8ad0accf6 |          211 | 971a27bc-4cdf-482b-9327-fb74295a849b | ACTIVE             | 971a27bc-4cdf-482b-9327-fb74295a849b
 971a27bc-4cdf-482b-9327-fb74295a849b |           44 | 00000000-0000-0000-0000-000000000000 | ENDED_SUCCESSFULLY | 00000000-0000-0000-0000-000000000000

RemoveImage = 211
UpdateVmVersion = 44

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.6.4-1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a template
2. Create another template as a subversion of the first
3. Create a pool with 2 VMs, using the base version of the template
4. Start both VMs
5. Update the pool, setting the template to the latest version
6. Shut down the VMs and wait for the update

Actual results:
2 tasks stuck, one for each VM.

Expected results:
Tasks cleared after the update finishes.

(Originally by Germano Veit Michel)
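The failure mode can be illustrated with a toy model (hypothetical names, not the actual oVirt engine code): a monitor only moves tasks it actually polls to FINISHED, so a task that was attached to a command but never entered into the polling set stays STARTED regardless of what VDSM reports:

```python
# Toy model of engine-side async task monitoring.
# All names here are hypothetical, for illustration only.

class AsyncTask:
    def __init__(self, task_id, verb):
        self.task_id = task_id
        self.verb = verb
        self.state = "STARTED"

def poll(tasks, polled_ids, vdsm_states):
    """Mark a task FINISHED only if the engine polls it and VDSM
    reports its state as finished."""
    for t in tasks:
        if t.task_id in polled_ids and vdsm_states.get(t.task_id) == "finished":
            t.state = "FINISHED"

delete_image = AsyncTask("b9faca31-489c-42c4-a27d-96003392c02f", "deleteImage")
create_volume = AsyncTask("a2151cc2-bcd7-4175-938d-77db3c8d2c54", "createVolume")

# VDSM has completed both tasks ...
vdsm = {delete_image.task_id: "finished", create_volume.task_id: "finished"}

# ... but the engine only polls the createVolume task, as in this bug.
poll([delete_image, create_volume], {create_volume.task_id}, vdsm)

print(delete_image.state)   # STARTED -- stuck forever
print(create_volume.state)  # FINISHED
```

This matches the logs above: both tasks were attached, both finished on VDSM, but only the DeleteImage task was left behind in state STARTED.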
Verified in ovirt-engine-4.2.8.1-0.1.el7ev.noarch

Verification steps (according to the reproducer from comment 0):
1. Create a template
2. Create another template as a subversion of the first
3. Create a pool with 2 VMs, using the base version of the template
4. Start both VMs
5. Update the pool, setting the template to the latest version
6. Shut down the VMs and wait for the update

Result: All tasks related to the VMs and to the pool completed successfully. No stuck tasks left.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0121