Bug 1567113 - Command entities are not removed from the DB.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ravi Nori
QA Contact: Lucie Leistnerova
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-13 12:59 UTC by Roman Hodain
Modified: 2019-05-16 13:07 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-25 14:10:51 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:



Description Roman Hodain 2018-04-13 12:59:20 UTC
Description of problem:
Many commands are not removed from the database and remain in ACTIVE status [1]. All the operations ended successfully, but it seems that CommandCallbacksPoller is not executed. As a consequence, VMs in a VM pool cannot be pre-started because they are still reported as having a snapshot removal in progress.

Version-Release number of selected component (if applicable):
RHV 4.1.10

How reproducible:
Just once

Steps to Reproduce:
Unknown 

Actual results:
The operations are never finished.

Expected results:
The operations are finished successfully 

Additional info:

We can see that the child object is removed, but the parent remains there and ConcurrentChildCommandsExecutionCallback is not called. Once the engine is restarted, the issue is narrowed down.

2018-04-09 22:17:17,554+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-28) [2cf16ee3] BaseAsyncTask::removeTaskFromDB: Removed task '2e48cbf5-c329-4fdf-b056-c301444eed09' from DataBase
2018-04-09 22:17:17,554+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-28) [2cf16ee3] CommandAsyncTask::HandleEndActionResult [within thread]: Removing CommandMultiAsyncTasks object for entity 'df074505-3e32-4e60-a420-0f05790451b7'


[1]:
          created_at           |              command_id              |                                 command_params_class                                  | status 
-------------------------------+--------------------------------------+---------------------------------------------------------------------------------------+--------
 2018-04-09 22:16:55.144181+02 | 19c9ad42-bbb0-48a3-b676-b5047ef3aba4 | org.ovirt.engine.core.common.action.RestoreAllSnapshotsParameters                     | ACTIVE
 2018-04-09 22:21:52.167734+02 | 300f15ae-16eb-47c5-90bb-f943bb563d38 | org.ovirt.engine.core.common.action.CreateAllSnapshotsFromVmParameters                | ACTIVE
 2018-04-10 08:32:56.480064+02 | 738468d3-01c8-4944-bbfb-bbcb5702741f | org.ovirt.engine.core.common.action.AttachUserToVmFromPoolAndRunParameters            | ACTIVE
 2018-04-09 22:19:52.087792+02 | 85901b1d-38bb-4341-96ab-90833b1e208e | org.ovirt.engine.core.common.action.RestoreAllSnapshotsParameters                     | ACTIVE
 2018-04-09 22:21:51.62691+02  | 5baf78a0-2d7f-4163-b13d-16374a670216 | org.ovirt.engine.core.common.action.RunVmParams                                       | ACTIVE
 2018-04-09 22:15:38.618483+02 | bc72a993-bfbd-48c2-95eb-f7c9e9d28a76 | org.ovirt.engine.core.common.action.RestoreAllSnapshotsParameters                     | ACTIVE
 2018-04-09 22:17:57.998023+02 | 288697b8-c31f-4e47-9da1-0ac9b6e506cb | org.ovirt.engine.core.common.action.RestoreAllSnapshotsParameters                     | ACTIVE
 2018-04-09 22:19:00.653187+02 | 1e606b62-dd48-4d74-be8c-2169caa634da | org.ovirt.engine.core.common.action.RestoreAllSnapshotsParameters                     | ACTIVE
 2018-04-09 22:20:56.313792+02 | b3ebec70-6a30-41a4-a7d7-f42fb422b25f | org.ovirt.engine.core.common.action.RestoreAllSnapshotsParameters                     | ACTIVE
...
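
To get a quick overview of which command types are stuck, the listing in [1] can be tallied per parameter class. The following is only a minimal sketch: it assumes the query result has been saved as plain psql text, and the names PSQL_ROWS and count_active_by_class are hypothetical helpers, not part of ovirt-engine. The sample holds five of the rows shown above.

```python
from collections import Counter

# A few rows copied from the listing in [1], with padding trimmed.
# Columns: created_at | command_id | command_params_class | status
PSQL_ROWS = """\
          created_at           |              command_id              | command_params_class | status
-------------------------------+--------------------------------------+----------------------+--------
 2018-04-09 22:16:55.144181+02 | 19c9ad42-bbb0-48a3-b676-b5047ef3aba4 | org.ovirt.engine.core.common.action.RestoreAllSnapshotsParameters | ACTIVE
 2018-04-09 22:21:52.167734+02 | 300f15ae-16eb-47c5-90bb-f943bb563d38 | org.ovirt.engine.core.common.action.CreateAllSnapshotsFromVmParameters | ACTIVE
 2018-04-10 08:32:56.480064+02 | 738468d3-01c8-4944-bbfb-bbcb5702741f | org.ovirt.engine.core.common.action.AttachUserToVmFromPoolAndRunParameters | ACTIVE
 2018-04-09 22:19:52.087792+02 | 85901b1d-38bb-4341-96ab-90833b1e208e | org.ovirt.engine.core.common.action.RestoreAllSnapshotsParameters | ACTIVE
 2018-04-09 22:21:51.62691+02  | 5baf78a0-2d7f-4163-b13d-16374a670216 | org.ovirt.engine.core.common.action.RunVmParams | ACTIVE
"""


def count_active_by_class(psql_text):
    """Count ACTIVE rows per short parameter class name (hypothetical helper)."""
    counts = Counter()
    for line in psql_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the header, the separator line, and anything that is not ACTIVE.
        if len(parts) != 4 or parts[3] != "ACTIVE":
            continue
        # Keep only the class name after the last dot of the package path.
        counts[parts[2].rsplit(".", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    for name, n in count_active_by_class(PSQL_ROWS).most_common():
        print(f"{n:3d}  {name}")
```

In the full listing the RestoreAllSnapshotsParameters rows dominate, which fits the observation that the pool VMs are blocked by snapshot operations.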

Comment 11 Ravi Nori 2018-04-27 14:21:05 UTC
Without thread dumps it is hard to pinpoint the issue. A single non-responsive hypervisor should not impact the stability of the system. If the issue occurs again and we get the thread dumps, I can look into the issue further.

Comment 12 Martin Perina 2018-05-02 08:54:33 UTC
Lucie, could you please try to reproduce?

Comment 13 Lucie Leistnerova 2018-05-04 12:02:48 UTC
I did not succeed in reproducing.
I tried many combinations of creating a pool with prestarted VMs. Every time the host was slowed down (I used https://gist.github.com/obscurerichard/3740206), commands waited until the host was up again and finished successfully, even after the engine was restarted.
With a non-responsive host, commands did not appear and VMs were not started.

Comment 14 Lucie Leistnerova 2018-05-04 12:09:31 UTC
Sorry, I forgot to mention the engine version I tested on:
ovirt-engine-4.1.11.2-0.1.el7.noarch
and host
vdsm-4.20.9.3-1.el7ev.x86_64

Comment 15 Martin Perina 2018-05-04 12:10:59 UTC
Roman, could we close the bug as WORKSFORME and reopen it if the bug is reproduced and a thread dump is provided?

Comment 16 Roman Hodain 2018-05-25 13:14:51 UTC
Sure

Comment 17 Martin Perina 2018-05-25 14:10:51 UTC
Feel free to reopen when reproduced, and provide a thread dump to enable further investigation.

Comment 18 Franta Kust 2019-05-16 13:07:23 UTC
BZ<2>Jira Resync

