Bug 1416693 - removing several vm pools together may fail
Summary: removing several vm pools together may fail
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.0
Assignee: Shmuel Melamud
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-26 09:30 UTC by sefi litmanovich
Modified: 2017-02-01 12:36 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-01 12:36:21 UTC
oVirt Team: Virt
tjelinek: ovirt-4.2?
tjelinek: blocker-
tjelinek: planning_ack?
tjelinek: devel_ack?
tjelinek: testing_ack?


Attachments
engine log, vdsm logs from both hosts, engine-backup file (3.93 MB, application/x-gzip)
2017-01-26 09:30 UTC, sefi litmanovich
no flags

Description sefi litmanovich 2017-01-26 09:30:39 UTC
Created attachment 1244653
engine log, vdsm logs from both hosts, engine-backup file

Description of problem:

It seems that when there are several VM pools in the system and we attempt to remove them at the same time, at least one of them may fail to be removed, leaving some of its VMs in the system, sometimes detached from the pool and sometimes still attached.
We hit this in our automation, and when I then tried to reproduce it manually I succeeded 3 times with the following scenario:

Steps to Reproduce:
1. Have a pool (automatic, stateless) with 5 VMs, 3 of them pre-started and running.
2. Have a second pool with 3 vms, not running.
3. Invoke removal of both pools asynchronously.
4. Immediately create a new pool.
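Steps 3 and 4 above can be driven from a test script roughly as follows. This is a minimal sketch, assuming the automation exposes two callables, here named `remove_pool` and `create_pool`; both names are hypothetical placeholders for whatever API wrapper the test framework actually uses and are not part of any oVirt SDK. The sketch only fires the requests concurrently and records whether each removal request raised; because the engine runs the removal asynchronously, the final state (leftover VMs, a job stuck in STARTED) still has to be checked separately afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def reproduce(remove_pool, create_pool, pool_names, new_pool_name):
    """Hypothetical harness for steps 3-4.

    remove_pool(name) and create_pool(name) are assumed callables wrapping
    the automation's own API; they are placeholders, not a real SDK.
    Returns a dict mapping each pool name to "ok" or the exception raised
    by its removal request.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(pool_names) + 1) as executor:
        # Step 3: submit every removal without waiting for any of them.
        removals = {executor.submit(remove_pool, name): name for name in pool_names}
        # Step 4: immediately request a new pool while removals may be in flight.
        creation = executor.submit(create_pool, new_pool_name)
        for future, name in removals.items():
            exc = future.exception()  # blocks until this removal request finishes
            results[name] = "ok" if exc is None else exc
        creation.result()  # re-raises if the creation request itself failed
    return results
```

For example, `reproduce(remove_pool, create_pool, ["pool1", "pool2"], "pool3")` sends both removals and the creation almost simultaneously, which is the timing the steps describe.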


Actual results:
At least one of the pools (in every attempt, at least the first pool) fails to complete the remove VM pool action, leaving one or two VMs detached.
In one attempt, the remove VM pool task was left stuck in the job table in STARTED status (the attached DB dump, created with the engine-backup tool, contains this task).

Expected results:
Both pools are removed successfully.

I'm not sure step 4 is required; this might also happen if the system is loaded with other pool-related tasks. If needed I can try to create more scenarios, but so far this one has worked.
 

Version-Release number of selected component (if applicable):
rhevm-4.1.0.2-0.2.el7.noarch

How reproducible:
Not 100%, but most of the time.

Additional info:

Comment 1 Red Hat Bugzilla Rules Engine 2017-02-01 10:32:28 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Shahar Havivi 2017-02-01 12:36:21 UTC
We cannot reproduce the error on 4.1.
If you encounter a new flow that causes the race, open a new bug with the appropriate steps to reproduce.

