Bug 1416693

Summary: removing several vm pools together may fail
Product: [oVirt] ovirt-engine
Reporter: sefi litmanovich <slitmano>
Component: BLL.Virt
Assignee: Shmuel Melamud <smelamud>
Status: CLOSED WORKSFORME
QA Contact: meital avital <mavital>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: bugs, shavivi, tjelinek
Target Milestone: ovirt-4.2.0
Keywords: Regression
Target Release: ---
Flags: tjelinek: ovirt-4.2?, tjelinek: blocker-, tjelinek: planning_ack?, tjelinek: devel_ack?, tjelinek: testing_ack?
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-01 12:36:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: engine log, vdsm logs from both hosts, engine-backup file
Flags: none

Description sefi litmanovich 2017-01-26 09:30:39 UTC
Created attachment 1244653
engine log, vdsm logs from both hosts, engine-backup file

Description of problem:

It seems that when several VM pools exist in the system and we attempt to remove them at the same time, at least some of the removals fail, leaving some VMs in the system, sometimes detached from the pool and sometimes still attached.
We hit this in our automation, and when I tried to reproduce it I was able to do so 3 times with the following scenario:

Steps to Reproduce:
1. Have a pool (auto, stateless) with 5 VMs, 3 of them prestarted and running.
2. Have a second pool with 3 VMs, not running.
3. Invoke removal of both pools asynchronously.
4. Immediately create a new pool.
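
Steps 3 and 4 can be sketched as a small harness. This is a minimal sketch, not the automation used in this report: `remove_pool` and `create_pool` are hypothetical callables that the caller would wire to whatever client talks to the engine (REST API or SDK), and the function name `remove_pools_then_create` is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def remove_pools_then_create(
    remove_pool: Callable[[str], None],   # hypothetical: removes one pool by name
    create_pool: Callable[[str], None],   # hypothetical: creates one pool by name
    pools_to_remove: List[str],
    new_pool: str,
) -> Dict[str, Optional[BaseException]]:
    """Submit all removals without waiting, create a new pool, then collect results."""
    with ThreadPoolExecutor(max_workers=len(pools_to_remove)) as executor:
        # Step 3: submit every removal so that they run concurrently.
        futures = {
            name: executor.submit(remove_pool, name) for name in pools_to_remove
        }
        # Step 4: create the new pool right away; removals may still be in flight.
        create_pool(new_pool)
        # Wait for each removal and record the exception it raised, or None.
        return {name: future.exception() for name, future in futures.items()}
```

The returned mapping holds `None` for every removal that completed without raising, so any other value points at the pool whose removal call failed. It does not detect leftover VMs or a job stuck in STARTED status; those still need to be checked in the engine itself.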


Actual results:
At least one of the pools (in every attempt, the first pool) fails to complete the remove VM pool action, leaving one or two VMs detached.
In one attempt, the remove VM pool task was left stuck in the job table in STARTED status (I am attaching a DB dump of the system containing this task, created with the engine-backup tool).

Expected results:
Both pools are removed successfully.

I'm not sure step 4 is required; this might also happen if we load the system with other pool-related tasks. If needed I can try to create more scenarios, but this one has worked so far.
 

Version-Release number of selected component (if applicable):
rhevm-4.1.0.2-0.2.el7.noarch

How reproducible:
Not 100%, but most of the time.

Additional info:

Comment 1 Red Hat Bugzilla Rules Engine 2017-02-01 10:32:28 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Shahar Havivi 2017-02-01 12:36:21 UTC
We cannot reproduce the error on 4.1.
If you encounter a new flow that causes the race, open a new bug with the appropriate steps to reproduce.