Bug 1635337 - [Downstream Clone] Cannot assign VM from VmPool: oVirt claims it's already attached but it's not
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.3.5
: 4.3.0
Assignee: Tomasz Barański
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On: 1462236 1700389
Blocks:
 
Reported: 2018-10-02 16:43 UTC by Federico Sun
Modified: 2023-09-07 19:25 UTC (History)
CC: 14 users

Fixed In Version: ovirt-engine-4.3.3.1
Doc Type: If docs needed, set a value
Doc Text:
This release ensures that virtual machines within a virtual machine pool can be attached to a user.
Clone Of: 1462236
Environment:
Last Closed: 2019-08-12 11:53:27 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments
engine.log (222.81 KB, application/gzip)
2019-04-02 14:55 UTC, Polina


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2019:2431 0 None None None 2019-08-12 11:53:39 UTC
oVirt gerrit 97628 0 master MERGED core: Don't lock all VMs when selecting only one 2020-12-01 14:12:39 UTC
oVirt gerrit 98844 0 ovirt-engine-4.3 MERGED core: Don't lock all VMs when selecting only one 2020-12-01 14:12:39 UTC
oVirt gerrit 99384 0 master MERGED core: Regression - prestarted VMs were kept locked 2020-12-01 14:12:37 UTC
oVirt gerrit 99385 0 ovirt-engine-4.3 MERGED core: Regression - prestarted VMs were kept locked 2020-12-01 14:12:39 UTC

Description Federico Sun 2018-10-02 16:43:48 UTC
+++ This bug was initially created as a clone of Bug #1462236 +++

Description of problem:

I created a VM pool with 40 machines and granted permissions to a user so he could attach all 40 machines to himself. He successfully attached 37 of them, but he cannot attach the other 3. When he tries to, the following shows up:

PoolName:
Cannot attach VM to VM-Pool. VM-Pool is already attached to a User.
Cannot allocate and run VM from VM-Pool. There are no available VMs in the VM-Pool.

I can see that those 3 machines do not have a user attached to them.

When trying to attach, I can see the following in the logs:

2017-06-16 13:41:19,910+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-6-thread-6) [] FINISH, ConnectStoragePoolVDSCommand, log id: 6cbdcb1e
2017-06-16 13:41:22,728+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-6-thread-23) [] FINISH, ConnectStoragePoolVDSCommand, log id: 2b9b2dc9
2017-06-16 13:41:30,783+01 INFO  [org.ovirt.engine.core.bll.scheduling.policyunits.EvenGuestDistributionBalancePolicyUnit] (DefaultQuartzScheduler7) [1ca398a8-c969-42ed-a230-cc648876570a] There is no host with more than 40 running guests, no balancing is needed
2017-06-16 13:41:30,888+01 WARN  [org.ovirt.engine.core.bll.scheduling.policyunits.EvenGuestDistributionBalancePolicyUnit] (DefaultQuartzScheduler7) [1ca398a8-c969-42ed-a230-cc648876570a] There is no host with less than 31 running guests
2017-06-16 13:41:30,888+01 WARN  [org.ovirt.engine.core.bll.scheduling.policyunits.CpuAndMemoryBalancingPolicyUnit] (DefaultQuartzScheduler7) [1ca398a8-c969-42ed-a230-cc648876570a] All candidate hosts have been filtered, can't balance the cluster 'Cluster-Rojo' based on the CPU usage, will try memory based approach
2017-06-16 13:41:30,929+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{runAsync='true', vmId='b517d80e-1fde-46fa-a3b2-53218c60be8c'}), log id: 17e10c97
2017-06-16 13:41:30,929+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 17e10c97
2017-06-16 13:41:31,241+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr03.domain.com' ('1d3bbd98-e26c-4c15-b87b-e8d396510287') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,241+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr04.domain.com' ('b2dfb945-d767-44aa-a547-2d1a4381f8e3') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,266+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr05.domain.com' ('f7ab19fe-5192-4a22-88cf-c8d019fa0372') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,387+01 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[b517d80e-1fde-46fa-a3b2-53218c60be8c=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-16 13:41:31,494+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{runAsync='true', vmId='5a5ecdd8-815f-467e-8703-370be35510ce'}), log id: 779a3dd8
2017-06-16 13:41:31,494+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 779a3dd8
2017-06-16 13:41:31,805+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr03.domain.com' ('1d3bbd98-e26c-4c15-b87b-e8d396510287') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,805+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr04.domain.com' ('b2dfb945-d767-44aa-a547-2d1a4381f8e3') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,805+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr05.domain.com' ('f7ab19fe-5192-4a22-88cf-c8d019fa0372') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:32,001+01 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[5a5ecdd8-815f-467e-8703-370be35510ce=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-16 13:41:32,122+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{runAsync='true', vmId='c0d6ec77-d947-46b5-b390-b8ce049f2247'}), log id: 61964553
2017-06-16 13:41:32,122+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 61964553
2017-06-16 13:41:32,444+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr03.domain.com' ('1d3bbd98-e26c-4c15-b87b-e8d396510287') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:32,444+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr04.domain.com' ('b2dfb945-d767-44aa-a547-2d1a4381f8e3') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:32,444+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr05.domain.com' ('f7ab19fe-5192-4a22-88cf-c8d019fa0372') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:32,550+01 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[c0d6ec77-d947-46b5-b390-b8ce049f2247=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-16 13:41:32,575+01 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a749dfaf-610f-4fd7-8c2c-bbc94f99c0e7=<USER_VM_POOL, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-16 13:41:32,575+01 WARN  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Validation of action 'AttachUserToVmFromPoolAndRun' failed for user amunozba@domain. Reasons: VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VAR__ACTION__ALLOCATE_AND_RUN,VAR__TYPE__VM_FROM_VM_POOL,ACTION_TYPE_FAILED_NO_AVAILABLE_POOL_VMS,VAR__ACTION__ALLOCATE_AND_RUN,VAR__TYPE__VM_FROM_VM_POOL

I'm also providing the log from the time the machines were attached. The three indexes that oVirt claims are attached but actually aren't are 15, 35, and 36. They don't even appear in the log.

Version-Release number of selected component (if applicable):

4.1.1.8

How reproducible:

Can't say accurately because the machines were attached by a different user.

--- Additional comment from Tomas Jelinek on 2017-07-12 06:57:12 EDT ---

The strange thing is that the error message says that it checked 36 VMs and all of them are attached to a user.

It checks two kinds of VMs: the ones in UP status and the ones in DOWN status. No other statuses are taken into account.

Were the other VMs by any chance in some other state, like powering up? For example, prestarted VMs from the pool?
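
For illustration, a minimal, hypothetical sketch of the kind of status check described above; the class, record, and enum names are invented and this is not the actual engine code:

import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical, simplified illustration: only VMs in UP or DOWN status (and not
// yet attached to a user) are considered free candidates when allocating from a pool.
public class PoolCandidateFilter {

    // Simplified stand-in for the engine's VM status values.
    enum VmStatus { UP, DOWN, POWERING_UP, PAUSED }

    record PoolVm(UUID id, VmStatus status, boolean attachedToUser) {}

    private static final Set<VmStatus> CANDIDATE_STATUSES = EnumSet.of(VmStatus.UP, VmStatus.DOWN);

    static List<PoolVm> freeCandidates(List<PoolVm> poolVms) {
        return poolVms.stream()
                .filter(vm -> CANDIDATE_STATUSES.contains(vm.status())) // UP or DOWN only
                .filter(vm -> !vm.attachedToUser())                     // not yet assigned
                .collect(Collectors.toList());
    }
}

A VM in any other state (for example, one still powering up as a prestarted VM) would be skipped entirely, which is why the question above matters.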

--- Additional comment from  on 2017-07-12 07:03:17 EDT ---

Hi Tomas,

No, I'm 100% sure the machines were down, and I took special care to make sure no prestarted VMs exist in the pool. We always create machines that are down.

Is there any DB entry that could help you debug this issue? If so, don't hesitate to ask.

--- Additional comment from Tomas Jelinek on 2017-07-13 03:32:42 EDT ---

Glad you asked! :)

Here you go:

-- all vms from pool which are attached to someone
select s.vm_name, d.status, u.name from vm_static s, vm_dynamic d, permissions p, users u
where
s.vm_guid = d.vm_guid and
u.user_id = p.ad_element_id and
p.object_type_id = 2 and
p.object_id = s.vm_guid and
s.vm_name ilike '%your pool name%';

-- all vms from that pool and the status of each
select s.vm_name, d.status from vm_static s, vm_dynamic d
where
s.vm_guid = d.vm_guid and
s.vm_name ilike '%your pool name%';

(please don't forget to replace the "your pool name" string with your pool name, but leave the '%' signs in place)

It would also be good if you could then try to manually run the last VMs that won't get attached. Maybe only the logging is wrong, and the real problem is that they cannot run due to, e.g., insufficient resources.

--- Additional comment from  on 2017-07-13 05:07:33 EDT ---

I just found out something interesting, but I'm sharing the query results first:

Query 1
-------

engine=# select s.vm_name, d.status, u.name from vm_static s, vm_dynamic d, permissions p, users u
where
s.vm_guid = d.vm_guid and
u.user_id = p.ad_element_id and
p.object_type_id = 2 and
p.object_id = s.vm_guid and
s.vm_name ilike '%VMPOOL%' ORDER BY vm_name;
   vm_name    | status |     name     
--------------+--------+--------------
 VMPOOL-1   |      1 | ADRIAN
 VMPOOL-10  |      1 | ADRIAN
 VMPOOL-11  |      1 | ADRIAN
 VMPOOL-12  |      1 | ADRIAN
 VMPOOL-13  |      1 | ADRIAN
 VMPOOL-14  |      1 | ADRIAN
 VMPOOL-15  |      1 | LUIS ALBERTO
 VMPOOL-16  |      1 | ADRIAN
 VMPOOL-17  |      1 | ADRIAN
 VMPOOL-18  |      1 | ADRIAN
 VMPOOL-19  |      1 | ADRIAN
 VMPOOL-2   |      1 | ADRIAN
 VMPOOL-20  |      1 | ADRIAN
 VMPOOL-21  |      1 | ADRIAN
 VMPOOL-22  |      1 | ADRIAN
 VMPOOL-23  |      1 | ADRIAN
 VMPOOL-24  |      1 | ADRIAN
 VMPOOL-25  |      1 | ADRIAN
 VMPOOL-26  |      1 | ADRIAN
 VMPOOL-27  |      1 | ADRIAN
 VMPOOL-28  |      1 | ADRIAN
 VMPOOL-29  |      1 | ADRIAN
 VMPOOL-3   |      1 | ADRIAN
 VMPOOL-30  |      1 | ADRIAN
 VMPOOL-31  |      1 | ADRIAN
 VMPOOL-32  |      1 | ADRIAN
 VMPOOL-33  |      1 | ADRIAN
 VMPOOL-34  |      1 | ADRIAN
 VMPOOL-37  |      1 | ADRIAN
 VMPOOL-38  |      1 | ADRIAN
 VMPOOL-39  |      1 | ADRIAN
 VMPOOL-4   |      1 | ADRIAN
 VMPOOL-40  |      1 | ADRIAN
 VMPOOL-41  |      1 | LUIS ALBERTO
 VMPOOL-43  |      1 | LUIS ALBERTO
 VMPOOL-5   |      1 | ADRIAN
 VMPOOL-6   |      1 | ADRIAN
 VMPOOL-7   |      1 | ADRIAN
 VMPOOL-8   |      1 | ADRIAN
 VMPOOL-9   |      1 | ADRIAN
(40 rows)

As you can see, VMPOOL-42 is not present (which is the VM with the issue).

Query 2
-------

engine=# select s.vm_name, d.status from vm_static s, vm_dynamic d
engine-# where
engine-# s.vm_guid = d.vm_guid and
engine-# s.vm_name ilike '%VMPOOL%' ORDER BY vm_name;
   vm_name    | status 
--------------+--------
 VMPOOL-1   |      1
 VMPOOL-10  |      1
 VMPOOL-11  |      1
 VMPOOL-12  |      1
 VMPOOL-13  |      1
 VMPOOL-14  |      1
 VMPOOL-15  |      1
 VMPOOL-16  |      1
 VMPOOL-17  |      1
 VMPOOL-18  |      1
 VMPOOL-19  |      1
 VMPOOL-2   |      1
 VMPOOL-20  |      1
 VMPOOL-21  |      1
 VMPOOL-22  |      1
 VMPOOL-23  |      1
 VMPOOL-24  |      1
 VMPOOL-25  |      1
 VMPOOL-26  |      1
 VMPOOL-27  |      1
 VMPOOL-28  |      1
 VMPOOL-29  |      1
 VMPOOL-3   |      1
 VMPOOL-30  |      1
 VMPOOL-31  |      1
 VMPOOL-32  |      1
 VMPOOL-33  |      1
 VMPOOL-34  |      1
 VMPOOL-37  |      1
 VMPOOL-38  |      1
 VMPOOL-39  |      1
 VMPOOL-4   |      1
 VMPOOL-40  |      1
 VMPOOL-41  |      1
 VMPOOL-42  |      0
 VMPOOL-43  |      1
 VMPOOL-5   |      1
 VMPOOL-6   |      1
 VMPOOL-7   |      1
 VMPOOL-8   |      1
 VMPOOL-9   |      1
(41 rows)

Now the interesting thing: when I reported the issue, I logged in as the user having the problem and tested it myself so I could send the data for this BZ. I just ran another test: I granted my own user permissions on the pool, tried to attach the only remaining VM to myself, and it worked.

So from that, I have two suspicions about where the root of this issue could be:

1) Could it be some kind of issue with the calculation of the maximum number of VMs per user in the pool? The pool was initially created with 40 VMs, and originally only one user tried to attach them all, resulting in the issue above. Then we extended the pool with 3 machines, and he could only attach 2 more, and so on.

2) Since 40 is a relatively large number of VMs, I suspect the user didn't attach the VMs in an orderly way, i.e., pressing the "start" icon, waiting until the VM was attached, and repeating this 39 more times. Instead, he pushed the "start" icon 40 times in a row, waited to see how many were attached, and then pressed "start" again to try to attach the rest. Could this be an issue with the way a "start" press on a VM pool invokes a backend action, where the burst of "start" presses executed some actions that never completed?

If you need any additional tests don't hesitate to ask.

--- Additional comment from Tomas Jelinek on 2017-07-14 04:45:33 EDT ---

> Now the interesting thing: When I reported the issue I logged in as the user
> having the problem and tested it myself so I could send the data on this BZ.
> I just made another test now: I granted myself permissions (with my user) on
> the pool and tried to attach the only left VM to myself and it worked.

good, so it is enough to wait until it is unlocked -> see my following comments

> 
> So from that I have 2 suspicions on where the base of this issue could be:
> 
> 1) Could it be some kind of issue with the calculation of max. VMs per user

no, it would fail with a different error

> on the pool? The pool was initially created with 40 VMs and originally only
> one user tried to attach them all, resulting in the issue above. Then we
> extended the pool with 3 machines, and he could only attach 2 more, etc.
> 
> 2) As 40 is a relatively big number of VMs, I suspect the user didn't attach
> the VMs in an orderly way, i.e., pushing the "start" icon and waiting until
> the VM was attached, and repeat this operation 39 times, but pushed the
> "start" icon 40 times in a row instead, waited to see how many of them were
> attached, and pressed the "start" icon again to try to attach the rest of
> them. Could this be an issue in the way that a "start" press on a VMPool
> invokes an action in the backend and the "start" press burst could have
> executed some actions that were not completed?


Yes, it seems you are right. In the process of assigning a VM from the pool to a user, we lock the VMs (not only the one we assign but also the candidate ones). These are then unlocked so the next run can take them again, but if the timing is wrong, the command will find no free, unlocked VMs and will fail with the error that there are no more free VMs.

The log also seems to support this; it contains these messages:
"Failed to Acquire Lock to object"

It could be fixed by making sure to lock only the one VM we are actually going to assign.

Code pointer for the future assignee:
VmPoolHandler.selectVms() -> ".map(vmId -> acquireVm(vmId, leaveLocked))." locks all candidate VMs, and only later is the first one taken.
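
For reference, a minimal, hypothetical sketch of the fix idea described here (lock candidates one at a time and keep only the lock for the VM that is actually assigned); everything except the VmPoolHandler.selectVms() reference above is invented for illustration and is not the actual engine code:

import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch: instead of locking every candidate VM up front and
// releasing them later, try the candidates one by one and keep only the
// first lock that succeeds.
public class PoolVmSelector {

    interface LockManager {
        boolean tryLock(UUID vmId); // false if the VM is already locked
        void unlock(UUID vmId);
    }

    private final LockManager locks;

    PoolVmSelector(LockManager locks) {
        this.locks = locks;
    }

    // Returns the id of the single VM that was locked for assignment, if any.
    // The caller assigns the VM to the user and unlocks it afterwards.
    Optional<UUID> selectOne(List<UUID> candidateVmIds) {
        for (UUID vmId : candidateVmIds) {
            if (locks.tryLock(vmId)) {
                return Optional.of(vmId);
            }
        }
        return Optional.empty(); // genuinely no free, unlocked VM right now
    }
}

With the original behavior (lock all candidates, pick one, unlock the rest later), two overlapping requests can each find every candidate locked and report that no VMs are available even though most of them are free.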

> 
> If you need any additional tests don't hesitate to ask.

thank you for all the quick feedback!

Comment 3 Ryan Barry 2019-01-21 14:54:12 UTC
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both

Comment 6 Polina 2019-04-02 14:55:28 UTC
Created attachment 1551051 [details]
engine.log

Looks like the fix caused a regression, found in an automation run on ovirt-engine-4.3.3.1-0.1.el7.noarch (100% reproducible):

Steps to reproduce:
1. Create a pool with prestarted VMs (I created a pool with two VMs, both prestarted).
2. Try to allocate a VM with the REST API:

POST https://{{host}}/ovirt-engine/api/vmpools/9e605316-cddf-4bd1-9f0e-e8f2f10aed1d/allocatevm

body: 

<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
</action>

This brings the error [Cannot allocate and run VM from VM-Pool. Related operation is currently in progress. Please try again later.].
Also, such a pool can never be removed from the engine; removal fails with the same error.


engine log attached
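
For reference, the reproduction call above can also be issued with a small standalone client; this is a minimal sketch using Java's built-in HttpClient, where the engine host, credentials, and TLS trust setup are placeholders/assumptions (the endpoint path, pool id, and XML body are the ones quoted in the steps above):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AllocateVmFromPool {
    public static void main(String[] args) throws Exception {
        // Placeholders: adjust the engine host, credentials, and pool id.
        String engine = "https://engine.example.com";
        String poolId = "9e605316-cddf-4bd1-9f0e-e8f2f10aed1d";
        String auth = Base64.getEncoder()
                .encodeToString("admin@internal:password".getBytes());

        String body = "<action>"
                + "<async>false</async>"
                + "<grace_period><expiry>10</expiry></grace_period>"
                + "</action>";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(engine + "/ovirt-engine/api/vmpools/" + poolId + "/allocatevm"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/xml")
                .header("Accept", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // On the regression, the response body contains the
        // "Related operation is currently in progress" fault quoted above.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}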

Comment 7 Polina 2019-04-02 15:15:38 UTC
I succeeded in removing such a pool only after restarting the ovirt-engine service.

Comment 8 Ryan Barry 2019-04-03 11:52:21 UTC
(In reply to Polina from comment #6)
> Created attachment 1551051 [details]
> engine.log
> 
> Looks like the fix caused regression found in automation run in
> ovirt-engine-4.3.3.1-0.1.el7.noarch (100% reproducible ):
> 
> Steps for reproduce:
> 1. Create pool with prestarted vm (I created pool with two vms , both
> prestarted)
> 2. try to allocate  with rest API :
> 
> POST
> https://{{host}}/ovirt-engine/api/vmpools/9e605316-cddf-4bd1-9f0e-
> e8f2f10aed1d/allocatevm
> 
> body: 
> 
> <action>
>     <async>false</async>
>     <grace_period>
>         <expiry>10</expiry>
>     </grace_period>
> </action>
> 
> brings  error  [Cannot allocate and run VM from VM-Pool. Related operation
> is currently in progress. Please try again later.].
> Also such pool could be never removed from the engine with the same error.
> 
> 
> engine log attached

Back to ASSIGNED.

Tomas, are we forgetting to release the locks?

Comment 9 Tomasz Barański 2019-04-03 11:54:49 UTC
I'll look into it.

Comment 10 Tomasz Barański 2019-04-05 14:15:39 UTC
I tried to reproduce this on a build that contains the patch, but I couldn't. The response:

<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <status>complete</status>
    <vm href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67" id="6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67">
        <actions>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/detach" rel="detach"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/export" rel="export"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/ticket" rel="ticket"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/migrate" rel="migrate"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/cancelmigration" rel="cancelmigration"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/commitsnapshot" rel="commitsnapshot"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/clone" rel="clone"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/freezefilesystems" rel="freezefilesystems"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/logon" rel="logon"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/maintenance" rel="maintenance"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/previewsnapshot" rel="previewsnapshot"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/reordermacaddresses" rel="reordermacaddresses"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/thawfilesystems" rel="thawfilesystems"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/undosnapshot" rel="undosnapshot"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/reboot" rel="reboot"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/shutdown" rel="shutdown"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/start" rel="start"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/stop" rel="stop"/>
            <link href="/ovirt-engine/api/vms/6c9b1a2a-89c3-41e7-97c5-fcc6c6d30f67/suspend" rel="suspend"/>
        </actions>
    </vm>
</action>



I'm going to try it with a fresh build from master.

Comment 11 Tomasz Barański 2019-04-08 13:44:22 UTC
The bug seems to be trickier than expected.

I've installed a fresh engine from master and created a template and a pool of 2 VMs, both prestarted.
Then:

1. I ran the allocatevm action from the API and got the following error:
[Cannot allocate and run VM from VM-Pool. There are no available VMs in the VM-Pool.]
2. In the GUI, the pool shows 2 running VMs.
3. Listing the pool from the API shows only 1 VM.
4. Running the allocatevm action again gives the following error:
[Cannot allocate and run VM from VM-Pool. Related operation is currently in progress. Please try again later.]

Investigating...

Comment 15 Nisim Simsolo 2019-05-06 11:52:02 UTC
Verified:
rhvm-4.3.3.6-0.1.el7
vdsm-4.30.13-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.4.x86_64

Verification scenario:
1. Create a pool of 40 VMs from the RHEL7.6 with qemu_ga template. Set prestarted VMs to 0 and the maximum number of VMs per user to 40.
2. From the VM Portal, using a user with UserRole permission, run all VMs.
3. Verify that all VMs are running and attached to the user. For example:
engine=#  select s.vm_name, d.status from vm_static s, vm_dynamic d where s.vm_guid = d.vm_guid and s.vm_name ilike '%vm-pool%' ORDER BY vm_name;
  vm_name   | status 
------------+--------
 vm-pool-1  |      1
 vm-pool-10 |      1
 vm-pool-11 |      1
 vm-pool-12 |      1
 vm-pool-13 |      1
 vm-pool-14 |      1
 vm-pool-15 |      1
 vm-pool-16 |      1
 vm-pool-17 |      1
 vm-pool-18 |      1
 vm-pool-19 |      1
 vm-pool-2  |      1
 vm-pool-20 |      1
 vm-pool-21 |      1
 vm-pool-22 |      1
 vm-pool-23 |      1
 vm-pool-24 |      1
 vm-pool-25 |      1
 vm-pool-26 |      1
 vm-pool-27 |      1
 vm-pool-28 |      1
 vm-pool-29 |      1
 vm-pool-3  |      1
 vm-pool-30 |      1
 vm-pool-31 |      1
 vm-pool-32 |      1
 vm-pool-33 |      1
 vm-pool-34 |      1
 vm-pool-35 |      1
 vm-pool-36 |      1
 vm-pool-37 |      1
 vm-pool-38 |      1
 vm-pool-39 |      1
 vm-pool-4  |      1
 vm-pool-40 |      1
 vm-pool-5  |      1
 vm-pool-6  |      1
 vm-pool-7  |      1
 vm-pool-8  |      1
 vm-pool-9  |      1
(40 rows)

engine=# select s.vm_name, d.status, u.name from vm_static s, vm_dynamic d, permissions p, users u where s.vm_guid = d.vm_guid and u.user_id = p.ad_element_id and p.object_type_id = 2 and p.object_id = s.vm_guid and s.vm_name ilike '%vm-pool%' ORDER BY vm_name;
  vm_name   | status | name  
------------+--------+-------
 vm-pool-1  |      1 | user1
 vm-pool-10 |      1 | user1
 vm-pool-11 |      1 | user1
 vm-pool-12 |      1 | user1
 vm-pool-13 |      1 | user1
 vm-pool-14 |      1 | user1
 vm-pool-15 |      1 | user1
 vm-pool-16 |      1 | user1
 vm-pool-17 |      1 | user1
 vm-pool-18 |      1 | user1
 vm-pool-19 |      1 | user1
 vm-pool-2  |      1 | user1
 vm-pool-20 |      1 | user1
 vm-pool-21 |      1 | user1
 vm-pool-22 |      1 | user1
 vm-pool-23 |      1 | user1
 vm-pool-24 |      1 | user1
 vm-pool-25 |      1 | user1
 vm-pool-26 |      1 | user1
 vm-pool-27 |      1 | user1
 vm-pool-28 |      1 | user1
 vm-pool-29 |      1 | user1
 vm-pool-3  |      1 | user1
 vm-pool-30 |      1 | user1
 vm-pool-31 |      1 | user1
 vm-pool-32 |      1 | user1
 vm-pool-33 |      1 | user1
 vm-pool-34 |      1 | user1
 vm-pool-35 |      1 | user1
 vm-pool-36 |      1 | user1
 vm-pool-37 |      1 | user1
 vm-pool-38 |      1 | user1
 vm-pool-39 |      1 | user1
 vm-pool-4  |      1 | user1
 vm-pool-40 |      1 | user1
 vm-pool-5  |      1 | user1
 vm-pool-6  |      1 | user1
 vm-pool-7  |      1 | user1
 vm-pool-8  |      1 | user1
 vm-pool-9  |      1 | user1
(40 rows)

engine=# 

4. Repeat steps 1-3, this time creating a pool with 20 prestarted VMs, and attach all of the VMs (prestarted and non-prestarted) to the same user.
5. Repeat steps 1-3, this time creating a pool with 40 prestarted VMs.

Comment 16 RHV bug bot 2019-05-16 15:29:07 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 17 RHV bug bot 2019-05-30 11:13:46 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 18 RHV bug bot 2019-06-06 08:38:52 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 19 Sandro Bonazzola 2019-06-26 14:16:30 UTC
Re-targeting to 4.3.5 since it missed the 4.3.4 errata, but it was already fixed in 4.3.3.

Comment 21 errata-xmlrpc 2019-08-12 11:53:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2431

Comment 22 Daniel Gur 2019-08-28 13:15:03 UTC
sync2jira

Comment 23 Daniel Gur 2019-08-28 13:20:06 UTC
sync2jira

