Bug 1462236 - Cannot assign VM from VmPool: oVirt claims it's already attached but it's not
Summary: Cannot assign VM from VmPool: oVirt claims it's already attached but it's not
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.1.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tomasz Barański
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks: 1635337
 
Reported: 2017-06-16 12:58 UTC by nicolas
Modified: 2022-06-14 00:46 UTC
CC List: 7 users

Fixed In Version:
Clone Of:
: 1635337
Environment:
Last Closed: 2020-04-01 14:46:22 UTC
oVirt Team: Virt
Embargoed:
rbarry: ovirt-4.5?


Attachments
engine.log (268.49 KB, application/x-gzip)
2017-06-16 12:58 UTC, nicolas


Links
Red Hat Issue Tracker: RHV-46408 (last updated 2022-06-14 00:46:20 UTC)

Description nicolas 2017-06-16 12:58:10 UTC
Created attachment 1288359 [details]
engine.log

Description of problem:

I created a VmPool with 40 machines and granted permissions to a user so he could attach all 40 machines to himself. He successfully attached 37 of them, but he cannot attach the other 3. When he tries to, the following shows up:

PoolName:
Cannot attach VM to VM-Pool. VM-Pool is already attached to a User.
Cannot allocate and run VM from VM-Pool. There are no available VMs in the VM-Pool.

I could verify that the 3 machines don't have a user attached to them.

When trying to attach, I can see the following in the logs:

2017-06-16 13:41:19,910+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-6-thread-6) [] FINISH, ConnectStoragePoolVDSCommand, log id: 6cbdcb1e
2017-06-16 13:41:22,728+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (org.ovirt.thread.pool-6-thread-23) [] FINISH, ConnectStoragePoolVDSCommand, log id: 2b9b2dc9
2017-06-16 13:41:30,783+01 INFO  [org.ovirt.engine.core.bll.scheduling.policyunits.EvenGuestDistributionBalancePolicyUnit] (DefaultQuartzScheduler7) [1ca398a8-c969-42ed-a230-cc648876570a] There is no host with more than 40 running guests, no balancing is needed
2017-06-16 13:41:30,888+01 WARN  [org.ovirt.engine.core.bll.scheduling.policyunits.EvenGuestDistributionBalancePolicyUnit] (DefaultQuartzScheduler7) [1ca398a8-c969-42ed-a230-cc648876570a] There is no host with less than 31 running guests
2017-06-16 13:41:30,888+01 WARN  [org.ovirt.engine.core.bll.scheduling.policyunits.CpuAndMemoryBalancingPolicyUnit] (DefaultQuartzScheduler7) [1ca398a8-c969-42ed-a230-cc648876570a] All candidate hosts have been filtered, can't balance the cluster 'Cluster-Rojo' based on the CPU usage, will try memory based approach
2017-06-16 13:41:30,929+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{runAsync='true', vmId='b517d80e-1fde-46fa-a3b2-53218c60be8c'}), log id: 17e10c97
2017-06-16 13:41:30,929+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 17e10c97
2017-06-16 13:41:31,241+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr03.domain.com' ('1d3bbd98-e26c-4c15-b87b-e8d396510287') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,241+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr04.domain.com' ('b2dfb945-d767-44aa-a547-2d1a4381f8e3') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,266+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr05.domain.com' ('f7ab19fe-5192-4a22-88cf-c8d019fa0372') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,387+01 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[b517d80e-1fde-46fa-a3b2-53218c60be8c=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-16 13:41:31,494+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{runAsync='true', vmId='5a5ecdd8-815f-467e-8703-370be35510ce'}), log id: 779a3dd8
2017-06-16 13:41:31,494+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 779a3dd8
2017-06-16 13:41:31,805+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr03.domain.com' ('1d3bbd98-e26c-4c15-b87b-e8d396510287') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,805+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr04.domain.com' ('b2dfb945-d767-44aa-a547-2d1a4381f8e3') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:31,805+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr05.domain.com' ('f7ab19fe-5192-4a22-88cf-c8d019fa0372') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:32,001+01 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[5a5ecdd8-815f-467e-8703-370be35510ce=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-16 13:41:32,122+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{runAsync='true', vmId='c0d6ec77-d947-46b5-b390-b8ce049f2247'}), log id: 61964553
2017-06-16 13:41:32,122+01 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 61964553
2017-06-16 13:41:32,444+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr03.domain.com' ('1d3bbd98-e26c-4c15-b87b-e8d396510287') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:32,444+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr04.domain.com' ('b2dfb945-d767-44aa-a547-2d1a4381f8e3') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:32,444+01 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Candidate host 'kvmr05.domain.com' ('f7ab19fe-5192-4a22-88cf-c8d019fa0372') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2017-06-16 13:41:32,550+01 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[c0d6ec77-d947-46b5-b390-b8ce049f2247=<VM, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-16 13:41:32,575+01 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a749dfaf-610f-4fd7-8c2c-bbc94f99c0e7=<USER_VM_POOL, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-06-16 13:41:32,575+01 WARN  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-17) [988eab2e-f8f5-48e1-b2a8-09872613b24d] Validation of action 'AttachUserToVmFromPoolAndRun' failed for user amunozba@domain. Reasons: VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM
_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VM_POOL_CANNOT_ADD_VM_WITH_USERS_ATTACHED_TO_POOL,VAR__ACTION__ALLOCATE_AND_RUN,VAR__TYPE__VM_FROM_VM_POOL,ACTION_TYPE_FAILED_NO_AVAILABLE_POOL_VMS,VAR__ACTION__ALLOCATE_AND_RUN,VAR__TYPE__VM_FROM_VM_POOL

I'm also providing the log from the time the machines were attached. The three indexes that oVirt claims are attached, but actually aren't, are 15, 35 and 36. They don't even appear in the log.

Version-Release number of selected component (if applicable):

4.1.1.8

How reproducible:

Can't say accurately because the machines were attached by a different user.

Comment 1 Tomas Jelinek 2017-07-12 10:57:12 UTC
The strange thing is that the error message says that it checked 36 VMs and all of them are attached to a user.

It checks 2 kinds of VMs: the ones in UP status and the ones in DOWN status. No other statuses are taken into account.

Could the other VMs by any chance have been in some other state, like powering up? For example, prestarted VMs from the pool?

Comment 2 nicolas 2017-07-12 11:03:17 UTC
Hi Tomas,

No, I'm 100% sure the machines were down, and I took special care to make sure no prestarted VMs exist in the pool. We always create machines that are down.

Is there any DB entry that could help you debug this issue? If so don't hesitate to ask.

Comment 3 Tomas Jelinek 2017-07-13 07:32:42 UTC
Glad you asked! :)

Here you go:

-- all vms from pool which are attached to someone
select s.vm_name, d.status, u.name from vm_static s, vm_dynamic d, permissions p, users u
where
s.vm_guid = d.vm_guid and
u.user_id = p.ad_element_id and
p.object_type_id = 2 and
p.object_id = s.vm_guid and
s.vm_name ilike '%your pool name%';

-- all vms from that pool and the status of each
select s.vm_name, d.status from vm_static s, vm_dynamic d
where
s.vm_guid = d.vm_guid and
s.vm_name ilike '%your pool name%';

(please don't forget to replace the "your pool name" string with your actual pool name, but leave the '%' signs in place)

It would also be good if you could then try to manually run those last VMs which refuse to get attached: maybe only the logging is wrong, and the real problem is that they cannot run due to, e.g., insufficient resources.

Comment 4 nicolas 2017-07-13 09:07:33 UTC
I just found out an interesting thing, but I'm sharing the query results first:

Query 1
-------

engine=# select s.vm_name, d.status, u.name from vm_static s, vm_dynamic d, permissions p, users u
where
s.vm_guid = d.vm_guid and
u.user_id = p.ad_element_id and
p.object_type_id = 2 and
p.object_id = s.vm_guid and
s.vm_name ilike '%VMPOOL%' ORDER BY vm_name;
   vm_name    | status |     name     
--------------+--------+--------------
 VMPOOL-1   |      1 | ADRIAN
 VMPOOL-10  |      1 | ADRIAN
 VMPOOL-11  |      1 | ADRIAN
 VMPOOL-12  |      1 | ADRIAN
 VMPOOL-13  |      1 | ADRIAN
 VMPOOL-14  |      1 | ADRIAN
 VMPOOL-15  |      1 | LUIS ALBERTO
 VMPOOL-16  |      1 | ADRIAN
 VMPOOL-17  |      1 | ADRIAN
 VMPOOL-18  |      1 | ADRIAN
 VMPOOL-19  |      1 | ADRIAN
 VMPOOL-2   |      1 | ADRIAN
 VMPOOL-20  |      1 | ADRIAN
 VMPOOL-21  |      1 | ADRIAN
 VMPOOL-22  |      1 | ADRIAN
 VMPOOL-23  |      1 | ADRIAN
 VMPOOL-24  |      1 | ADRIAN
 VMPOOL-25  |      1 | ADRIAN
 VMPOOL-26  |      1 | ADRIAN
 VMPOOL-27  |      1 | ADRIAN
 VMPOOL-28  |      1 | ADRIAN
 VMPOOL-29  |      1 | ADRIAN
 VMPOOL-3   |      1 | ADRIAN
 VMPOOL-30  |      1 | ADRIAN
 VMPOOL-31  |      1 | ADRIAN
 VMPOOL-32  |      1 | ADRIAN
 VMPOOL-33  |      1 | ADRIAN
 VMPOOL-34  |      1 | ADRIAN
 VMPOOL-37  |      1 | ADRIAN
 VMPOOL-38  |      1 | ADRIAN
 VMPOOL-39  |      1 | ADRIAN
 VMPOOL-4   |      1 | ADRIAN
 VMPOOL-40  |      1 | ADRIAN
 VMPOOL-41  |      1 | LUIS ALBERTO
 VMPOOL-43  |      1 | LUIS ALBERTO
 VMPOOL-5   |      1 | ADRIAN
 VMPOOL-6   |      1 | ADRIAN
 VMPOOL-7   |      1 | ADRIAN
 VMPOOL-8   |      1 | ADRIAN
 VMPOOL-9   |      1 | ADRIAN
(40 rows)

As you can see, VMPOOL-42 is not present (which is the VM with the issue).

Query 2
-------

engine=# select s.vm_name, d.status from vm_static s, vm_dynamic d
engine-# where
engine-# s.vm_guid = d.vm_guid and
engine-# s.vm_name ilike '%VMPOOL%' ORDER BY vm_name;
   vm_name    | status 
--------------+--------
 VMPOOL-1   |      1
 VMPOOL-10  |      1
 VMPOOL-11  |      1
 VMPOOL-12  |      1
 VMPOOL-13  |      1
 VMPOOL-14  |      1
 VMPOOL-15  |      1
 VMPOOL-16  |      1
 VMPOOL-17  |      1
 VMPOOL-18  |      1
 VMPOOL-19  |      1
 VMPOOL-2   |      1
 VMPOOL-20  |      1
 VMPOOL-21  |      1
 VMPOOL-22  |      1
 VMPOOL-23  |      1
 VMPOOL-24  |      1
 VMPOOL-25  |      1
 VMPOOL-26  |      1
 VMPOOL-27  |      1
 VMPOOL-28  |      1
 VMPOOL-29  |      1
 VMPOOL-3   |      1
 VMPOOL-30  |      1
 VMPOOL-31  |      1
 VMPOOL-32  |      1
 VMPOOL-33  |      1
 VMPOOL-34  |      1
 VMPOOL-37  |      1
 VMPOOL-38  |      1
 VMPOOL-39  |      1
 VMPOOL-4   |      1
 VMPOOL-40  |      1
 VMPOOL-41  |      1
 VMPOOL-42  |      0
 VMPOOL-43  |      1
 VMPOOL-5   |      1
 VMPOOL-6   |      1
 VMPOOL-7   |      1
 VMPOOL-8   |      1
 VMPOOL-9   |      1
(41 rows)

Now the interesting thing: when I reported the issue, I logged in as the user having the problem and tested it myself so I could send the data for this BZ. I just ran another test: I granted my own user permissions on the pool, tried to attach the only remaining VM to myself, and it worked.

So from that, I have 2 suspicions about where the root of this issue could be:

1) Could it be some kind of issue with the calculation of max. VMs per user on the pool? The pool was initially created with 40 VMs and originally only one user tried to attach them all, resulting in the issue above. Then we extended the pool with 3 machines, and he could only attach 2 more, etc.

2) As 40 is a relatively big number of VMs, I suspect the user didn't attach the VMs in an orderly way (i.e., pressing the "start" icon, waiting until the VM was attached, and repeating that operation 39 more times). Instead, he probably pressed the "start" icon 40 times in a row, waited to see how many VMs were attached, and then pressed "start" again to try to attach the rest. Could it be that each "start" press on a VmPool invokes an action in the backend, and the burst of "start" presses executed some actions that never completed?

If you need any additional tests don't hesitate to ask.

Comment 5 Tomas Jelinek 2017-07-14 08:45:33 UTC
> Now the interesting thing: when I reported the issue, I logged in as the user
> having the problem and tested it myself so I could send the data for this BZ.
> I just ran another test: I granted my own user permissions on the pool, tried
> to attach the only remaining VM to myself, and it worked.

Good, so it is enough to wait until it is unlocked -> see my following comments.

> 
> So from that, I have 2 suspicions about where the root of this issue could be:
> 
> 1) Could it be some kind of issue with the calculation of max. VMs per user

no, it would fail with a different error

> on the pool? The pool was initially created with 40 VMs and originally only
> one user tried to attach them all, resulting in the issue above. Then we
> extended the pool with 3 machines, and he could only attach 2 more, etc.
> 
> 2) As 40 is a relatively big number of VMs, I suspect the user didn't attach
> the VMs in an orderly way (i.e., pressing the "start" icon, waiting until the
> VM was attached, and repeating that operation 39 more times). Instead, he
> probably pressed the "start" icon 40 times in a row, waited to see how many
> VMs were attached, and then pressed "start" again to try to attach the rest.
> Could it be that each "start" press on a VmPool invokes an action in the
> backend, and the burst of "start" presses executed some actions that never
> completed?


Yes, it seems you are right. In the process of assigning a VM from the pool to a user, we lock the VMs (not only the one we assign but also the candidate ones). These are then unlocked so the next run can take them again, but if the timing is wrong, the command will find no free, unlocked VMs, so it will fail with the error that there are no more free VMs.

The log also seems to support this; it contains these messages:
"Failed to Acquire Lock to object"

It could be fixed by making sure to lock only the one VM we are actually going to assign.

Code pointer for the future assignee:
VmPoolHandler.selectVms() -> ".map(vmId -> acquireVm(vmId, leaveLocked))." locks all candidate VMs, and only later is the first one taken.
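The race can be sketched in plain Java. This is a hypothetical simplification (names like `PoolLockSketch` are illustrative; the real code goes through oVirt's `EngineLock`/lock-manager infrastructure), modeling per-VM engine locks as booleans to contrast the "lock all candidates" shape with "lock only the assigned VM":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical simplification: per-VM engine locks modeled as AtomicBooleans.
public class PoolLockSketch {
    private final Map<String, AtomicBoolean> locks = new ConcurrentHashMap<>();

    public PoolLockSketch(List<String> vmIds) {
        vmIds.forEach(id -> locks.put(id, new AtomicBoolean(false)));
    }

    // Buggy shape: lock *every* candidate VM, then take only the first one.
    // The remaining candidates stay locked until the command unwinds, so a
    // concurrent attach sees "no available VMs" even though most are free.
    public Optional<String> selectLockingAllCandidates() {
        List<String> acquired = new ArrayList<>();
        for (Map.Entry<String, AtomicBoolean> e : locks.entrySet()) {
            if (e.getValue().compareAndSet(false, true)) {
                acquired.add(e.getKey());
            }
        }
        return acquired.isEmpty() ? Optional.empty() : Optional.of(acquired.get(0));
    }

    // Fixed shape: lock only the single VM that will actually be assigned.
    public Optional<String> selectLockingOne() {
        for (Map.Entry<String, AtomicBoolean> e : locks.entrySet()) {
            if (e.getValue().compareAndSet(false, true)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Number of VMs still attachable by a concurrent request.
    public long freeCount() {
        return locks.values().stream().filter(b -> !b.get()).count();
    }
}
```

Under the buggy shape, a second request arriving while the first is in flight finds every candidate locked and fails validation; under the fixed shape, all but one VM remain available.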

> 
> If you need any additional tests don't hesitate to ask.

thank you for all the quick feedback!

Comment 6 Ryan Barry 2019-01-21 14:53:28 UTC
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both

Comment 7 Rik Theys 2019-05-13 06:55:45 UTC
Hi,

We are still experiencing this issue on ovirt-engine-4.3.3.7-1.el7.noarch. Is there any update on when a fix for this issue will be available? Is there another workaround available other than restarting ovirt-engine?

Regards,
Rik

Comment 8 Ryan Barry 2019-05-13 12:32:27 UTC
No updates yet

Comment 9 Rik Theys 2019-05-14 11:50:41 UTC
Hi,

This seems to have gotten a lot worse. Now only a single VM can be started from a pool. Afterwards, no VMs can be started from that pool until ovirt-engine is restarted (and even then, only a single instance can be started).

During the initial launch it logs:

2019-05-14 13:26:46,058+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 2fb4f7f5
2019-05-14 13:26:46,058+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 2fb4f7f5
2019-05-14 13:26:46,208+02 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock Acquired to object 'EngineLock:{exclusiveLocks='[d8a99676-d520-425e-9974-1b1efe6da8a5=VM]', sharedLocks=''}'

-> it has acquired a lock (lock1)

2019-05-14 13:26:46,247+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'

-> it has acquired another lock (lock2)

2019-05-14 13:26:46,352+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command: AttachUserToVmFromPoolAndRunCommand internal: false. Entities affected :  ID: 4c622213-e5f4-4032-8639-643174b698cc Type: VmPoolAction group VM_POOL_BASIC_OPERATIONS with role type USER
2019-05-14 13:26:46,393+02 INFO  [org.ovirt.engine.core.bll.AddPermissionCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command: AddPermissionCommand internal: true. Entities affected :  ID: d8a99676-d520-425e-9974-1b1efe6da8a5 Type: VMAction group MANIPULATE_PERMISSIONS with role type USER
2019-05-14 13:26:46,433+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Succeeded giving user 'a5bed59c-d2fe-4fe4-bff7-52efe089ebd6' permission to Vm 'd8a99676-d520-425e-9974-1b1efe6da8a5'
2019-05-14 13:26:46,608+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 67acc561
2019-05-14 13:26:46,608+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 67acc561
2019-05-14 13:26:46,719+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command:RunVmCommand internal: true. Entities affected :  ID: d8a99676-d520-425e-9974-1b1efe6da8a5 Type: VMAction group RUN_VM with role type USER
2019-05-14 13:26:46,791+02 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, UpdateVmDynamicDataVDSCommand( UpdateVmDynamicDataVDSCommandParameters:{hostId='null', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vmDynamic='org.ovirt.engine.core.common.businessentities.VmDynamic@6db8c94d'}), log id: 2c110e4
2019-05-14 13:26:46,795+02 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, UpdateVmDynamicDataVDSCommand, return: , log id: 2c110e4
2019-05-14 13:26:46,804+02 INFO  [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START,CreateVDSCommand( CreateVDSCommandParameters:{hostId='eec7ec2b-cae1-4bb9-b933-4dff47a70bdb', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vm='VM [stud-c7-1]'}), log id: 71d599f2
2019-05-14 13:26:46,809+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, CreateBrokerVDSCommand(HostName = studvirt1, CreateVDSCommandParameters:{hostId='eec7ec2b-cae1-4bb9-b933-4dff47a70bdb', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vm='VM [stud-c7-1]'}), log id: 3aa6b5ff
2019-05-14 13:26:46,836+02 INFO  [org.ovirt.engine.core.vdsbroker.builder.vminfo.VmInfoBuildUtils] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Kernel FIPS - Guid: eec7ec2b-cae1-4bb9-b933-4dff47a70bdb fips: false
2019-05-14 13:26:46,903+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
[domain xml stripped]
2019-05-14 13:26:46,928+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, CreateBrokerVDSCommand, return: , log id: 3aa6b5ff
2019-05-14 13:26:46,932+02 INFO  [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, CreateVDSCommand, return: WaitForLaunch, log id: 71d599f2
2019-05-14 13:26:46,932+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'

-> it has released lock2

2019-05-14 13:26:47,004+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM stud-c7-1 on Host studvirt1
2019-05-14 13:26:47,094+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: USER_ATTACH_USER_TO_VM_FROM_POOL(316), Attaching User u0045469 to VM stud-c7-1 in VM Pool stud-c7-? was initiated by u0045469.be-authz.
2019-05-14 13:26:47,098+02 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Trying to release exclusive lock which does not exist, lock key: 'a5bed59c-d2fe-4fe4-bff7-52efe089ebd6USER_VM_POOL'

-> it's trying to release the same lock2 as above and failing

2019-05-14 13:26:47,098+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'

-> now it's indicating that it released/freed the lock (lock2)

2019-05-14 13:26:48,518+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-3) [e3c5745c-e593-4aed-ba67-b173808140e8] Command 'AttachUserToVmFromPoolAndRun' id: '0148c91d-b053-4dc9-960c-f10bf0d3f343' child commands '[0470802d-09fa-4579-b95b-3d9603feff7b, 47dbfc58-3bae-4229-96eb-d1fc08911237]' executions were completed, status 'SUCCEEDED'
2019-05-14 13:26:49,584+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [e3c5745c-e593-4aed-ba67-b173808140e8] Ending command 'org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand' successfully.
2019-05-14 13:26:49,650+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: USER_ATTACH_USER_TO_VM_FROM_POOL_FINISHED_SUCCESS(318), User u0045469 successfully attached to VM stud-c7-1 in VM Pool stud-c7-?.
2019-05-14 13:26:50,584+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_CONSOLE_DISCONNECTED(168), User <UNKNOWN> got disconnected from VM stud-c7-1.
2019-05-14 13:26:50,585+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM 'd8a99676-d520-425e-9974-1b1efe6da8a5'(stud-c7-1) moved from 'WaitForLaunch' --> 'PoweringUp'


The first lock (lock1, on the VM itself) is never released. It seems this lock should also be released. Maybe it's this lock that should be released the second time (instead of releasing the second lock twice)?
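The log sequence above can be reduced to a minimal ledger sketch (the class and key names here are illustrative; the real implementation is oVirt's `InMemoryLockManager` with structured `EngineLock` keys): the VM lock is acquired and never released, the USER_VM_POOL lock is released twice with the second release failing, and a subsequent attach attempt then cannot acquire the still-held VM lock.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal ledger mirroring the acquire/release messages in the log above.
public class LockLedger {
    private final Set<String> held = new HashSet<>();

    // Returns false when the key is already held, matching
    // "Failed to Acquire Lock to object ..." in the log.
    public synchronized boolean acquire(String key) {
        return held.add(key);
    }

    // Returns false when the key is not held, matching
    // "Trying to release exclusive lock which does not exist".
    public synchronized boolean release(String key) {
        return held.remove(key);
    }
}
```

Replaying the log against this ledger reproduces both the double-release warning and the later "no available VMs" failure: the second attach fails at the acquire step because the first VM lock was never freed.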



When we start to launch another instance of the pool it fails:

2019-05-14 13:49:32,656+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 7db2f4fc
2019-05-14 13:49:32,656+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 7db2f4fc
2019-05-14 13:49:32,688+02 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[d8a99676-d520-425e-9974-1b1efe6da8a5=VM]', sharedLocks=''}'
2019-05-14 13:49:32,700+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
2019-05-14 13:49:32,700+02 WARN  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Validation of action 'AttachUserToVmFromPoolAndRun' failed for user u0045469.be-authz. Reasons: VAR__ACTION__ALLOCATE_AND_RUN,VAR__TYPE__VM_FROM_VM_POOL,ACTION_TYPE_FAILED_NO_AVAILABLE_POOL_VMS
2019-05-14 13:49:32,700+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
2019-05-14 13:49:32,706+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-11) [] Operation Failed: [Cannot allocate and run VM from VM-Pool. There are no available VMs in the VM-Pool.]


I'm no longer sure this is the same bug as initially reported, but it does completely break pool usage.

Regards,
Rik

Comment 10 Ryan Barry 2019-05-15 03:01:07 UTC
Shmuel, any thoughts?

Comment 11 Michal Skrivanek 2020-03-19 15:40:33 UTC
We haven't gotten to this bug for more than 2 years, and it's not being considered for the upcoming 4.4. It's unlikely that it will ever be addressed, so I'm suggesting we close it.
If you feel this needs to be addressed and want to work on it, please remove the cond nack and re-target accordingly.

Comment 12 Michal Skrivanek 2020-04-01 14:46:22 UTC
Closing old bug. Please reopen if still relevant/you want to work on it.

