Bug 1432105 - Can't remove vm pool because of vm deadlock
Summary: Can't remove vm pool because of vm deadlock
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.1.3
Target Release: 4.1.3.2
Assignee: Shmuel Melamud
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-14 14:21 UTC by Aleksei Slaikovskii
Modified: 2017-07-14 03:46 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-06 13:14:03 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.1+


Attachments
engine.log (7.33 KB, text/plain)
2017-03-15 11:56 UTC, Aleksei Slaikovskii
reassigned, vdsm.log (421.18 KB, application/x-xz)
2017-06-01 14:44 UTC, Nisim Simsolo
reassigned, engine.log (126.51 KB, application/x-xz)
2017-06-01 14:45 UTC, Nisim Simsolo


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 75508 | 0 | master | MERGED | core: Free VM lock if attaching a VM from pool failed | 2020-08-07 15:27:50 UTC
oVirt gerrit | 76246 | 0 | ovirt-engine-4.1 | MERGED | core: Free VM lock if attaching a VM from pool failed | 2020-08-07 15:27:50 UTC
oVirt gerrit | 77820 | 0 | master | MERGED | core: Free pooled VM lock independently from context locks | 2020-08-07 15:27:50 UTC
oVirt gerrit | 77891 | 0 | ovirt-engine-4.1 | MERGED | core: Free pooled VM lock independently from context locks | 2020-08-07 15:27:50 UTC

Description Aleksei Slaikovskii 2017-03-14 14:21:57 UTC
Description of problem:
A virtual machine in a VM pool gets locked forever.

How reproducible:
100%

Steps to Reproduce:
1. Create a VM pool.
2. Create a user.
3. Grant this user, on the VM pool created above, a role that does not include the vm_pool_basic_operations permit (for example, BookmarkManager).
4. Log in as this user.
5. Try to allocate a VM from the pool synchronously, e.g.
curl -k -u user1@internal:123456 -H "Content-Type: application/xml" -d "<action><async>false</async><grace_period><expiry>10</expiry></grace_period></action>" https://engine/ovirt-engine/api/vmpools/123/allocatevm

6. Get an error:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <async>false</async>
    <fault>
        <detail>[User is not authorized to perform this action.]</detail>
        <reason>Operation Failed</reason>
    </fault>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <status>failed</status>
</action>

7. Then try to remove the pool as admin:
curl -k -u admin@internal:123456 -X DELETE https://engine/ovirt-engine/api/vmpools/123
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>[Cannot ${action} ${type}. Related operation is currently in progress. Please try again later.]</detail>
    <reason>Operation Failed</reason>
</fault>


Actual results:
The engine keeps the VM object locked forever.

Expected results:
No deadlock: the pool can be removed after the failed allocation.
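
The failure mode described above can be illustrated with a small, self-contained sketch. This is not ovirt-engine code: InMemoryLockManager, attach_vm_leaky, attach_vm_fixed and remove_pool are hypothetical names, and the model is deliberately simplified (one lock per VM, released at the end of the attach attempt). It only shows why a lock taken before a failing authorization check must be freed on the failure path, which is what the linked gerrit patch titles suggest the fix does.

```python
import threading


class InMemoryLockManager:
    """Hypothetical stand-in for an engine-wide lock manager (not oVirt code)."""

    def __init__(self):
        self._held = set()
        self._guard = threading.Lock()

    def acquire(self, key):
        # Non-blocking: return False if the key is already held.
        with self._guard:
            if key in self._held:
                return False
            self._held.add(key)
            return True

    def release(self, key):
        with self._guard:
            self._held.discard(key)

    def is_locked(self, key):
        with self._guard:
            return key in self._held


class NotAuthorized(Exception):
    pass


def attach_vm_leaky(locks, vm_id, user_is_authorized):
    """Models the reported behavior: a failed attach leaves the VM locked."""
    if not locks.acquire(vm_id):
        raise RuntimeError("Related operation is currently in progress.")
    if not user_is_authorized:
        # The lock taken above is never released on this path.
        raise NotAuthorized("User is not authorized to perform this action.")
    locks.release(vm_id)


def attach_vm_fixed(locks, vm_id, user_is_authorized):
    """Same flow, but the VM lock is freed whether or not the attach succeeds."""
    if not locks.acquire(vm_id):
        raise RuntimeError("Related operation is currently in progress.")
    try:
        if not user_is_authorized:
            raise NotAuthorized("User is not authorized to perform this action.")
    finally:
        locks.release(vm_id)


def remove_pool(locks, vm_ids):
    """Pool removal needs every VM lock; it fails if any VM is still locked."""
    acquired = []
    try:
        for vm_id in vm_ids:
            if not locks.acquire(vm_id):
                return False
            acquired.append(vm_id)
        return True
    finally:
        for vm_id in acquired:
            locks.release(vm_id)
```

With the leaky variant, one unauthorized allocation attempt is enough to make every later remove_pool call fail; with the fixed variant the same sequence succeeds.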

Comment 1 Yaniv Kaul 2017-03-15 07:36:20 UTC
Logs?

Comment 2 Aleksei Slaikovskii 2017-03-15 11:56:31 UTC
Created attachment 1263288 [details]
engine.log

Comment 3 Nisim Simsolo 2017-06-01 14:40:22 UTC
Reassigned. After following the bug reproduction steps, removing the pool still fails, with the following in engine.log:

2017-06-01 17:32:40,215+03 INFO  [org.ovirt.engine.core.bll.RemoveVmPoolCommand] (default task-6) [ce7d7218-922d-4aee-9084-7c2adb99062a] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[bc304503-dee7-42d4-9b8f-418069899114=<VM, ACTION_TYPE_FAILED_VM_POOL_IS_BEING_REMOVED_WITH_VM$VmPoolName 2pool$VmName 2pool-4>, 9bc48be7-3754-4dd5-b632-17fd5f202716=<VM, ACTION_TYPE_FAILED_VM_POOL_IS_BEING_REMOVED_WITH_VM$VmPoolName 2pool$VmName 2pool-2>, 70e3b3da-4c35-4f3e-ac53-84e1da3099c9=<VM, ACTION_TYPE_FAILED_VM_POOL_IS_BEING_REMOVED_WITH_VM$VmPoolName 2pool$VmName 2pool-3>, 080b84ea-4bad-4468-b516-8c022fd9ab2e=<VM, ACTION_TYPE_FAILED_VM_POOL_IS_BEING_REMOVED_WITH_VM$VmPoolName 2pool$VmName 2pool-1>, 648bbc82-b1bb-4571-b4f0-ceec9b84ade0=<VM, ACTION_TYPE_FAILED_VM_POOL_IS_BEING_REMOVED_WITH_VM$VmPoolName 2pool$VmName 2pool-5>, db4aa3be-dc54-4e01-a2bb-36bc84251a16=<VM_POOL, ACTION_TYPE_FAILED_VM_POOL_IS_BEING_REMOVED$VmPoolName 2pool>]', sharedLocks='null'}'
2017-06-01 17:32:40,216+03 WARN  [org.ovirt.engine.core.bll.RemoveVmPoolCommand] (default task-6) [ce7d7218-922d-4aee-9084-7c2adb99062a] Validation of action 'RemoveVmPool' failed for user admin@internal-authz. Reasons: VAR__ACTION__REMOVE,VAR__TYPE__DESKTOP_POOL,ACTION_TYPE_FAILED_OBJECT_LOCKED
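
When triaging a line like the "Failed to Acquire Lock" message above, it can help to list which entities the command tried to lock. The following is a rough sketch only: the regular expression is inferred from the single log line quoted in this comment, not from any documented log format, and parse_exclusive_locks is a hypothetical helper name.

```python
import re

# Matches entries of the form
#   <uuid>=<GROUP, MESSAGE$VmPoolName pool$VmName vm>
# as they appear inside exclusiveLocks='[...]' in the quoted log line.
# Assumption: the message part never contains '>'.
LOCK_ENTRY = re.compile(
    r"(?P<id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"=<(?P<group>\w+), (?P<message>[^>]*)>"
)


def parse_exclusive_locks(log_line):
    """Return a list of (entity_id, lock_group, vm_name_or_None) tuples."""
    entries = []
    for match in LOCK_ENTRY.finditer(log_line):
        # VM entries carry a "$VmName <name>" suffix; the pool entry does not.
        name = re.search(r"\$VmName (\S+)", match.group("message"))
        entries.append(
            (match.group("id"), match.group("group"), name.group(1) if name else None)
        )
    return entries
```

Feeding it the INFO line above should yield one tuple per pool VM plus one for the pool itself; the correlation ID in square brackets is skipped because it is not followed by `=<`.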

Detaching the pool VMs and then removing all VMs at once may also leave a VM object locked.

Verification builds: 
ovirt-engine-4.1.3-0.1.el7
vdsm-4.19.16-1.el7ev.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64
sanlock-3.4.0-1.el7.x86_64
libvirt-client-2.0.0-10.el7_3.9.x86_64

engine.log and vdsm.log attached.

Comment 4 Nisim Simsolo 2017-06-01 14:44:57 UTC
Created attachment 1284166 [details]
reassigned, vdsm.log

Comment 5 Nisim Simsolo 2017-06-01 14:45:22 UTC
Created attachment 1284168 [details]
reassigned, engine.log

Comment 6 Nisim Simsolo 2017-06-15 12:38:34 UTC
Verification builds:
ovirt-engine-4.1.3.2-0.1.el7
sanlock-3.5.0-1.el7.x86_64
vdsm-4.19.18-1.el7ev.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64
libvirt-client-3.2.0-10.el7.x86_64

Verification scenario:
1. Create pool
2. Create user with BookmarkManager permissions
3. Login with new user
4. Try to allocate pool synchronously e.g.
[root@intel-vfio ~]# curl -k -u user1@internal:xxxxxx -H "Content-Type: application/xml" -d "<action><async>false</async><grace_period><expiry>10</expiry></grace_period></action>" https://engine_name.some.lab.tlv.redhat.com/ovirt-engine/api/vmpools/056ce838-d742-4420-a7ee-255047533bc0/allocatevm
Get error:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <async>false</async>
    <fault>
        <detail>[User is not authorized to perform this action.]</detail>
        <reason>Operation Failed</reason>
    </fault>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <status>failed</status>
</action>
[root@intel-vfio ~]# 

5. Try to remove pool as admin: 
[root@intel-vfio ~]# curl -k -u admin@internal:xxxxxx -X DELETE https://engine_name.some.lab.tlv.redhat.com/ovirt-engine/api/vmpools/056ce838-d742-4420-a7ee-255047533bc0
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <job href="/ovirt-engine/api/jobs/93de10db-21df-40a1-854d-88f791bed5c2" id="93de10db-21df-40a1-854d-88f791bed5c2"/>
    <status>complete</status>
</action>
[root@intel-vfio ~]# 

6. Verify engine is not locking VM objects (pool and related VMs removed)
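
For repeated runs, the two curl calls in this scenario could be scripted. The sketch below only builds the requests and does not send them; the URL paths, headers and XML body are copied from the curl commands above, while the function names and the default expiry are illustrative assumptions. Sending (for example with urllib.request.urlopen and a suitable SSL context, since curl is run with -k) is left to the caller.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# Path taken from the curl commands in this report.
API_PATH = "/ovirt-engine/api/vmpools"


def _basic_auth(user, password):
    # Same credentials format that curl -u sends.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_allocate_request(base_url, pool_id, user, password, expiry=10):
    """Build (but do not send) the synchronous allocatevm POST."""
    action = ET.Element("action")
    ET.SubElement(action, "async").text = "false"
    grace = ET.SubElement(action, "grace_period")
    ET.SubElement(grace, "expiry").text = str(expiry)
    request = urllib.request.Request(
        f"{base_url}{API_PATH}/{pool_id}/allocatevm",
        data=ET.tostring(action),
        method="POST",
    )
    request.add_header("Content-Type", "application/xml")
    request.add_header("Authorization", _basic_auth(user, password))
    return request


def build_remove_request(base_url, pool_id, user, password):
    """Build (but do not send) the DELETE that removes the pool."""
    request = urllib.request.Request(f"{base_url}{API_PATH}/{pool_id}", method="DELETE")
    request.add_header("Authorization", _basic_auth(user, password))
    return request
```

Keeping request construction separate from sending makes it easy to inspect the exact body and URL before pointing the script at a real engine.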

