Bug 2066078 - Remove VM fails - [Cannot remove VM: VM is locked] but the VM is not locked.
Summary: Remove VM fails - [Cannot remove VM: VM is locked] but the VM is not locked.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.5.0
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ovirt-4.5.1
: ---
Assignee: Artiom Divak
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-03-20 14:31 UTC by Evelina Shames
Modified: 2022-06-23 05:57 UTC (History)
3 users (show)

Fixed In Version: ovirt-engine-4.5.1
Clone Of:
Environment:
Last Closed: 2022-06-19 16:40:24 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?


Attachments (Terms of Use)
logs (1.24 MB, application/zip)
2022-03-20 14:31 UTC, Evelina Shames
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-engine pull 383 0 None open Change conflict message to be more informative 2022-05-23 16:54:32 UTC
Red Hat Issue Tracker RHV-45363 0 None None None 2022-03-20 14:33:54 UTC

Description Evelina Shames 2022-03-20 14:31:57 UTC
Created attachment 1866964 [details]
logs

Description of problem:
VM removal fails with the following error:
	Status: 409
	Reason: Conflict
	Detail: [Cannot remove VM: VM is locked. Please try again in a few minutes.]

But the VM is not locked (<status>down</status>) and all its disks in state OK and there is no running operation.


Seen as part of automation run with a TC where a snapshot is been taken and then download snapshot disks operation is running on the SPM host using the SDK example script: download_disk_snapshot.py


Version-Release number of selected component (if applicable):
Seen both on 4.4 and 4.5 for a while now.
At the beginning we thought it is an automation issue, but after a deep investigation, it looks like a bug.

How reproducible:
Barly, sometimes it is reproducedd by running the TC locally on my 4.4.10-8 nested env on iSCSI.

Steps to Reproduce:
1. Clone VM from template with disks permutations including FS:
virtio_scsicow, virtio_scsiraw, virtioraw, virtiocow
2. Make a snapshot of the VM
3. Using the SDK example script, download the snapshot disks on the SPM host:
 a. TC1: with finalization of the transfer
 b. TC2: Without finalization


Actual results:
The created VM cannot be removed because it is locked, but the state that is shown is not locked - VM is in 'Down' state, no operation is running, disks state is 'OK'.
Need to wait ~1m and then it is possible to remove the VM.

Expected results:
If the VM is really locked:
- The state should change to 'Locked'
- Running operation should appear 
(on UI: no operation is running, VM state is 'Down', disks state is 'OK')
If the VM is not really locked, VM should be removed successfully.

Additional info:
2022-02-03 18:56:45,596 - ThreadPoolExecutor-422_0 - art.ll_lib.vms - INFO - Removing VM 'vm_TestCase27605_0318531950'
2022-02-03 18:56:45,598 - ThreadPoolExecutor-422_0 - vms - INFO - Using Correlation-Id: vms_delete_03aa0907-cc3b-4d7a
2022-02-03 18:56:45,599 - ThreadPoolExecutor-422_0 - vms - DEBUG - DELETE request content is --  url:/ovirt-engine/api/vms/56ef393c-46f3-4019-a860-f65894edb752
2022-02-03 18:56:45,733 - ThreadPoolExecutor-422_0 - core_api - DEBUG - Request DELETE response time: 0.148
2022-02-03 18:56:45,739 - ThreadPoolExecutor-422_0 - vms - DEBUG - Cleaning Correlation-Id: vms_delete_03aa0907-cc3b-4d7a
2022-02-03 18:56:45,742 - ThreadPoolExecutor-422_0 - api_utils - ERROR - Failed to delete element NOT as expected:
	Status: 409
	Reason: Conflict
	Detail: [Cannot remove VM: VM is locked. Please try again in a few minutes.]

engine, vdsm and art logs are attached.

Comment 1 Arik 2022-03-21 12:28:52 UTC
(In reply to Evelina Shames from comment #0)
> Description of problem:
> VM removal fails with the following error:
> 	Status: 409
> 	Reason: Conflict
> 	Detail: [Cannot remove VM: VM is locked. Please try again in a few minutes.]
> 
> But the VM is not locked (<status>down</status>) and all its disks in state
> OK and there is no running operation.

Yes, this means the VM was locked in-memory
We can see in the engine.log which operation did it

Comment 2 Arik 2022-05-08 21:20:18 UTC
(In reply to Evelina Shames from comment #0)
> Actual results:
> The created VM cannot be removed because it is locked, but the state that is
> shown is not locked - VM is in 'Down' state, no operation is running, disks
> state is 'OK'.
> Need to wait ~1m and then it is possible to remove the VM.

The VM is locked when remove-VM is triggered.
There seem to be four TransferDiskImage commands that locked the VM with a shared lock prior to the call to remove VM:

2022-02-03 18:56:15,386+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-6) [fe424541-ad96-4730-bd7b-3f3494337d41] Lock Acquired to object 'EngineLock:{exclusiveLocks='[b14bb731-32d5-480a-9fcb-81f9171f5596=DISK]', sharedLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]'}'

2022-02-03 18:56:15,395+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-1) [34571abf-8b69-4711-9bf4-047c2fc2e300] Lock Acquired to object 'EngineLock:{exclusiveLocks='[15a8e2aa-183b-45cf-b8db-091d9a7cf22e=DISK]', sharedLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]'}'

2022-02-03 18:56:15,411+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-5) [cab547fd-4f85-4662-a452-b4dfd21d3449] Lock Acquired to object 'EngineLock:{exclusiveLocks='[6b82f054-3646-45ec-a102-066ee612ee0a=DISK]', sharedLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]'}'

2022-02-03 18:56:15,477+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-8) [b5df4559-8f29-4e34-9e4c-e773c93874a1] Lock Acquired to object 'EngineLock:{exclusiveLocks='[ef107573-76b0-4aed-83f0-752f87fedc60=DISK]', sharedLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]'}'

Therefore RemoveVM fails to lock the VM exclusively:

2022-02-03 18:56:45,725+02 INFO  [org.ovirt.engine.core.bll.RemoveVmCommand] (default task-5) [vms_delete_03aa0907-cc3b-4d7a] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]', sharedLocks=''}'

The aforementioned locks are released a bit later on:

2022-02-03 18:57:01,173+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-76) [fe424541-ad96-4730-bd7b-3f3494337d41] Lock freed to object 'EngineLock:{exclusiveLocks='[b14bb731-32d5-480a-9fcb-81f9171f5596=DISK]', sharedLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]'}'

2022-02-03 18:57:01,192+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-76) [b5df4559-8f29-4e34-9e4c-e773c93874a1] Lock freed to object 'EngineLock:{exclusiveLocks='[ef107573-76b0-4aed-83f0-752f87fedc60=DISK]', sharedLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]'}'

2022-02-03 18:57:44,709+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-23) [cab547fd-4f85-4662-a452-b4dfd21d3449] Lock freed to object 'EngineLock:{exclusiveLocks='[6b82f054-3646-45ec-a102-066ee612ee0a=DISK]', sharedLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]'}'

2022-02-03 18:57:44,731+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-23) [34571abf-8b69-4711-9bf4-047c2fc2e300] Lock freed to object 'EngineLock:{exclusiveLocks='[15a8e2aa-183b-45cf-b8db-091d9a7cf22e=DISK]', sharedLocks='[56ef393c-46f3-4019-a860-f65894edb752=VM]'}'

This aligns with the observation that it is possible to remove the VM after waiting a bit.
 
> Expected results:
> If the VM is really locked:
> - The state should change to 'Locked'

We intentionally don't change the status when it comes to shared (non-exclusive) locking - imagine that one of the transfer commands would have changed the status of the VM, how would another transfer command know that it's locked with a shared lock? by design we use in-memory locking for this and keep the status of the VM as 'Down'

> - Running operation should appear 
> (on UI: no operation is running, VM state is 'Down', disks state is 'OK')

It was discussed in the past and we didn't find a good way to reflect that a VM is locked using a shared lock (this is complicated - in this situation commands like RemoveVM that try to lock the VM exclusively are blocked but other commands that try to lock using a shared lock would succeed).

I think that what we're missing here is a lock message that would be associated with the lock that TransferDiskImage acquires so when other commands fail to lock the VM, there would be an appropriate message to display.

Comment 3 Sandro Bonazzola 2022-06-23 05:57:04 UTC
This bugzilla is included in oVirt 4.5.1 release, published on June 22nd 2022.
Since the problem described in this bug report should be resolved in oVirt 4.5.1 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.