Bug 1104030 - Failed VM migrations do not release VM resource lock properly leading to failures in subsequent migration attempts
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assigned To: Arik
QA Contact: Nisim Simsolo
Whiteboard: virt
Keywords: ZStream
Depends On:
Blocks: 1114587 rhev3.5beta 1156165
Reported: 2014-06-03 02:40 EDT by Aval
Modified: 2015-06-19 01:15 EDT

See Also:
Fixed In Version: ovirt-engine-3.5.0_beta
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1114587
Environment:
Last Closed: 2015-02-11 13:03:16 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
engine.log (870.63 KB, text/x-log)
2014-06-03 03:18 EDT, Aval


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 28403 | master | MERGED | core: fix reattempt to go to maintenance mechanism | Never
Red Hat Product Errata | RHSA-2015:0158 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Virtualization Manager 3.5.0 | 2015-02-11 17:38:50 EST

Description Aval 2014-06-03 02:40:27 EDT
Description of problem: When a VM migration thread acquires the VM resource lock and then fails for some reason, it does not release the lock properly. Later, when other threads try to perform operations such as migrate or Run VM on the same VM, they fail to acquire the lock with the error below:

~~~
2014-06-01 18:15:53,807 INFO  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (DefaultQuartzScheduler_Worker-3) [36055ac9] Failed to Acquire Lock to object EngineLock [exclusiveLocks= key: 40abfe32-96de-44a7-abd9-e77ecd2bec7b value: VM
, sharedLocks= ]
2014-06-01 18:15:53,808 WARN  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (DefaultQuartzScheduler_Worker-3) [36055ac9] CanDoAction of action InternalMigrateVm failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED,$VmName zabbix-190
~~~
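
To gauge how many VMs are affected, the lock failures can be counted per lock key. The following is a minimal sketch, not an official tool: the regular expression is inferred only from the two sample lines above, and the function name `count_lock_failures` is made up for illustration. Other engine versions may format these lines differently.

```python
import re
from collections import Counter

# Assumption: pattern derived from the sample log lines in this report only.
LOCK_FAILURE = re.compile(
    r"Failed to Acquire Lock to object EngineLock "
    r"\[exclusiveLocks= key: (?P<key>[0-9a-f-]{36}) value: (?P<kind>\w+)"
)


def count_lock_failures(lines):
    """Count 'Failed to Acquire Lock' entries per (lock key, lock type)."""
    counts = Counter()
    for line in lines:
        match = LOCK_FAILURE.search(line)
        if match:
            counts[(match.group("key"), match.group("kind"))] += 1
    return counts


# The two log lines quoted in the description, used as sample input.
SAMPLE = [
    "2014-06-01 18:15:53,807 INFO  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] "
    "(DefaultQuartzScheduler_Worker-3) [36055ac9] Failed to Acquire Lock to object "
    "EngineLock [exclusiveLocks= key: 40abfe32-96de-44a7-abd9-e77ecd2bec7b value: VM",
    "2014-06-01 18:15:53,808 WARN  [org.ovirt.engine.core.bll.InternalMigrateVmCommand] "
    "(DefaultQuartzScheduler_Worker-3) [36055ac9] CanDoAction of action InternalMigrateVm "
    "failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,"
    "ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED,$VmName zabbix-190",
]

if __name__ == "__main__":
    for (key, kind), hits in count_lock_failures(SAMPLE).most_common():
        print(f"{kind} {key}: {hits} failed lock attempt(s)")
```

To run it against a real log, pass an open file object instead of `SAMPLE`; a key that keeps appearing long after its migration failed is a candidate for a leaked lock.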


Version-Release number of selected component (if applicable):

rhevm-3.3.3-0.52.el6ev.noarch  

How reproducible:

A couple of customers reported that VM migrations kept failing while a host was being put into maintenance, and that afterwards they were not able to start or stop VMs either.

Steps to Reproduce:
1.
2.
3.

Actual results:
After the first failure, subsequent attempts to run or migrate the VM fail while acquiring locks.

Expected results:
After the first failure, subsequent attempts to run or migrate the VM should get the resource locks.

Additional info:

After restarting the ovirt-engine service, one of the customers was able to start and migrate VMs.
Comment 2 Aval 2014-06-03 03:18:38 EDT
Created attachment 901619
engine.log

Adding engine.log from one of the customers facing this problem.
Comment 3 Arik 2014-06-08 08:38:53 EDT
This bug exposed several issues:

1. The first migrations failed not because the VMs remained locked, but because of a bug which caused 'switch host to maintenance' to be re-triggered too soon, so in the retry attempt the migrations fail since the VMs are already locked. This one is solved by http://gerrit.ovirt.org/#/c/28403

2. NPEs in the migrate operations. These exceptions in migrations which were triggered by 'switch to maintenance' operation were already solved by: http://gerrit.ovirt.org/#/c/24639

3. The migrate transaction was aborted, so the migrate operation failed (at 2014-06-01 18:05:47,045). This won't happen anymore because the migrate operation is no longer transactional.

4. Eventually the VMs remain locked. This happens once the maximum number of retries to migrate the VM is reached and the migration fails. It was fixed in 3.4. Patch for 3.3: http://gerrit.ovirt.org/#/c/28460
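
The general pattern behind issue 4 can be illustrated with a small sketch. This is not the ovirt-engine code and not the content of the patches linked above: `VmLockManager` and `migrate_with_retries` are hypothetical names, and the migration is modelled as a synchronous callback for brevity. The point is only that the lock is released in a `finally` block, so it is freed when the retries are exhausted or an exception is raised, not just on success.

```python
import threading


class VmLockManager:
    """Toy in-memory exclusive lock table keyed by VM id (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locked = set()

    def try_acquire(self, vm_id):
        with self._guard:
            if vm_id in self._locked:
                return False
            self._locked.add(vm_id)
            return True

    def release(self, vm_id):
        with self._guard:
            self._locked.discard(vm_id)

    def is_locked(self, vm_id):
        with self._guard:
            return vm_id in self._locked


def migrate_with_retries(locks, vm_id, attempt_migration, max_retries=3):
    """Return True on success, False if the VM is locked or all retries fail."""
    if not locks.try_acquire(vm_id):
        # Comparable to the "Failed to Acquire Lock" case in the log.
        return False
    try:
        for _ in range(max_retries):
            if attempt_migration(vm_id):
                return True
        return False  # maximum number of retries reached
    finally:
        # Runs on success, on exhausted retries, and on exceptions alike,
        # so a failed migration cannot leave the VM locked.
        locks.release(vm_id)
```

With this shape, a later call for the same VM can acquire the lock again even after every earlier attempt failed, which is the expected result stated in the description.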
Comment 7 Nisim Simsolo 2014-10-26 08:27:26 EDT
Fixed. Verified using the following builds:
rhevm-3.5.0-0.17.beta.el6ev.noarch
libvirt-0.10.2-46.el6.x86_64
vdsm-4.16.7.1-1.el6ev.x86_64
sanlock-2.8-1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
Comment 9 errata-xmlrpc 2015-02-11 13:03:16 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html
