Bug 1059129 - Resource lock split brain causes VM to get paused after migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.3.1
Assignee: Vinzenz Feenstra [evilissimo]
QA Contact: Pavel Novotny
URL:
Whiteboard: virt
Depends On: 1028917
Blocks:
 
Reported: 2014-01-29 08:55 UTC by rhev-integ
Modified: 2019-04-28 09:25 UTC
CC: 14 users

Fixed In Version: is34
Doc Type: Bug Fix
Doc Text:
Virtual machines are no longer paused after migrations; hosts now correctly acquire resource locks for recently migrated virtual machines.
Clone Of: 1028917
Environment:
Last Closed: 2014-02-27 09:43:57 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:0219 0 normal SHIPPED_LIVE vdsm 3.3.1 bug fix update 2014-02-27 14:42:16 UTC
oVirt gerrit 21963 0 None None None Never
oVirt gerrit 23939 0 None None None Never

Comment 1 Vinzenz Feenstra [evilissimo] 2014-02-03 16:14:49 UTC
Merged upstream to ovirt-3.3 as http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=9369b370369057832eff41793075fc1a63c42279

Comment 3 Pavel Novotny 2014-02-11 21:53:10 UTC
Verified in vdsm-4.13.2-0.8.el6ev.x86_64 (is34).

Verification steps:
1. Preparation: On the destination migration host, set 'migration_destination_timeout' to '120' in the VDSM config.py (located at /usr/lib64/python2.6/site-packages/vdsm/config.py).
   This shortens the verification time; the default is 6 hours.
2. Have a running VM (F19 in my case) with an ongoing memory-stressing workload (I used the `memtester` utility). This makes the migration take long enough to leave time in step 4 to simulate the error-prone environment.
3. Migrate the VM from source host1 to destination host2.
4. Immediately after the migration starts, block on source host1:
  - the connection to the destination host's VDSM (simulating connection loss to the destination VDSM):
  `iptables -I OUTPUT 1 -p tcp -d <host2> --dport 54321 -j DROP`
  - the connection to the storage (simulating a migration error):
  `iptables -I OUTPUT 1 -d <storage> -j DROP`
5. Wait `migration_destination_timeout` seconds (120).
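If you automate this verification, the final wait step is a poll-with-timeout rather than a blind sleep. Below is a minimal sketch of such a helper; `wait_for` and the commented-out `destination_vm_count` check are hypothetical names for illustration, not part of VDSM.

```python
import time

def wait_for(condition, timeout=120.0, interval=5.0):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Example (hypothetical check): wait up to migration_destination_timeout
# (120 s) for the destination host to report zero running VMs, i.e. the
# half-migrated VM was destroyed there.
# aborted = wait_for(lambda: destination_vm_count() == 0, timeout=120)
```

Polling instead of sleeping for the full 120 seconds lets the script report success as soon as the destination host tears down the half-migrated VM.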

Results:
The migration fails (due to the blocked storage connection) and is aborted.
On the destination host, the migrating VM is destroyed (the host shows 0 running VMs and no VM migrating).
The VM stays on the source host, paused due to the inaccessible storage; after unblocking the storage it should resume as if nothing happened.
The source host shows 1 running VM and no VM migrating.

Comment 5 errata-xmlrpc 2014-02-27 09:43:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0219.html

