Bug 1059129 - Resource lock split brain causes VM to get paused after migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.3.1
Assignee: Vinzenz Feenstra [evilissimo]
QA Contact: Pavel Novotny
URL:
Whiteboard: virt
Depends On: 1028917
Blocks:
 
Reported: 2014-01-29 08:55 UTC by rhev-integ
Modified: 2019-04-28 09:25 UTC
CC: 14 users

Fixed In Version: is34
Doc Type: Bug Fix
Doc Text:
Virtual machines are no longer paused after migrations; hosts now correctly acquire resource locks for recently migrated virtual machines.
Clone Of: 1028917
Environment:
Last Closed: 2014-02-27 09:43:57 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:0219 0 normal SHIPPED_LIVE vdsm 3.3.1 bug fix update 2014-02-27 14:42:16 UTC
oVirt gerrit 21963 0 None None None Never
oVirt gerrit 23939 0 None None None Never

Comment 1 Vinzenz Feenstra [evilissimo] 2014-02-03 16:14:49 UTC
Merged upstream to ovirt-3.3 as http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=9369b370369057832eff41793075fc1a63c42279

Comment 3 Pavel Novotny 2014-02-11 21:53:10 UTC
Verified in vdsm-4.13.2-0.8.el6ev.x86_64 (is34).

Verification steps:
1. Preparation: On the destination migration host, set 'migration_destination_timeout' to '120' in the VDSM config.py (located at /usr/lib64/python2.6/site-packages/vdsm/config.py).
   This shortens the verification time; the default is 6 hours.
2. Have a running VM (F19 in my case) with an ongoing memory-stressing workload (I used the `memtester` utility). This makes the migration take long enough to leave time in step 4 to simulate the error-prone environment.
3. Migrate the VM from source host1 to destination host2.
4. Immediately after the migration starts, block on source host1:
  - the connection to the destination host's VDSM (simulating connection loss to the destination VDSM):
  `iptables -I OUTPUT 1 -p tcp -d <host2> --dport 54321 -j DROP`
  - the connection to the storage (simulating a migration error):
  `iptables -I OUTPUT 1 -d <storage> -j DROP`
5. Wait `migration_destination_timeout` seconds (120).
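If you automate this verification, the final wait step is a poll-with-timeout rather than a blind sleep. Below is a minimal sketch of such a helper; `wait_for` and the commented-out `destination_vm_count` check are hypothetical names for illustration, not part of VDSM.

```python
import time

def wait_for(condition, timeout=120.0, interval=5.0):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Example (hypothetical check): wait up to migration_destination_timeout
# (120 s) for the destination host to report zero running VMs, i.e. the
# half-migrated VM was destroyed there.
# aborted = wait_for(lambda: destination_vm_count() == 0, timeout=120)
```

Polling instead of sleeping for the full 120 seconds lets the script report success as soon as the destination host tears down the half-migrated VM.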

Results:
The migration fails (due to the blocked storage connection) and is aborted.
On the destination host, the migrating VM is destroyed (the host shows 0 running VMs and no VM migrating).
The VM stays on the source host, paused due to the inaccessible storage; after unblocking the storage it should resume as if nothing happened.
The source host shows 1 running VM and no VM migrating.

Comment 5 errata-xmlrpc 2014-02-27 09:43:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0219.html

