Bug 1059129

Summary: Resource lock split brain causes VM to get paused after migration
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: vdsm
Assignee: Vinzenz Feenstra [evilissimo] <vfeenstr>
Status: CLOSED ERRATA
QA Contact: Pavel Novotny <pnovotny>
Severity: high
Docs Contact:
Priority: high
Version: 3.2.0
CC: abaron, bazulay, danken, iheim, lpeer, lyarwood, mavital, michal.skrivanek, pablo.iranzo, pep, sherold, tdosek, yeylon, zdover
Target Milestone: ---
Keywords: ZStream
Target Release: 3.3.1
Hardware: All
OS: All
Whiteboard: virt
Fixed In Version: is34
Doc Type: Bug Fix
Doc Text:
Virtual machines are no longer paused after migrations; hosts now correctly acquire resource locks for recently migrated virtual machines.
Story Points: ---
Clone Of: 1028917
Environment:
Last Closed: 2014-02-27 09:43:57 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1028917
Bug Blocks:

Comment 1 Vinzenz Feenstra [evilissimo] 2014-02-03 16:14:49 UTC
Merged u/s to ovirt-3.3 as http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=9369b370369057832eff41793075fc1a63c42279

Comment 3 Pavel Novotny 2014-02-11 21:53:10 UTC
Verified in vdsm-4.13.2-0.8.el6ev.x86_64 (is34).

Verification steps:
1. Preparation: On the destination migration host, set 'migration_destination_timeout' to '120' in the VDSM config.py (located at /usr/lib64/python2.6/site-packages/vdsm/config.py).
   This shortens the verification; the default timeout is 6 hours.
2. Have a running VM (F19 in my case) with some ongoing memory-stressing operation (I used the `memtester` utility). This should make the migration long enough to leave time in step 4 to simulate the error-prone environment.
3. Migrate the VM from source host1 to destination host2.
4. Immediately after the migration starts, block on the source host1:
  - connection to destination host VDSM (simulating connection loss to dest. VDSM)
  `iptables -I OUTPUT 1 -p tcp -d <host2> --dport 54321 -j DROP`
  - connection to the storage (simulating migration error)
  `iptables -I OUTPUT 1 -d <storage> -j DROP`
5. Wait `migration_destination_timeout` seconds (120).
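
For repeated runs, the two blocking rules above and their later removal can be generated from one rule spec, so the unblock always matches the block. The following is only a sketch and was not part of the original verification: the function names and the example host names are hypothetical, and it only builds the command lines (run them as root on the source host yourself). Removal uses `iptables -D` with the same rule specification.

```python
import shlex

VDSM_PORT = 54321  # destination port taken from the iptables rule in the steps above


def rule_specs(dest_host, storage_host):
    """Return the two OUTPUT rule specs used to simulate the failures."""
    return [
        # Drop traffic from the source host to the destination host's VDSM.
        ["OUTPUT", "-p", "tcp", "-d", dest_host,
         "--dport", str(VDSM_PORT), "-j", "DROP"],
        # Drop all traffic from the source host to the storage.
        ["OUTPUT", "-d", storage_host, "-j", "DROP"],
    ]


def block_commands(dest_host, storage_host):
    """Build the 'iptables -I OUTPUT 1 ...' commands (insert at position 1)."""
    return [["iptables", "-I", chain, "1"] + spec
            for chain, *spec in rule_specs(dest_host, storage_host)]


def unblock_commands(dest_host, storage_host):
    """Build the matching 'iptables -D OUTPUT ...' commands to remove the rules."""
    return [["iptables", "-D", chain] + spec
            for chain, *spec in rule_specs(dest_host, storage_host)]


if __name__ == "__main__":
    # Hypothetical host names; substitute the real <host2> and <storage>.
    for cmd in block_commands("host2.example.com", "storage.example.com"):
        print(shlex.join(cmd))
    for cmd in unblock_commands("host2.example.com", "storage.example.com"):
        print(shlex.join(cmd))
```

Deriving both command sets from the same spec avoids leaving a stale DROP rule behind after the test.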

Results:
The migration fails (due to our blocking of storage) and is aborted.
On the destination host, the migrating VM is destroyed (the host shows 0 running VMs and no VM migrating).
The VM stays on the source host (paused due to inaccessible storage; after unblocking the storage, the VM should run as if nothing happened).
The source host shows 1 running VM and no VM migrating.
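
The expected outcome can also be written down as a small check, for example when repeating the verification. This is a sketch only: the dict keys `running` and `migrating` are hypothetical per-host VM counts that you would collect yourself (e.g. from the hosts view), not fields of any VDSM API.

```python
def check_outcome(source, destination):
    """Compare per-host VM counts with the results described above.

    Both arguments are dicts with the hypothetical keys 'running' and
    'migrating'. Returns a list of mismatch messages; empty means pass.
    """
    problems = []
    if destination["running"] != 0:
        problems.append("destination host still shows running VMs")
    if destination["migrating"] != 0:
        problems.append("destination host still shows a migrating VM")
    if source["running"] != 1:
        problems.append("source host does not show exactly 1 running VM")
    if source["migrating"] != 0:
        problems.append("source host still shows a migrating VM")
    return problems


if __name__ == "__main__":
    # Counts matching the verified result in this comment.
    print(check_outcome({"running": 1, "migrating": 0},
                        {"running": 0, "migrating": 0}))
```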

Comment 5 errata-xmlrpc 2014-02-27 09:43:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0219.html