Bug 689921

Summary: [vdsm][scale] Split-brain during failed migration
Product: Red Hat Enterprise Linux 6
Reporter: David Naori <dnaori>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED DUPLICATE
QA Contact: yeylon <yeylon>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.1
CC: abaron, bazulay, hateya, iheim, mgoldboi, srevivo, syeghiay, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-03-28 15:48:18 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm source and destination logs none

Description David Naori 2011-03-22 19:37:40 UTC
Created attachment 486882
vdsm source and destination logs

Description of problem:

During scale testing, a failed migration* of a VM to a destination host running 90 VMs ended in split brain.

* Failed migration: the libvirt daemon was restarted on the source host during the migration.

Flow: 

- The destination host runs 90 VMs.
- Call migrate on the source host; the migration starts.
- 5-10 seconds later, run /etc/init.d/libvirtd restart on the source host.
- The connection to libvirt breaks and vdsm takes itself down.
- On the destination host, a destroy call is initiated but fails due to an operation timeout; the VM stays on the destination host in a paused state.
- On the source host, vdsm performs VM recovery and runs the VM.

Result: split brain.

vdsClient -s 0 list table:

destination host:
4afaa2bb-d570-4ae4-964a-f19507ca786b  73182  FC-0-30              Paused*

source host:
4afaa2bb-d570-4ae4-964a-f19507ca786b  79552  FC-0-30              Up
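
The two listings above can be compared mechanically. The following is only a minimal sketch and not part of vdsm: the function names are hypothetical, and the column layout (VM ID, PID, name, status) is assumed from the two output lines shown in this report. It reports every VM ID that appears on both hosts. A migration still in flight would also appear on both hosts, so the check is only meaningful once the migration has finished or failed.

```python
def parse_list_table(text):
    """Parse saved `vdsClient -s 0 list table` output.

    Assumed layout per line (inferred from this report, not from vdsm docs):
    <vm id> <pid> <vm name> <status...>
    Returns {vm_id: status}.
    """
    vms = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unrecognized lines
        vm_id = fields[0]
        status = " ".join(fields[3:])  # keep multi-word statuses intact
        vms[vm_id] = status
    return vms


def find_vms_on_both_hosts(src_text, dst_text):
    """Return {vm_id: (source_status, destination_status)} for VMs listed on both hosts."""
    src = parse_list_table(src_text)
    dst = parse_list_table(dst_text)
    return {vm_id: (src[vm_id], dst[vm_id]) for vm_id in src.keys() & dst.keys()}


if __name__ == "__main__":
    # The two lines reported in this bug.
    source = "4afaa2bb-d570-4ae4-964a-f19507ca786b  79552  FC-0-30              Up"
    destination = "4afaa2bb-d570-4ae4-964a-f19507ca786b  73182  FC-0-30              Paused*"
    for vm_id, (s, d) in find_vms_on_both_hosts(source, destination).items():
        print(f"{vm_id}: source={s} destination={d}")
```

Run against the output in this report, the sketch flags VM 4afaa2bb-d570-4ae4-964a-f19507ca786b as present on both hosts (Up on the source, Paused* on the destination).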


Versions:
- vdsm-4.9-55.el6.x86_64
- libvirt-0.8.7-13.el6.x86_64

Source and destination vdsm logs are attached.

Comment 1 Dan Kenigsberg 2011-03-28 15:48:18 UTC
Come to think of it, we reflect the state that libvirt reports as of bug 690175, so there's nothing we can do about it here.

*** This bug has been marked as a duplicate of bug 690175 ***