838206 – oVirt: manually shutting down the SPM host takes the entire data center down.

Summary: oVirt: manually shutting down the SPM host takes the entire data center down.

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	oVirt
Classification:	Retired
Component:	ovirt-engine-core
Sub Component:
Version:	3.1 RC
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	3.3.4
Assignee:	Nobody's working on this, feel free to take it
QA Contact:
Docs Contact:
URL:
Whiteboard:	storage
Depends On:
Blocks:

Reported:	2012-07-07 07:16 UTC by Robert Middleswarth
Modified:	2016-02-10 16:38 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-03-11 21:58:28 UTC
oVirt Team:	Storage
Embargoed:
Dependent Products:

Description Robert Middleswarth 2012-07-07 07:16:37 UTC

Description of problem:
I came across this by accident: I shut down one of my hosts by mistake. It turned out to be the SPM host as well, and it took down the entire network. I was able to recover the data center and cluster, but only after a lot of downtime. This could be an issue in a production network.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 3.1.0-3.9.el6 
vdsm-cli: 4.10.0-0.58.gita6f4929.el6


How reproducible:
I have done it 3 times in my test network.

Steps to Reproduce:
1. Build a 3-node network. Not sure if it makes a difference, but I am using GlusterFS.
2. Manually shut down the current SPM node.
3. Watch as everything crashes.
  
Actual results:
The primary data store goes down, taking the entire data center down with it.

Expected results:
All the VMs move and the data store continues to run in a degraded state.

Additional info:
Not sure if this is an engine or vdsm issue, or which logs are needed. Since it is very repeatable, what do you want me to do to help debug this very problematic issue?

Thanks
Robert

Comment 1 Itamar Heim 2012-07-07 13:01:17 UTC

What type of data center is it? (If posixfs/gluster, I assume this is a duplicate of the "can't elect SPM" bug.)
If not, please attach logs.

Comment 2 Robert Middleswarth 2012-07-07 17:32:29 UTC

It is NFS.  Later today I will route out the logs and then force the event to happen again to generate logs.

Thanks
Robert

Comment 3 Itamar Heim 2013-03-11 21:58:28 UTC

Closing old bugs. If this issue is still relevant/important in the current version, please re-open the bug.