Bug 1092631
| Summary: | failure to recover after executing fenceSpmStorage | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Liron Aravot <laravot> |
| Component: | vdsm | Assignee: | Liron Aravot <laravot> |
| Status: | CLOSED ERRATA | QA Contact: | Ori Gofen <ogofen> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.4.0 | CC: | acanan, acathrow, adahms, amureini, bazulay, gklein, iheim, knesenko, laravot, lpeer, md, mgoldboi, scohen, tnisan, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.4.0 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | storage | | |
| Fixed In Version: | vdsm-4.14.7-1.el6ev | Doc Type: | Bug Fix |
| Doc Text: | Previously, the Storage Pool Manager (SPM) role was not transferred to another host when the host holding that role was manually fenced. This was caused by an error in the logic that marks the SPM role as free when the host holding it is manually fenced. The logic has been revised so that manually fencing the SPM host now transfers the SPM role to another host. | | |
| Story Points: | --- | | |
| Clone Of: | 1082365 | Environment: | |
| Last Closed: | 2014-06-09 13:30:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1082365 | | |
Description
Liron Aravot 2014-04-29 15:32:15 UTC

I'll add logs. Basically, the issue is that when we "fence" in this scenario, the pool metadata is updated with spmId = -1 and lver = -1. The problem is that when the engine runs getSpmStatus, the stats are retrieved from sanlock; these were not updated and still contain the previous SPM id/lver.

This bug and https://bugzilla.redhat.com/show_bug.cgi?id=1082365 partly block each other: 1082365 cannot be "completely" verified with a successful "fence", while this bug cannot be solved until the attribute errors in 1082365 are fixed. This bug was opened for sanity testing of the complete scenario.

Verified on av9. Steps taken:
1. Create a 2-node cluster, SPM/HSM (shared DC).
2. Block the SPM host's network.
3. Wait for the host to become non-responsive.
4. Stop the vdsmd service on the blocked SPM.
5. Click the "Confirm host has been rebooted" button.

The HSM gains SPM as expected.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html
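The inconsistency described above can be modeled with a minimal sketch. All names here (`Pool`, `fence_spm_storage`, the dict fields) are illustrative, not vdsm's actual API: fencing marks the SPM role free in the pool metadata, but a status query that reads stale sanlock stats still reports the previous id/lver, so the engine never sees the role as free.

```python
# Hypothetical model of the fence/getSpmStatus mismatch described in this bug.
# Names are illustrative only; this is not vdsm's real code.

FREE_SPM_ID = -1
FREE_LVER = -1

class Pool:
    def __init__(self, spm_id, lver):
        # On-disk pool metadata.
        self.metadata = {"spmId": spm_id, "lver": lver}
        # Stats last reported by sanlock; not refreshed by the fence.
        self.sanlock = {"spmId": spm_id, "lver": lver}

    def fence_spm_storage(self):
        # Only the metadata is marked free; sanlock stats go stale.
        self.metadata["spmId"] = FREE_SPM_ID
        self.metadata["lver"] = FREE_LVER

    def get_spm_status_buggy(self):
        # Buggy behavior: the engine reads stale sanlock stats,
        # so the SPM role still looks taken after the fence.
        return dict(self.sanlock)

    def get_spm_status_fixed(self):
        # Fixed behavior: when the metadata says the role is free,
        # report it as free regardless of stale sanlock stats.
        if self.metadata["spmId"] == FREE_SPM_ID:
            return {"spmId": FREE_SPM_ID, "lver": FREE_LVER}
        return dict(self.sanlock)

pool = Pool(spm_id=1, lver=5)
pool.fence_spm_storage()
assert pool.get_spm_status_buggy()["spmId"] == 1    # stale: role looks held
assert pool.get_spm_status_fixed()["spmId"] == -1   # role correctly free
```

With the buggy query the engine never transfers the SPM role, which matches the observed failure; the fixed query lets another host (the HSM in the verification steps) acquire it.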