Bug 1092631 - failure to recover after executing fenceSpmStorage
Summary: failure to recover after executing fenceSpmStorage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.4.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Liron Aravot
QA Contact: Ori Gofen
URL:
Whiteboard: storage
Depends On:
Blocks: 1082365
 
Reported: 2014-04-29 15:32 UTC by Liron Aravot
Modified: 2016-05-26 01:48 UTC
CC List: 15 users

Fixed In Version: vdsm-4.14.7-1.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, the Storage Pool Manager (SPM) role was not transferred to another host when the host holding that role was manually fenced. This was caused by an error in the logic that marks the SPM role as free when the host holding it is manually fenced. The logic has now been revised so that manually fencing the host acting as SPM transfers the SPM role to another host.
Clone Of: 1082365
Environment:
Last Closed: 2014-06-09 13:30:29 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System                 | ID             | Private | Priority | Status       | Summary                                   | Last Updated
Red Hat Product Errata | RHBA-2014:0504 | 0       | normal   | SHIPPED_LIVE | vdsm 3.4.0 bug fix and enhancement update | 2014-06-09 17:21:35 UTC
oVirt gerrit           | 27226          | 0       | None     | None         | None                                      | Never
oVirt gerrit           | 27340          | 0       | None     | None         | None                                      | Never

Description Liron Aravot 2014-04-29 15:32:15 UTC
Description of problem:

After manually fencing the SPM host (via the "confirm host has been rebooted" button), the system does not start the SPM on another host.


Version-Release number of selected component (if applicable):

- 3.4.0

How reproducible:
Always

Steps to Reproduce:

- Create a 2-node cluster with an SPM and an HSM host.
- Block the SPM host's network.
- The host becomes non-responsive.
- Click the "confirm host has been rebooted" button.
- The other host is not selected as the SPM (one way to check this is sketched after this list).
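One way to watch for the takeover from the surviving host is to poll vdsm's getSpmStatus verb, the same call the engine makes (see comment 1). The sketch below is an assumption-heavy illustration: it assumes vdsm's legacy XML-RPC client (vdsm.vdscli) and guesses the reply field names ('spm_st', 'spmStatus', 'spmId', 'spmLver') from typical vdsm logs; the pool UUID is a placeholder.

    # Sketch only: assumes vdsm's legacy XML-RPC client and guesses the
    # getSpmStatus reply layout; adjust both to the vdsm version in use.
    import time
    from vdsm import vdscli

    SP_UUID = '<storage-pool-uuid>'  # placeholder

    def wait_for_new_spm(timeout=300, interval=10):
        server = vdscli.connect()  # local vdsm on the surviving (HSM) host
        deadline = time.time() + timeout
        while time.time() < deadline:
            reply = server.getSpmStatus(SP_UUID)
            st = reply.get('spm_st', {})
            print('spmStatus=%s spmId=%s spmLver=%s' % (
                st.get('spmStatus'), st.get('spmId'), st.get('spmLver')))
            if st.get('spmStatus') == 'SPM':
                return True   # the surviving host took over the SPM role
            time.sleep(interval)
        return False          # with this bug: times out, no host becomes SPM

With this bug present, the loop never sees spmStatus == 'SPM' on the surviving host.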

Actual results:

- The other host is not selected as the SPM.

Expected results:

- The SPM should be started on the other host.

Comment 1 Liron Aravot 2014-04-29 15:36:26 UTC
I'll add logs.
Basically, the issue seems to be that when we "fence" in this scenario, the pool metadata is updated with spmId = -1 and lver = -1.
The problem is that when the engine runs getSpmStatus, the stats are retrieved from sanlock, which was not updated and still contains the previous SPM id/lver (see the sketch below).
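A minimal sketch of that mismatch and of the intended behaviour (plain illustrative Python; the names and data layout are assumptions, not vdsm's actual code):

    # Illustrative only: names and structures are assumptions, not vdsm code.
    FREE = -1

    pool_metadata = {'spmId': FREE, 'lver': FREE}  # updated by the manual fence
    sanlock_stats = {'spmId': 2, 'lver': 7}        # stale, never refreshed

    def get_spm_status_buggy():
        # Pre-fix: the reply is built from the stale sanlock stats, so the
        # engine keeps seeing the old SPM id/lver and never frees the role.
        return sanlock_stats

    def get_spm_status_fixed():
        # Post-fix: a manually fenced pool (spmId == -1) is reported as free,
        # so the engine can start the SPM on another host.
        if pool_metadata['spmId'] == FREE:
            return {'spmId': FREE, 'lver': FREE}
        return sanlock_stats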

This bug and https://bugzilla.redhat.com/show_bug.cgi?id=1082365 are, in a sense, blocking each other: 1082365 cannot be "completely" verified with a successful "fence", while this bug cannot be solved until the attribute errors from 1082365 are fixed.
This bug was opened for sanity testing of the complete scenario.

Comment 3 Ori Gofen 2014-05-12 15:34:01 UTC
Verified on av9. Steps taken:

1. Create a 2-node cluster with an SPM and an HSM host (shared DC).
2. Block the SPM host's network.
3. Wait for the host to become non-responsive.
4. Stop the vdsmd service on the blocked SPM.
5. Click the "confirm host has been rebooted" button.

The HSM host gains the SPM role as expected (a sketch of this scenario follows).
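For reference, the same scenario written as a sketch in which every helper is a hypothetical placeholder (block_network, stop_vdsmd, confirm_host_rebooted and current_spm are not real vdsm or engine APIs; they stand in for iptables, service control, the manual fence action and getSpmStatus):

    # Hypothetical end-to-end sketch of the verification above; each helper is
    # a placeholder to be wired to real tooling, not an existing API.
    def block_network(host):
        raise NotImplementedError

    def stop_vdsmd(host):
        raise NotImplementedError

    def confirm_host_rebooted(host):       # the manual fence action
        raise NotImplementedError

    def current_spm(pool):                 # e.g. derived from getSpmStatus
        raise NotImplementedError

    def verify_spm_failover(pool, spm_host, hsm_host):
        block_network(spm_host)            # step 2
        stop_vdsmd(spm_host)               # step 4
        confirm_host_rebooted(spm_host)    # step 5
        assert current_spm(pool) == hsm_host  # expected: HSM gains SPM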

Comment 4 errata-xmlrpc 2014-06-09 13:30:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html

