Bug 1003657

Summary: SPM selection doesn't work if SPM host is Non Responsive
Product: [Retired] oVirt
Reporter: Martin Perina <mperina>
Component: ovirt-engine-core
Assignee: Martin Perina <mperina>
Status: CLOSED NOTABUG
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.3
CC: abaron, acathrow, amureini, iheim, yeylon, yzaslavs
Target Milestone: ---
Target Release: 3.3.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-03 13:46:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Logs and Screenshot (flags: none)

Description Martin Perina 2013-09-02 14:59:24 UTC
Created attachment 792893
Logs and Screenshot

Description of problem:

If you try to put the SPM host into Maintenance while the network connection to that host is lost, the SPM election never finishes and the data center also becomes Non Responsive. The only way to recover is to restore the network connection to the host.

Version-Release number of selected component (if applicable):

Engine: ovirt-engine-3.3.0-0.7.rc2.fc19.noarch running on F19
Host dev-18: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4
Host dev-21: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4

How reproducible:

100%

Steps to Reproduce:
1. Block network connections between the SPM host (host dev-18 in the attached logs) and the engine.
2. Try to put the SPM host into Maintenance (this step has to be executed before the engine recognizes that the network connection to the host is unavailable).

Actual results:

The SPM role is not transferred from the Non Responsive host to another host, so the whole data center is Non Responsive and the user has no way to resolve this other than restoring the network connection to the Non Responsive host.

Expected results:

Another host is selected as SPM.

Additional info:

Comment 1 Ayal Baron 2013-09-03 13:46:53 UTC
Since the host may still have its storage connection intact, it is not safe to move the SPM role to another host without knowing the status of the original host.
There are 2 ways of verifying the status:
1. If we have network connectivity, just query the host.
2. If we don't have network connectivity, fence the host.
Fencing isn't automatic unless specifically configured to be, and even then it requires a fencing card (management IP). If you do not have this configured, you can right-click the host and manually specify that it has been rebooted (effectively telling oVirt 'trust me, I've rebooted the host and it's safe to transfer the SPM').

Please try this (confirm host has been rebooted).
If it doesn't work, feel free to reopen the bug.
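
The rule described in this comment can be sketched as a small decision function. This is a hypothetical illustration only, not oVirt engine code: the names HostState and can_reassign_spm and the two boolean flags are assumptions made up for the example. It shows that an unreachable SPM host blocks reassignment until it has been fenced or an administrator confirms it was rebooted.

```python
from enum import Enum


class HostState(Enum):
    """Hypothetical view of what the engine knows about the old SPM host."""
    REACHABLE_SPM = "reachable and still holds the SPM role"
    REACHABLE_NOT_SPM = "reachable and no longer holds the SPM role"
    UNREACHABLE = "no network connectivity from the engine"


def can_reassign_spm(state, fenced=False, confirmed_rebooted=False):
    """Return True only if it is safe to select another host as SPM.

    Assumption for this sketch: 'safe' means the engine has positive
    knowledge that the old host is not acting as SPM any more.
    """
    if state is HostState.REACHABLE_NOT_SPM:
        # Way 1: the host answered a status query and has released the role.
        return True
    if state is HostState.REACHABLE_SPM:
        # The host answered and still holds the role: do not reassign.
        return False
    # Way 2: the host is unreachable, so its storage access is unknown.
    # Only fencing or a manual "host has been rebooted" confirmation
    # gives enough certainty to move the role.
    return fenced or confirmed_rebooted
```

In the scenario from this report, the state is UNREACHABLE with no fencing configured, so the function returns False until confirmed_rebooted is set, which matches the behaviour the reporter observed.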

Comment 2 Martin Perina 2013-09-04 13:00:02 UTC
Thanks Ayal, this option didn't occur to me. Executing "Confirm host has been rebooted" helped, and the other host in the cluster became SPM almost at once.