Bug 1003657 - SPM selection doesn't work if SPM host is Non Responsive
Summary: SPM selection doesn't work if SPM host is Non Responsive
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.3
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 3.3.4
Assignee: Martin Perina
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2013-09-02 14:59 UTC by Martin Perina
Modified: 2016-02-10 19:31 UTC
6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-03 13:46:53 UTC
oVirt Team: Storage
Embargoed:


Attachments
Logs and Screenshot (1.06 MB, application/x-compressed-tar)
2013-09-02 14:59 UTC, Martin Perina

Description Martin Perina 2013-09-02 14:59:24 UTC
Created attachment 792893
Logs and Screenshot

Description of problem:

If you try to put the SPM host into Maintenance while the network connection to this host is lost at the same time, SPM election never ends, the data center also becomes Non Responsive, and the only way to recover is to restore the network connection to the host.

Version-Release number of selected component (if applicable):

Engine: ovirt-engine-3.3.0-0.7.rc2.fc19.noarch running on F19
Host dev-18: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4
Host dev-21: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4

How reproducible:

100%

Steps to Reproduce:
1. Block network connections between the SPM host (host dev-18 in the attached logs) and the engine.
2. Try to put the SPM host into Maintenance (this step must be executed before the engine recognizes that the network connection to the host is unavailable).

Actual results:

The SPM role is not transferred from the Non Responsive host to another host, so the whole data center is Non Responsive and the user cannot resolve this other than by restoring the network connection to the Non Responsive host.

Expected results:

Another host is selected as SPM.

Additional info:

Comment 1 Ayal Baron 2013-09-03 13:46:53 UTC
Since the host may still have its storage connection intact, it is not safe to move the SPM role to another host without knowing the status of the original host.
There are two ways of verifying the status:
1. If we have network connectivity, just query the host.
2. If we don't have network connectivity, fence the host.
Fencing isn't automatic unless it is specifically configured, and even then it requires a fencing card (management IP). If you do not have this configured, you can right-click the host and manually confirm that it has been rebooted (effectively telling oVirt 'trust me, I've rebooted the host and it's safe to transfer the SPM role').

Please try this (confirm host has been rebooted).
If it doesn't work, feel free to reopen the bug.
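The safety rule from Comment 1 can be sketched as a small decision function. This is a hypothetical model for illustration only: `SpmHost`, `SpmStatus`, `query_status` and `safe_to_move_spm` are invented names, not oVirt engine classes or APIs. It assumes the engine may hand the SPM role to another host only when the old SPM has provably released it (way 1: a reachable host is queried) or provably cannot still write to storage (way 2: the host was fenced, or the admin ran "Confirm host has been rebooted").

```python
from dataclasses import dataclass
from enum import Enum, auto


class SpmStatus(Enum):
    """Hypothetical status of the current SPM host as seen by the engine."""
    UNKNOWN = auto()    # no network connectivity, so the host cannot be queried
    RELEASED = auto()   # host answered and no longer holds the SPM role
    STILL_SPM = auto()  # host answered and still holds the SPM role


@dataclass
class SpmHost:
    # Illustrative fields only; these are NOT oVirt engine attributes.
    name: str
    reachable: bool                          # engine can reach the host
    holds_spm: bool = True                   # what the host would report if queried
    fenced: bool = False                     # fencing via a fencing card succeeded
    manually_confirmed_reboot: bool = False  # admin confirmed the host was rebooted


def query_status(host: SpmHost) -> SpmStatus:
    """Way 1: query the host directly; only possible with network connectivity."""
    if not host.reachable:
        return SpmStatus.UNKNOWN
    return SpmStatus.STILL_SPM if host.holds_spm else SpmStatus.RELEASED


def safe_to_move_spm(host: SpmHost) -> bool:
    """Return True only if the old SPM cannot still be using the storage."""
    status = query_status(host)
    if status is SpmStatus.RELEASED:
        return True
    if status is SpmStatus.STILL_SPM:
        return False
    # Way 2: without network, rely on fencing or the admin's manual confirmation.
    return host.fenced or host.manually_confirmed_reboot


# The scenario from this bug: dev-18 is unreachable, not fenced, not confirmed.
dev18 = SpmHost("dev-18", reachable=False)
print(safe_to_move_spm(dev18))  # False: status unknown, election stays blocked
dev18.manually_confirmed_reboot = True
print(safe_to_move_spm(dev18))  # True: another host may now become SPM
```

In this model the "unknown" case deliberately blocks the transfer rather than guessing, which matches the behavior reported in the bug: without a query or a fence/confirmation, SPM election cannot safely proceed.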

Comment 2 Martin Perina 2013-09-04 13:00:02 UTC
Thanks Ayal, this option didn't occur to me. Executing "Confirm host has been rebooted" helped, and another host in the cluster became SPM almost at once.

