Bug 1003657 - SPM selection doesn't work if SPM host is Non Responsive
Status: CLOSED NOTABUG
Product: oVirt
Classification: Community
Component: ovirt-engine-core
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.3.4
Assigned To: Martin Perina
storage
Depends On:
Blocks:
 
Reported: 2013-09-02 10:59 EDT by Martin Perina
Modified: 2016-02-10 14:31 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-03 09:46:53 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Logs and Screenshot (1.06 MB, application/x-compressed-tar)
2013-09-02 10:59 EDT, Martin Perina

Description Martin Perina 2013-09-02 10:59:24 EDT
Created attachment 792893: Logs and Screenshot

Description of problem:

If you try to put the SPM host into Maintenance while the network connection to that host is lost, SPM election never finishes, the data center also becomes Non Responsive, and the only way to recover is to restore the network connection to the host.

Version-Release number of selected component (if applicable):

Engine: ovirt-engine-3.3.0-0.7.rc2.fc19.noarch running on F19
Host dev-18: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4
Host dev-21: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4

How reproducible:

100%

Steps to Reproduce:
1. Block network connections between the SPM host (host dev-18 in the attached logs) and the engine.
2. Try to put the SPM host into Maintenance (this step has to be executed before the engine recognizes that the network connection to the host is unavailable).

Actual results:

The SPM role is not transferred from the Non Responsive host to another host, so the whole data center is Non Responsive and the user has no way to resolve this other than restoring the network connection to the Non Responsive host.

Expected results:

Another host is selected as SPM.

Additional info:
Comment 1 Ayal Baron 2013-09-03 09:46:53 EDT
Since the host may still have its storage connection intact, it is not safe to move the SPM role to another host without knowing the status of the original host.
There are 2 ways of verifying the status:
1. If we have network connectivity, just query the host.
2. If we don't have network connectivity, fence the host.
Fencing isn't automatic unless specifically configured to be, and even then it requires a fencing card (management IP). If you do not have this configured, you can right-click the host and specify manually that it has been rebooted (effectively telling oVirt 'trust me, I've rebooted the host and it's safe to transfer the SPM').

Please try this (confirm host has been rebooted).
If it doesn't work, feel free to reopen the bug.
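
The decision flow described in this comment can be sketched roughly as follows. This is only an illustration of the reasoning, not the actual ovirt-engine logic: the `SpmAction` enum, the `next_spm_action` function, and its three boolean parameters are all hypothetical names made up for this sketch.

```python
from enum import Enum


class SpmAction(Enum):
    """Hypothetical labels for the possible next steps (not engine code)."""
    QUERY_HOST = "query the host for its SPM status"
    FENCE_HOST = "fence the host through its fencing card"
    WAIT_FOR_MANUAL_CONFIRMATION = "wait for 'Confirm host has been rebooted'"
    REASSIGN_SPM = "select another host as SPM"


def next_spm_action(network_reachable: bool,
                    fencing_configured: bool,
                    reboot_confirmed_by_admin: bool) -> SpmAction:
    """Pick the next step before the SPM role may move to another host.

    Assumption for this sketch: the SPM role must not move until the state
    of the original SPM host is known, because its storage connection may
    still be intact.
    """
    if network_reachable:
        # Way 1: the host can be reached, so just ask it.
        return SpmAction.QUERY_HOST
    if reboot_confirmed_by_admin:
        # The administrator vouched that the host was rebooted.
        return SpmAction.REASSIGN_SPM
    if fencing_configured:
        # Way 2: no network, but a fencing card is available.
        return SpmAction.FENCE_HOST
    # No network, no fencing, no confirmation: nothing safe to do yet.
    return SpmAction.WAIT_FOR_MANUAL_CONFIRMATION


if __name__ == "__main__":
    # The situation in this report: network blocked, no fencing configured.
    print(next_spm_action(False, False, False).value)
```

In the reported scenario the sketch stays in the waiting state until the administrator confirms the reboot, which matches the behavior described in the bug.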
Comment 2 Martin Perina 2013-09-04 09:00:02 EDT
Thanks Ayal, this option didn't occur to me. Executing "Confirm host has been rebooted" helped, and the other host in the cluster became SPM almost at once.
