Red Hat Bugzilla – Bug 1003657
SPM selection doesn't work if SPM host is Non Responsive
Last modified: 2016-02-10 14:31:44 EST
Created attachment 792893
Logs and Screenshot
Description of problem:
If you try to put the SPM host into Maintenance while the network connection to that host is lost, SPM election never completes and the data center also becomes Non Responsive. The only way to recover is to restore the network connection to the host.
Version-Release number of selected component (if applicable):
Engine: ovirt-engine-3.3.0-0.7.rc2.fc19.noarch running on F19
Host dev-18: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4
Host dev-21: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4
Steps to Reproduce:
1. Block network connections between the SPM host (host dev-18 in the attached logs) and the engine
2. Try to put the SPM host into Maintenance (this step must be executed before the engine detects that the network connection to the host is unavailable)
Actual results:
The SPM role is not transferred from the Non Responsive host to another host, so the whole data center is Non Responsive and the user cannot resolve this in any way other than restoring the network connection to the Non Responsive host
Expected results:
Another host is selected as SPM
Since the host may still have its storage connection intact, it is not safe to move the SPM role to another host without knowing the status of the original host.
There are 2 ways of verifying the status:
1. If we have network connectivity, just query the host.
2. If we don't have network connectivity, fence the host.
Fencing isn't automatic unless specifically configured, and even then it requires a fencing card (management IP). If you do not have this configured, you can right-click the host and manually confirm that it has been rebooted (effectively telling oVirt 'trust me, I've rebooted the host and it's safe to transfer the SPM').
Please try this (confirm host has been rebooted).
If it doesn't work, feel free to reopen the bug.
Thanks Ayal, this option hadn't occurred to me. Executing "Confirm host has been rebooted" helped, and the other host in the cluster became SPM almost at once.
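For readers following this thread, the decision flow described in the comment above (query the host if it is reachable, otherwise fence it, otherwise wait for a manual reboot confirmation) can be sketched roughly as follows. This is only an illustration of the reasoning, not the engine's actual code: the `HostState` fields, the `Action` names, and `next_spm_action` are all hypothetical and do not correspond to oVirt APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical action names for this sketch only.
    QUERY_HOST = "query the host for its SPM status"
    FENCE_HOST = "fence the host through its fencing card"
    WAIT_FOR_MANUAL_CONFIRMATION = "wait for 'Confirm host has been rebooted'"
    REASSIGN_SPM = "select another host as SPM"


@dataclass
class HostState:
    # Hypothetical fields; they are not oVirt engine attributes.
    network_reachable: bool          # can the engine reach the host?
    fencing_configured: bool         # fencing card set up and automatic fencing enabled
    reboot_confirmed_by_admin: bool = False  # admin chose "Confirm host has been rebooted"


def next_spm_action(host: HostState) -> Action:
    """Pick the next step before the SPM role may move to another host."""
    if host.network_reachable:
        # Way 1: with connectivity, the host's status can simply be queried.
        return Action.QUERY_HOST
    if host.reboot_confirmed_by_admin:
        # The admin vouches that the host was rebooted, so the transfer is safe.
        return Action.REASSIGN_SPM
    if host.fencing_configured:
        # Way 2: without connectivity, fence the host to establish its state.
        return Action.FENCE_HOST
    # No connectivity, no fencing, no confirmation: the host may still hold
    # its storage connection, so the SPM role must not be moved yet.
    return Action.WAIT_FOR_MANUAL_CONFIRMATION


if __name__ == "__main__":
    unreachable = HostState(network_reachable=False, fencing_configured=False)
    print(next_spm_action(unreachable).value)
    unreachable.reboot_confirmed_by_admin = True
    print(next_spm_action(unreachable).value)
```

Under the assumption that fencing was not configured in the reported setup, the unreachable host stays in the waiting state until the reboot is confirmed manually, which is consistent with the behavior described in this bug.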