Bug 1003657
Summary: | SPM selection doesn't work if SPM host is Non Responsive
---|---
Product: | [Retired] oVirt
Component: | ovirt-engine-core
Reporter: | Martin Perina <mperina>
Assignee: | Martin Perina <mperina>
Status: | CLOSED NOTABUG
Severity: | unspecified
Priority: | unspecified
Version: | 3.3
CC: | abaron, acathrow, amureini, iheim, yeylon, yzaslavs
Target Milestone: | ---
Target Release: | 3.3.4
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | storage
Doc Type: | Bug Fix
Story Points: | ---
Last Closed: | 2013-09-03 13:46:53 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | Storage
Cloudforms Team: | ---
Since the host may still have its storage connection intact, it is not safe to move the SPM role to another host without knowing the status of the original host. There are two ways of verifying that status:

1. If we have network connectivity, just query the host.
2. If we don't have network connectivity, fence the host.

Fencing isn't automatic unless specifically configured to be, and even then it requires a fencing card (management IP). If you do not have this configured, you can right-click the host and specify manually that it has been rebooted (effectively telling oVirt "trust me, I've rebooted the host and it's safe to transfer the SPM").

Please try this ("Confirm host has been rebooted"). If it doesn't work, feel free to reopen the bug.

Thanks Ayal, this option didn't come to my mind. Executing "Confirm host has been rebooted" helped, and the other host in the cluster became SPM almost at once.
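The verification rules described in the comment above can be summarized as a small decision function. This is only an illustrative sketch in Python, not code from ovirt-engine: the names `HostState`, `Action`, and `next_spm_action` are hypothetical, and `fencing_configured` is assumed to mean that automatic fencing is enabled and a fencing card (management IP) is available.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical names for the steps mentioned in the comment."""
    QUERY_HOST = "query the host for its SPM status"
    FENCE_HOST = "fence the host through its fencing card"
    WAIT_FOR_MANUAL_CONFIRMATION = "wait for 'Confirm host has been rebooted'"
    REASSIGN_SPM = "select another host as SPM"


@dataclass
class HostState:
    """Illustrative view of the current SPM host (not an oVirt data type)."""
    network_reachable: bool
    # Assumption: True only if automatic fencing is enabled AND a fencing
    # card (management IP) is configured for the host.
    fencing_configured: bool = False
    # True once an administrator has used "Confirm host has been rebooted".
    confirmed_rebooted: bool = False


def next_spm_action(host: HostState) -> Action:
    """Return the next safe step before the SPM role may move elsewhere."""
    if host.network_reachable:
        # Way 1: with network connectivity, just query the host.
        return Action.QUERY_HOST
    if host.confirmed_rebooted:
        # The administrator vouched for the reboot, so the role can move.
        return Action.REASSIGN_SPM
    if host.fencing_configured:
        # Way 2: without network connectivity, fence the host.
        return Action.FENCE_HOST
    # Neither verification path is available: the election stays blocked
    # until someone confirms the reboot manually.
    return Action.WAIT_FOR_MANUAL_CONFIRMATION
```

In the scenario reported in this bug, the SPM host is unreachable and fencing is not configured, so the sketch returns `WAIT_FOR_MANUAL_CONFIRMATION` until the reboot is confirmed by hand, which matches the behavior described in the exchange above.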
Created attachment 792893
Logs and Screenshot

Description of problem:
If you try to put the SPM host into Maintenance and the network connection to this host is lost at the same time, the SPM election never ends, the data center also becomes Non Responsive, and the only way to deal with it is to restore the network connection to the host.

Version-Release number of selected component (if applicable):
Engine: ovirt-engine-3.3.0-0.7.rc2.fc19.noarch running on F19
Host dev-18: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4
Host dev-21: vdsm-4.12.1-1.el6.x86_64 running on RHEL 6.4

How reproducible:
100%

Steps to Reproduce:
1. Block network connections between the SPM host (host dev-18 in the attached logs) and the engine.
2. Try to put the SPM host into Maintenance (this step has to be executed before the engine recognizes that the network connection to the host is not available).

Actual results:
The SPM function is not transferred from the Non Responsive host to another host, so the whole data center is Non Responsive and the user cannot resolve this in any way other than restoring the network connection to the Non Responsive host.

Expected results:
Another host is selected as SPM.

Additional info: