Bug 829174
Summary: [Error handling] [scale] new SPM Selection fails in DC where SPM host with running VMs is Non Responsive

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Virtualization Manager | Reporter | Rami Vaknin <rvaknin> |
| Component | ovirt-engine | Assignee | Ayal Baron <abaron> |
| Status | CLOSED WONTFIX | QA Contact | Haim <hateya> |
| Severity | high | Docs Contact | |
| Priority | unspecified | | |
| Version | 3.1.0 | CC | abaron, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, yeylon, ykaul |
| Target Milestone | --- | | |
| Target Release | 3.1.0 | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | storage | | |
| Fixed In Version | | Doc Type | Bug Fix |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2013-02-03 12:24:56 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | Storage | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
The only solution is to fence the host. We cannot do it automatically because, as far as we know, there are still VMs running on that node.

We cannot solve this for the current version; pushing to a future release. This requires sanlock, storage monitoring, and a request to stop the SPM through storage.

Closing old bugs. If this issue is still relevant or important in the current version, please re-open the bug.
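The blocking rule discussed in these comments can be sketched as a small decision function. This is a simplified Python model, not ovirt-engine's actual logic (the engine class that logs these messages, IrsBrokerCommand, is Java); the names `HostStatus`, `SpmDecision`, `Host`, `may_auto_fence`, `spm_decision` and the `fenced` flag are all hypothetical. It assumes the usual rationale for the SPM role: only one host may modify storage pool metadata, so a host that still holds the role but cannot be reached blocks any takeover until it has been fenced.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HostStatus(Enum):
    UP = "Up"
    NON_RESPONSIVE = "Non Responsive"


class SpmDecision(Enum):
    KEEP = "keep current SPM"
    ELECT = "elect a new SPM"
    STOP = "stop SPM selection"


@dataclass
class Host:
    name: str
    status: HostStatus
    running_vms: int
    fenced: bool = False  # hypothetical flag: True once the host is confirmed powered off


def may_auto_fence(host: Host) -> bool:
    """Hypothetical policy from the comments: never fence automatically
    while the engine believes VMs are still running on the host."""
    return host.status is HostStatus.NON_RESPONSIVE and host.running_vms == 0


def spm_decision(reported_spm: Optional[Host]) -> SpmDecision:
    """Decide what to do with the SPM role of a storage pool.

    reported_spm is the host the pool still names as SPM (puma05,
    vds_spm_id 20, in this report), or None if no host holds the role.
    """
    if reported_spm is None:
        return SpmDecision.ELECT
    if reported_spm.status is HostStatus.UP:
        return SpmDecision.KEEP
    if reported_spm.fenced:
        # A fenced host can no longer write to shared storage, so takeover is safe.
        return SpmDecision.ELECT
    # Unreachable but possibly alive: a second SPM could corrupt pool metadata.
    return SpmDecision.STOP
```

With puma05 modelled as Non Responsive, unfenced, and running VMs (any nonzero count), `spm_decision` returns `STOP` and `may_auto_fence` returns `False`, which is the deadlock described in this bug: only a manual fence unblocks selection.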
Created attachment 589750: engine logs

Version: RHEVM SI4

Scenario:
1. A Data Center with 2 clusters: the first cluster has 20 hosts and the second has 4 hosts. The SPM is a host in the first cluster.
2. All 20 hosts in the first cluster became Non Responsive due to a power issue in their lab rack.

Results:
New SPM selection failed. One hour after the power failure there is still no SPM, and the old SPM (which is in Non Responsive status) cannot even be moved to Maintenance because it has running VMs on it.

Expected results:
A new SPM is selected automatically and successfully from one of the Up hosts.

From engine.log, filtered with "grep -i spm" because of the large number of network-failure-related entries (puma05 is the old SPM, the puma hosts are down, and the tigris hosts are up):

2012-06-06 09:38:40,654 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-66) SPM selection - vds seems as spm puma05
2012-06-06 09:38:40,655 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-66) spm vds is non responsive, stopping spm selection.
2012-06-06 09:38:50,631 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) hostFromVds::selectedVds - tigris01, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:38:50,654 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:38:50,657 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) SPM selection - vds seems as spm puma05
2012-06-06 09:38:50,658 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) spm vds is non responsive, stopping spm selection.
2012-06-06 09:38:50,686 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) hostFromVds::selectedVds - tigris02, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:38:50,710 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:38:50,714 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM selection - vds seems as spm puma05
2012-06-06 09:38:50,714 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:00,693 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) hostFromVds::selectedVds - tigris04, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:00,713 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:00,717 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM selection - vds seems as spm puma05
2012-06-06 09:39:00,717 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:00,749 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) hostFromVds::selectedVds - tigris03, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:00,771 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:00,775 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) SPM selection - vds seems as spm puma05
2012-06-06 09:39:00,776 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:10,758 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) hostFromVds::selectedVds - tigris03, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:10,782 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:10,786 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM selection - vds seems as spm puma05
2012-06-06 09:39:10,787 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:10,813 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) hostFromVds::selectedVds - tigris02, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:10,832 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:10,835 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) SPM selection - vds seems as spm puma05
2012-06-06 09:39:10,836 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:20,818 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-24) [250fc185] hostFromVds::selectedVds - tigris03, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:20,841 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-24) [250fc185] SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:20,845 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-24) [250fc185] SPM selection - vds seems as spm puma05
2012-06-06 09:39:20,846 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-24) [250fc185] spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:20,873 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-20) [1262c27b] hostFromVds::selectedVds - tigris01, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:20,899 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-20) [1262c27b] SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:20,903 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-20) [1262c27b] SPM selection - vds seems as spm puma05
2012-06-06 09:39:20,904 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-20) [1262c27b] spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:30,891 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-16) [23a528bd] hostFromVds::selectedVds - tigris02, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:30,912 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-16) [23a528bd] SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:30,915 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-16) [23a528bd] SPM selection - vds seems as spm puma05
2012-06-06 09:39:30,916 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-16) [23a528bd] spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:30,951 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-55) hostFromVds::selectedVds - tigris04, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:30,971 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-55) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:30,974 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-55) SPM selection - vds seems as spm puma05
2012-06-06 09:39:30,975 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-55) spm vds is non responsive, stopping spm selection.
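To confirm that every selection attempt in a log like this hits the same blocker, the grep output can be summarized per candidate host. This is a minimal Python sketch that assumes the message formats exactly as quoted above; `summarize` and `SelectionSummary` are hypothetical helpers, not part of ovirt-engine.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable

# Patterns for the IrsBrokerCommand messages quoted in this report.
CANDIDATE_RE = re.compile(r"hostFromVds::selectedVds - (\S+), spmStatus (\S+), storage pool (\S+)")
SPM_ID_RE = re.compile(r"SPM Init: could not find reported vds or not up - pool:(\S+) vds_spm_id: (\d+)")
REPORTED_SPM_RE = re.compile(r"SPM selection - vds seems as spm (\S+)")
ABORT_TEXT = "spm vds is non responsive, stopping spm selection."


@dataclass
class SelectionSummary:
    candidates: Counter = field(default_factory=Counter)    # host tried as new SPM -> attempts
    reported_spm: Counter = field(default_factory=Counter)  # host the pool still names as SPM
    reported_spm_ids: set = field(default_factory=set)      # vds_spm_id values seen
    aborts: int = 0                                        # "stopping spm selection" count


def summarize(lines: Iterable[str]) -> SelectionSummary:
    """Count selection attempts, the SPM they ran into, and aborts."""
    s = SelectionSummary()
    for line in lines:
        if m := CANDIDATE_RE.search(line):
            s.candidates[m.group(1)] += 1
        elif m := SPM_ID_RE.search(line):
            s.reported_spm_ids.add(int(m.group(2)))
        elif m := REPORTED_SPM_RE.search(line):
            s.reported_spm[m.group(1)] += 1
        elif ABORT_TEXT in line:
            s.aborts += 1
    return s
```

An open log file can be passed directly, since iterating a file yields its lines. Applied to the excerpt above, it counts ten selection attempts (tigris01 and tigris04 twice each, tigris02 and tigris03 three times each), eleven aborts, and puma05 with vds_spm_id 20 as the only reported SPM.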