Bug 829174 - [Error handling] [scale] new SPM Selection fails in DC where SPM host with running VMs is Non Responsive
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: Unspecified, OS: Unspecified
Priority: unspecified, Severity: high
Target Milestone: ---
Target Release: 3.1.0
Assigned To: Ayal Baron
QA Contact: Haim
Whiteboard: storage
Depends On:
Blocks:
Reported: 2012-06-06 03:19 EDT by Rami Vaknin
Modified: 2016-02-10 12:06 EST
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-03 07:24:56 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
engine logs (247.70 KB, application/x-compressed-tar)
2012-06-06 03:19 EDT, Rami Vaknin

Description Rami Vaknin 2012-06-06 03:19:33 EDT
Created attachment 589750
engine logs

Version:
RHEVM SI4

Scenario:
1. A Data Center with 2 clusters: the first cluster has 20 hosts and the second has 4 hosts. The SPM is a host from the first cluster.
2. All 20 hosts in the first cluster became Non Responsive due to a power issue in their lab rack.

Results:
New SPM selection failed. One hour after the power failure there is still no SPM, and the old SPM (which is in Non Responsive status) cannot even be moved to Maintenance because it has running VMs on it.

Expected Results:
A new SPM is selected automatically and successfully from one of the Up hosts.


From engine.log, filtered with "grep -i spm" because the log contains many entries related to network failures. puma05 is the old SPM; the puma hosts are down and the tigris hosts are up:
2012-06-06 09:38:40,654 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-66) SPM selection - vds seems as spm puma05
2012-06-06 09:38:40,655 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-66) spm vds is non responsive, stopping spm selection.
2012-06-06 09:38:50,631 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) hostFromVds::selectedVds - tigris01, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:38:50,654 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:38:50,657 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) SPM selection - vds seems as spm puma05
2012-06-06 09:38:50,658 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) spm vds is non responsive, stopping spm selection.
2012-06-06 09:38:50,686 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) hostFromVds::selectedVds - tigris02, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:38:50,710 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:38:50,714 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM selection - vds seems as spm puma05
2012-06-06 09:38:50,714 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:00,693 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) hostFromVds::selectedVds - tigris04, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:00,713 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:00,717 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM selection - vds seems as spm puma05
2012-06-06 09:39:00,717 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:00,749 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) hostFromVds::selectedVds - tigris03, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:00,771 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:00,775 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) SPM selection - vds seems as spm puma05
2012-06-06 09:39:00,776 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:10,758 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) hostFromVds::selectedVds - tigris03, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:10,782 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:10,786 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) SPM selection - vds seems as spm puma05
2012-06-06 09:39:10,787 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-52) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:10,813 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) hostFromVds::selectedVds - tigris02, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:10,832 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:10,835 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) SPM selection - vds seems as spm puma05
2012-06-06 09:39:10,836 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-54) spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:20,818 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-24) [250fc185] hostFromVds::selectedVds - tigris03, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:20,841 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-24) [250fc185] SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:20,845 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-24) [250fc185] SPM selection - vds seems as spm puma05
2012-06-06 09:39:20,846 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-24) [250fc185] spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:20,873 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-20) [1262c27b] hostFromVds::selectedVds - tigris01, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:20,899 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-20) [1262c27b] SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:20,903 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-20) [1262c27b] SPM selection - vds seems as spm puma05
2012-06-06 09:39:20,904 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-20) [1262c27b] spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:30,891 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-16) [23a528bd] hostFromVds::selectedVds - tigris02, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:30,912 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-16) [23a528bd] SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:30,915 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-16) [23a528bd] SPM selection - vds seems as spm puma05
2012-06-06 09:39:30,916 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-16) [23a528bd] spm vds is non responsive, stopping spm selection.
2012-06-06 09:39:30,951 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-55) hostFromVds::selectedVds - tigris04, spmStatus Free, storage pool iscsi_dc
2012-06-06 09:39:30,971 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-55) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20
2012-06-06 09:39:30,974 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-55) SPM selection - vds seems as spm puma05
2012-06-06 09:39:30,975 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-55) spm vds is non responsive, stopping spm selection.
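
The repeating pattern in the excerpt above can be tallied with a short script. This is a minimal Python sketch and not part of the engine: the patterns are derived only from the message texts quoted above, and the names `summarize` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Message patterns copied from the engine.log lines quoted above.
CANDIDATE_RE = re.compile(r"hostFromVds::selectedVds - (\S+), spmStatus (\w+)")
CURRENT_SPM_RE = re.compile(r"SPM selection - vds seems as spm (\S+)")
ABORT_TEXT = "spm vds is non responsive, stopping spm selection"


def summarize(lines):
    """Count candidate hosts, reported SPM hosts, and aborted selections."""
    candidates = Counter()
    reported_spm = Counter()
    aborted = 0
    for line in lines:
        m = CANDIDATE_RE.search(line)
        if m:
            candidates[m.group(1)] += 1
            continue
        m = CURRENT_SPM_RE.search(line)
        if m:
            reported_spm[m.group(1)] += 1
            continue
        if ABORT_TEXT in line:
            aborted += 1
    return {
        "candidates": candidates,
        "reported_spm": reported_spm,
        "aborted": aborted,
    }


# Four lines taken verbatim from the excerpt above.
SAMPLE = [
    "2012-06-06 09:38:50,631 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) hostFromVds::selectedVds - tigris01, spmStatus Free, storage pool iscsi_dc",
    "2012-06-06 09:38:50,654 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) SPM Init: could not find reported vds or not up - pool:iscsi_dc vds_spm_id: 20",
    "2012-06-06 09:38:50,657 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) SPM selection - vds seems as spm puma05",
    "2012-06-06 09:38:50,658 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-90) spm vds is non responsive, stopping spm selection.",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Passing an open file object for the attached engine.log instead of `SAMPLE` should give per-host totals for the whole log, assuming the message texts stay the same throughout.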
Comment 1 Ayal Baron 2012-08-01 08:45:59 EDT
The only solution is to fence the host. We cannot do it automatically because, as far as we know, there are still VMs running on that node.
We cannot solve this for the current version; pushing to a future one.
This requires sanlock, storage monitoring, and a request to stop the SPM through storage.
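
The constraint described in this comment can be sketched as a small decision function. This is only an illustration under stated assumptions, not the engine's actual code: the `Host` fields, `HostStatus` values, and both function names are hypothetical, and the rule encoded is just the one stated above (a new SPM needs the old one fenced, and automatic fencing is blocked while VMs may still be running).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class HostStatus(Enum):
    # Hypothetical status values, named after the states used in this report.
    UP = "Up"
    NON_RESPONSIVE = "Non Responsive"


@dataclass
class Host:
    # Hypothetical model of a host; not the engine's data structure.
    name: str
    status: HostStatus
    may_have_running_vms: bool = False
    fenced: bool = False


def can_fence_automatically(host: Host) -> bool:
    """Automatic fencing is refused while VMs may still be running on the host."""
    return host.status is HostStatus.NON_RESPONSIVE and not host.may_have_running_vms


def can_select_new_spm(old_spm: Optional[Host]) -> Tuple[bool, str]:
    """Return (allowed, reason) for starting a new SPM selection."""
    if old_spm is None:
        return True, "no SPM is currently recorded"
    if old_spm.status is HostStatus.UP:
        return False, "current SPM is up"
    if old_spm.fenced:
        return True, "old SPM was fenced"
    if can_fence_automatically(old_spm):
        return False, "old SPM must be fenced first (automatic fencing possible)"
    return False, "old SPM must be fenced manually: VMs may still be running"


if __name__ == "__main__":
    puma05 = Host("puma05", HostStatus.NON_RESPONSIVE, may_have_running_vms=True)
    print(can_select_new_spm(puma05))
    puma05.fenced = True  # an administrator fenced the host manually
    print(can_select_new_spm(puma05))
```

In the scenario from this report, puma05 falls into the last branch until someone fences it manually, which matches the "stopping spm selection" warnings in the log.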
Comment 2 Itamar Heim 2013-02-03 07:24:56 EST
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.
