Description of problem:
- 2-node cluster
- When the SPM node fails, the data center goes down and no new SPM is elected.
- When fencing has taken place successfully, the message is: Manual fence did not revoke the selected SPM (Node1) since the master storage domain was not active or could not use another host for the fence operation.
- Manually fencing the host makes no difference.
- I found no way to switch the SPM to a working node, which means the data center is unusable until the original SPM is back online.

Version-Release number of selected component (if applicable):
- 3.4.0

How reproducible:

Steps to Reproduce:
1. Power off the SPM node.
2. Prevent the host from coming up again after fencing has taken place.

Actual results:
- The data center is down and the SPM is Non Responsive.
- When trying to manually switch the SPM, the message is: Error while executing action: Cannot force select SPM: Storage Domain cannot be accessed. -Please check that at least one Host is operational and Data Center state is up.

Expected results:
- A new SPM node is elected.
2014-03-30 22:07:43,323 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-22) hostFromVds::selectedVds - Node2, spmStatus Free, storage pool Default
2014-03-30 22:07:43,325 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-22) SPM Init: could not find reported vds or not up - pool:Default vds_spm_id: 1
2014-03-30 22:07:43,350 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-22) SPM selection - vds seems as spm Node1
2014-03-30 22:07:43,351 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-22) spm vds is non responsive, stopping spm selection.
2014-03-30 22:07:43,714 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-78) Command GetCapabilitiesVDSCommand(HostName = Node1, HostId = ff474b41-22c5-440e-8052-4cf40c27b250, vds=Host[Node1]) execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out
2014-03-30 22:07:48,848 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-26) Command GetCapabilitiesVDSCommand(HostName = Node1, HostId = ff474b41-22c5-440e-8052-4cf40c27b250, vds=Host[Node1]) execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out
2014-03-30 22:07:53,396 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-71) [4b275d81] hostFromVds::selectedVds - Node2, spmStatus Free, storage pool Default
2014-03-30 22:07:53,398 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-71) [4b275d81] SPM Init: could not find reported vds or not up - pool:Default vds_spm_id: 1
2014-03-30 22:07:53,424 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-71) [4b275d81] SPM selection - vds seems as spm Node1
2014-03-30 22:07:53,425 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-71) [4b275d81] spm vds is non responsive, stopping spm selection.
2014-03-30 22:07:53,973 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-37) Command GetCapabilitiesVDSCommand(HostName = Node1, HostId = ff474b41-22c5-440e-8052-4cf40c27b250, vds=Host[Node1]) execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out
2014-03-30 22:07:59,072 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-49) Command GetCapabilitiesVDSCommand(HostName = Node1, HostId = ff474b41-22c5-440e-8052-4cf40c27b250, vds=Host[Node1]) execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out
2014-03-30 22:08:03,472 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-35) hostFromVds::selectedVds - Node2, spmStatus Free, storage pool Default
2014-03-30 22:08:03,477 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-35) SPM Init: could not find reported vds or not up - pool:Default vds_spm_id: 1
2014-03-30 22:08:03,502 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-35) SPM selection - vds seems as spm Node1
2014-03-30 22:08:03,503 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-35) spm vds is non responsive, stopping spm selection.
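The repeating log sequence suggests a simple decision: the engine finds Node2 free, sees that the pool still records Node1 (vds_spm_id 1) as SPM, and stops selection because Node1 is non-responsive. The following is a hypothetical Python model of that decision, written only to make the log easier to read. The `Host` class, the status strings, and the `fence_confirmed` flag are assumptions; this is not oVirt engine code (the engine itself is Java).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    # Hypothetical stand-in for an engine host record.
    name: str
    status: str  # assumed values: "Up" or "NonResponsive"


def select_spm(candidate: Host, reported_spm: Optional[Host],
               fence_confirmed: bool) -> Optional[Host]:
    """Return the host that should act as SPM, or None if selection stops.

    Hypothetical model of the engine log above; not the real engine logic.
    """
    if reported_spm is None or reported_spm.name == candidate.name:
        # No other SPM recorded: the free candidate can be elected.
        return candidate
    if reported_spm.status == "Up":
        # The recorded SPM is reachable, so it keeps the role.
        return reported_spm
    if not fence_confirmed:
        # Matches "spm vds is non responsive, stopping spm selection."
        return None
    # Assumption: once fencing of the old SPM is confirmed, the candidate may take over.
    return candidate


node1 = Host("Node1", "NonResponsive")
node2 = Host("Node2", "Up")
print(select_spm(node2, node1, fence_confirmed=False))  # None: data center stays down
print(select_spm(node2, node1, fence_confirmed=True))   # Node2 is elected
```

The model deliberately leaves out the storage-side takeover (the `fenceSpmStorage` call that appears in the vdsm log later in this thread), which is the step that actually failed in this report.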
Liron, could you please take a look at this?
Thanks for reporting this issue! Could you please attach the full engine and vdsm logs?
Sure. Waiting for the logs, as the current paste is not enough to perform a root cause analysis (RCA) of the issue.
Thanks for replying. I will upload the logs tomorrow, as I have no access to the system today.
Created attachment 881647 [details] log when Node1 was powered off
Created attachment 881648 [details] log when Node1 was powered off
Created attachment 881649 [details] log when Node1 is back again
Created attachment 881650 [details] log when Node1 is back again
Created attachment 881651 [details] log when Node1 is back again
Hi, the issue here is an AttributeError caused by calling a method on the wrong object:

Thread-43::INFO::2014-04-02 08:57:34,510::logUtils::44::dispatcher::(wrapper) Run and protect: fenceSpmStorage(spUUID='00000002-0002-0002-0002-0000000000ea', lastOwner=None, lastLver=None, options=None)
Thread-43::ERROR::2014-04-02 08:57:34,511::task::866::TaskManager.Task::(_setError) Task=`6f4cb2fc-816c-41f1-bcf5-272396b40b40`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3546, in fenceSpmStorage
    pool.invalidateMetadata()
AttributeError: 'StoragePool' object has no attribute 'invalidateMetadata'

Will add a fix soon.
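A minimal reproduction of the failure mode in the traceback, using hypothetical stand-ins: `StoragePool`, `fence_spm_storage`, and `run_task` here only mimic the names and call chain shown and are not the vdsm implementations. Calling a method the object does not define raises `AttributeError`, and the task runner turns it into an "Unexpected error", so the fencing task fails before another host can take over the SPM role. It does not show the actual upstream fix.

```python
class StoragePool:
    """Hypothetical stand-in: like the object in the traceback, it lacks invalidateMetadata()."""

    def __init__(self, sp_uuid: str) -> None:
        self.sp_uuid = sp_uuid


def fence_spm_storage(pool) -> str:
    # Mirrors the failing line in hsm.py: the method is called on an
    # object that does not provide it, so Python raises AttributeError.
    pool.invalidateMetadata()
    return "fenced"


def run_task(fn, *args):
    """Simplified task runner: report any exception as an error string instead of raising."""
    try:
        return fn(*args)
    except Exception as exc:
        return f"Unexpected error: {exc}"


pool = StoragePool("00000002-0002-0002-0002-0000000000ea")
print(run_task(fence_spm_storage, pool))
# Unexpected error: 'StoragePool' object has no attribute 'invalidateMetadata'
```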
This is an automated message.

oVirt 3.4.1 has been released:
 * should fix your issue
 * should be available at your local mirror within two days.

If problems still persist, please make note of it in this bug report.
*** Bug 1103165 has been marked as a duplicate of this bug. ***