Description of problem:
When blocking the connection from the engine to the SPM host, the SPM role cannot start on the remaining active host, and without an SPM all the storage domains become inactive.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.4.master.20170103091953.gitfaae662.el7.centos.noarch
vdsm-4.19.1-17.gitf1272bf.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run getAllTasksStatuses to determine which host is the SPM (see the sketch below for a scripted alternative).
2. Block the connection from the engine to that host.

Actual results:
The SPM host and all storage domains are inactive.

Expected results:
The remaining active host takes the SPM role, and the storage domains stay active.

Additional info:
engine.log

2017-01-04 18:22:13,808+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler8) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM blond-vdsf command failed: Connection issue java.rmi.ConnectException: Connection timeout

2017-01-04 18:22:13,809+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler8) [] Command 'SpmStatusVDSCommand(HostName = blond-vdsf, SpmStatusVDSCommandParameters:{runAsync='true', hostId='1529693d-d0fd-4dd6-bb76-8d66de7daeea', storagePoolId='00000001-0001-0001-0001-000000000311'})' execution failed: VDSGenericException: VDSNetworkException: Connection issue java.rmi.ConnectException: Connection timeout
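For step 1, a minimal sketch of finding the SPM host from the engine side with the Python SDK (ovirtsdk4), rather than the getAllTasksStatuses verb; the engine URL and credentials are placeholders, not values from this report:

    # find_spm.py - list which host currently holds the SPM role
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    # Hypothetical engine FQDN and credentials; replace with your own.
    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        insecure=True,  # skip CA verification; acceptable for a test setup only
    )

    hosts_service = connection.system_service().hosts_service()
    for host in hosts_service.list():
        # host.spm.status is one of SPM, CONTENDING, or NONE
        if host.spm is not None and host.spm.status == types.SpmStatus.SPM:
            print('SPM host: %s' % host.name)

    connection.close()

Blocking the connection in step 2 was done on the engine side (e.g. a firewall rule dropping traffic to the SPM host's address), so the host itself keeps running and keeps holding the SPM lease on storage.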
Created attachment 1237234 [details] logs engine and vdsm
That's the expected behavior - we can't start the SPM on a different host while another host still holds the role. To free the role, the host must either be fenced by our automatic fencing, be manually rebooted (followed by confirming in the engine that the host was rebooted: right-click the host -> "Confirm host was rebooted", or see the sketch below), or have its connectivity to the engine restored.
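A hedged sketch of the "confirm host was rebooted" step via the Python SDK (ovirtsdk4); the URL, credentials, and host name are placeholders. As I understand the REST API, the manual-fence confirmation maps to the host's fence action with fence_type 'manual'. Only run this after the host has really been rebooted, otherwise two hosts may act as SPM against the same storage:

    # confirm_reboot.py - release the SPM role held by a rebooted host
    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',  # hypothetical engine FQDN
        username='admin@internal',
        password='password',
        insecure=True,
    )

    hosts_service = connection.system_service().hosts_service()
    # 'blond-vdsf' is the stuck SPM host from the logs above; adjust as needed.
    host = hosts_service.list(search='name=blond-vdsf')[0]
    host_service = hosts_service.host_service(host.id)

    # Equivalent of the UI's manual "confirm host was rebooted" action.
    host_service.fence(fence_type='manual')

    connection.close()

Once the confirmation (or fencing, or restored connectivity) releases the SPM lease, the engine can start SPM contention on the remaining active host.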