Created attachment 762001 [details]
logs

Description of problem:

I tried activating a storage domain which is inaccessible. After the activation fails, the engine sends SpmStop even though the domain is not a master domain and is inaccessible from all hosts.

Version-Release number of selected component (if applicable):

sf18
vdsm-4.10.2-23.0.el6ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create two iSCSI storage domains located on two different storage servers
2. Put the non-master domain in maintenance
3. From all hosts, block connectivity to the non-master storage domain
4. Activate the non-master storage domain

Actual results:

The engine sends SpmStop even though none of the hosts can see the storage domain.

Expected results:

We should not send SpmStop.

Additional info: logs

2013-06-17 13:52:42,074 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-46) [2ec39c6d] Error code StorageDomainDoesNotExist and error message IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Storage domain does not exist: ('38755249-4bb3-4841-bf5b-05f4a521514d',)
2013-06-17 13:52:42,119 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (pool-4-thread-46) [2ec39c6d] SpmStopVDSCommand::Stopping SPM on vds cougar01, pool id 7fd33b43-a9f4-4eb7-a885-e9583a929ceb
2013-06-17 13:52:43,165 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (pool-4-thread-46) [2ec39c6d] FINISH, SpmStopVDSCommand, log id: 465bea95
2013-06-17 13:52:43,165 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-46) [2ec39c6d] Irs placed on server 4497d431-7c5e-4924-96e0-3f9cdbf826e5 failed. Proceed Failover
As the domain is in MAINTENANCE, no monitoring should be done for that domain by the hosts or the engine. Therefore, we don't know beforehand whether any of the hosts can see it prior to executing the activation. Whether we should skip the failover in this case is debatable IMO. We might "get some idea" about the activation result by performing different checks before running the activate VDS command, to improve the chances of predicting the result. That seems to me like an RFE which might be partially covered by other upcoming features. Allon, what's your take on it?
In general I agree that this is not very nice, but the scenario is that the user is activating a domain which she believes is now OK, and the activation fails. In this case it is reasonable to assume that the problem is specific to the host and not to the domain. As Liron mentioned, we do not monitor domains in maintenance mode, and solving this use case requires a lot of code for very little gain.
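
For illustration only, the kind of pre-failover check discussed in the comments above could be sketched as follows. This is not engine code: `Host`, `should_fail_over_spm`, and the `can_see` probe are hypothetical names invented for the sketch, and it assumes some explicit per-host visibility probe exists, which is exactly the part that is missing today because domains in maintenance are not monitored.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Host:
    """Hypothetical stand-in for a host in the pool."""
    name: str


def should_fail_over_spm(
    hosts: Sequence[Host],
    spm: Host,
    domain_id: str,
    can_see: Callable[[Host, str], bool],
) -> bool:
    """Decide whether stopping the SPM is worthwhile after a failed activation.

    `can_see` is an assumed probe that reports whether a host can reach
    the given storage domain; no such call is taken from the engine.
    """
    visible_from = [h for h in hosts if can_see(h, domain_id)]

    # No host sees the domain: the problem is the domain itself,
    # so moving the SPM role to another host cannot help.
    if not visible_from:
        return False

    # Some host sees the domain. Fail over only if the current SPM
    # is not among them, i.e. the problem looks host-specific.
    return spm not in visible_from
```

With this sketch, the scenario from the description (connectivity blocked on every host) would return `False` and no SpmStop would be sent. The cost is one extra probe per host before each failover decision, which is the "lot of code for very little gain" trade-off raised above.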