Red Hat Bugzilla – Bug 975003
engine: failure to activate a non-master storage domain which is inaccessible will trigger spmStop
Last modified: 2016-02-10 12:14:20 EST
Created attachment 762001
Description of problem:
I tried to activate a storage domain that is inaccessible. After the activation failed, the engine sent spmStop, even though the domain is not the master domain and is inaccessible from all hosts.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create two iSCSI storage domains, located on two different storage servers.
2. Put the non-master domain into maintenance.
3. From all hosts, block connectivity to the non-master storage domain.
4. Activate the non-master storage domain.
Actual results:
The engine sends SpmStop even though none of the hosts can see the storage domain.
Expected results:
The engine should not send SpmStop.
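The reproduction scenario reduces to a decision rule. The Java sketch below contrasts the behavior observed in this report with the behavior requested. All class, record, and method names are hypothetical (nothing here comes from the oVirt engine code base), it requires Java 16+ for records, and it assumes the engine already knows how many hosts other than the SPM can see the domain.

```java
// Hypothetical sketch: none of these names exist in the oVirt engine code base.
// Requires Java 16+ (records).
public final class SpmStopDecision {

    /** Assumed inputs available when ActivateStorageDomainVDS fails. */
    record ActivationFailure(boolean isMasterDomain, int otherHostsThatSeeDomain) {}

    /** Behavior observed in this bug: any activation failure leads to failover. */
    static boolean observedStopsSpm(ActivationFailure failure) {
        return true;
    }

    /**
     * Behavior requested in this bug: stopping the SPM only helps if another
     * host could succeed where the SPM failed. If no other host can see a
     * non-master domain, moving the SPM role cannot fix the activation.
     */
    static boolean expectedStopsSpm(ActivationFailure failure) {
        if (failure.isMasterDomain()) {
            return true; // assumption: keep the existing failover for the master domain
        }
        return failure.otherHostsThatSeeDomain() > 0;
    }

    public static void main(String[] args) {
        // Reproduction scenario: non-master domain blocked from every host.
        ActivationFailure scenario = new ActivationFailure(false, 0);
        System.out.println("observed: " + observedStopsSpm(scenario)); // observed: true
        System.out.println("expected: " + expectedStopsSpm(scenario)); // expected: false
    }
}
```

Whether the engine can actually supply `otherHostsThatSeeDomain` for a domain in maintenance is exactly the open question raised in the comments.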
Additional info: logs
2013-06-17 13:52:42,074 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-46) [2ec39c6d] Error code StorageDomainDoesNotExist and error message IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Storage domain does not exist: ('38755249-4bb3-4841-bf5b-05f4a521514d',)
2013-06-17 13:52:42,119 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (pool-4-thread-46) [2ec39c6d] SpmStopVDSCommand::Stopping SPM on vds cougar01, pool
2013-06-17 13:52:43,165 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (pool-4-thread-46) [2ec39c6d] FINISH, SpmStopVDSCommand, log id: 465bea95
2013-06-17 13:52:43,165 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-46) [2ec39c6d] Irs placed on server 4497d431-7c5e-4924-96e0-3f9cdbf826e5 failed. Proceed Failover
As the domain is in MAINTENANCE, it is not monitored by the hosts or the engine; therefore, we cannot know in advance whether any of the hosts can see it at all until the activation is actually executed. Whether we should skip failover in this case is debatable, IMO.
We might "get some idea" of the activation result by performing additional checks before running the ActivateStorageDomainVDS command, which would improve our chances of predicting the outcome. This seems to me like an RFE, which might be partially covered by other upcoming features.
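The pre-activation check suggested above could look roughly like the Java sketch below: probe domain visibility from the SPM and from the other hosts, then classify the likely outcome before deciding on failover. The `VisibilityProbe` interface, the `Prediction` values, and the host names `host-a`/`host-b` are all hypothetical. Because a domain in maintenance is not monitored, the probe is a one-off snapshot, so the result is only a prediction and the real activation can still fail for other reasons.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the pre-activation check idea; not oVirt engine code.
public final class PreActivationCheck {

    enum Prediction { LIKELY_OK, SPM_HOST_PROBLEM, DOMAIN_UNREACHABLE }

    /** Assumed one-off probe: can this host currently see this domain? */
    interface VisibilityProbe {
        boolean canSee(String host, String domainId);
    }

    static Prediction predict(VisibilityProbe probe, String spmHost,
                              List<String> otherHosts, String domainId) {
        if (probe.canSee(spmHost, domainId)) {
            return Prediction.LIKELY_OK;
        }
        // The SPM cannot see the domain; failover is only useful if another host can.
        boolean anotherHostSees = otherHosts.stream()
                .anyMatch(host -> probe.canSee(host, domainId));
        return anotherHostSees ? Prediction.SPM_HOST_PROBLEM : Prediction.DOMAIN_UNREACHABLE;
    }

    public static void main(String[] args) {
        // Scenario from this bug: connectivity to the domain blocked from all hosts.
        Map<String, Boolean> visible = Map.of("host-a", false, "host-b", false);
        VisibilityProbe probe = (host, domainId) -> visible.getOrDefault(host, false);
        Prediction p = predict(probe, "host-a", List.of("host-b"),
                "38755249-4bb3-4841-bf5b-05f4a521514d");
        System.out.println(p); // DOMAIN_UNREACHABLE
    }
}
```

Only `SPM_HOST_PROBLEM` would justify sending SpmStop; `DOMAIN_UNREACHABLE` would let the engine fail the activation without touching the SPM.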
Allon, what's your take on it?
In general, I agree that this is not very nice, but the scenario is that the user is activating a domain that she believes is now OK, and the activation fails. In this case, it is reasonable to assume that the problem is specific to the host and not to the domain. As Liron mentioned, we do not monitor domains in maintenance mode, and solving this use case would require a lot of code for very little gain.