Created attachment 790952
logs

Description of problem:
When a storage server is inaccessible and vdsm fails to perform connectStorageServer, engine proceeds with the storage domain activation flow and sends ActivateStorageDomain to vdsm. If the master domain is active, ActivateStorageDomain succeeds and the inaccessible domain is reported as active (false positive).

Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.16.master.el6ev.noarch
vdsm-4.12.0-72.git287bb7e.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
On a file pool with more than 1 SD, from different storage servers:
1) move the non-master domain to maintenance
2) block connectivity to the non-master storage server (whose domain is in maintenance) from all hosts in the cluster
3) activate the domain

Actual results:
ConnectStorageServer fails on vdsm:

Thread-1316::ERROR::2013-08-27 14:36:17,248::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Operation not permitted\n')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 222, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';mount.nfs: Operation not permitted\n')
Thread-1316::ERROR::2013-08-27 14:36:17,250::hsm::2367::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2364, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect
    raise e
MountError: (32, ';mount.nfs: Operation not permitted\n')

End on engine:

2013-08-27 14:35:10,102 ERROR [org.ovirt.engine.core.bll.storage.POSIXFSStorageHelper] (pool-5-thread-50) The connection with details lion.qa.lab:/export/elad/elad5 failed because of error code 477 and error message is: problem while trying to mount target
2013-08-27 14:35:10,105 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-5-thread-50) Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.

Even though ConnectStorageServer failed, engine proceeds with ActivateStorageDomain:

2013-08-27 14:35:25,196 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (pool-5-thread-50) [6404f91d] START, ActivateStorageDomainVDSCommand( storagePoolId = 7a93c0d1-1316-40e2-b946-3180c3415007, ignoreFailoverLimit = false, storageDomainId = 66ae8355-db6a-4b17-a0a5-71d462946344), log id: 4b5ace97

The activation ends successfully and the domain is reported as 'Active'. This happens because the master domain is active.

Expected results:
Engine should fail the flow and not send ActivateStorageDomain to the host.

Additional info:
logs
Correction to the description above: "End on engine" should read "And on engine".
Engine ignores connectStorageServer failures in most (all?) cases, since in many cases the following operation can still succeed and it's not worth trying to identify ahead of time which would and which wouldn't. Also, once we get rid of the pool there will be no 'activate' operation, so this is doubly not interesting.
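
For illustration only, the two behaviors discussed in this bug can be sketched in a few lines of Python. This is not engine or vdsm code: `FakeHost`, `activate_domain`, and the `strict` flag are hypothetical names invented for the sketch, and the stub simply assumes (as the report describes) that activation succeeds whenever the master domain is active. With `strict=False` the connect failure is ignored and the domain ends up reported as active; with `strict=True` the flow fails before any activate call is sent.

```python
class MountError(Exception):
    """Stand-in for the mount failure seen in the vdsm traceback."""


class FakeHost:
    """Hypothetical host stub; not the real vdsm or engine API."""

    def __init__(self, reachable_servers, master_active=True):
        self.reachable_servers = set(reachable_servers)
        self.master_active = master_active
        self.calls = []  # records which verbs were sent to the host

    def connect_storage_server(self, server):
        self.calls.append(("connectStorageServer", server))
        if server not in self.reachable_servers:
            raise MountError(32, "mount.nfs: Operation not permitted")

    def activate_storage_domain(self, domain_id):
        self.calls.append(("activateStorageDomain", domain_id))
        # Assumption taken from the report: activation succeeds
        # as long as the master domain is active.
        return self.master_active


def activate_domain(host, server, domain_id, strict):
    """Return the status the domain would be reported with."""
    try:
        host.connect_storage_server(server)
    except MountError:
        if strict:
            # Expected behavior per the report: fail the flow and
            # never send the activate call.
            return "Failed"
        # Reported behavior: the connect failure is ignored.
    activated = host.activate_storage_domain(domain_id)
    return "Active" if activated else "Failed"


if __name__ == "__main__":
    server = "lion.qa.lab:/export/elad/elad5"
    domain = "66ae8355-db6a-4b17-a0a5-71d462946344"
    # The storage server is blocked, so it is not in reachable_servers.
    print(activate_domain(FakeHost([]), server, domain, strict=False))
    print(activate_domain(FakeHost([]), server, domain, strict=True))
```

The sketch also shows the trade-off raised in the last comment: the strict variant refuses to attempt an operation that might have succeeded anyway, while the lenient variant can report a domain as active when its storage server is unreachable.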