Bug 1001637 - [engine-backend] engine sends ActiveStorageDomain to vdsm even though ConnectStorageServer failed on host
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Hardware: x86_64 Unspecified
Priority: unspecified
Severity: low
Target Release: 3.4.0
Assigned To: Liron Aravot
QA Contact: Aharon Canan
Keywords: Triaged
Reported: 2013-08-27 08:41 EDT by Elad
Modified: 2016-02-10 12:03 EST

Doc Type: Bug Fix
Last Closed: 2013-09-15 11:54:21 EDT
Type: Bug
oVirt Team: Storage

Attachments
logs (693.32 KB, application/x-gzip)
2013-08-27 08:41 EDT, Elad

Description Elad 2013-08-27 08:41:45 EDT
Created attachment 790952

Description of problem:
When the storage server is inaccessible and vdsm fails to perform connectStorageServer, the engine proceeds with the storage domain activation flow and sends ActivateStorageDomain to vdsm. If the master domain is active, ActivateStorageDomain succeeds and the inaccessible domain is reported as active (false positive).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
On a file pool with more than one storage domain (SD), each on a different storage server:
1) Move the non-master domain to maintenance.
2) Block connectivity from all hosts in the cluster to the storage server of the non-master domain (the one in maintenance).
3) Activate the domain.

Actual results:
ConnectStorageServer fails on vdsm:

Thread-1316::ERROR::2013-08-27 14:36:17,248::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Operation not permitted\n')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 222, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';mount.nfs: Operation not permitted\n')
Thread-1316::ERROR::2013-08-27 14:36:17,250::hsm::2367::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2364, in connectStorageServer
  File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect
    raise e
MountError: (32, ';mount.nfs: Operation not permitted\n')
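The leading `;` in the error text follows from the `raise MountError(rc, ";".join((out, err)))` line in the traceback: when the mount command writes nothing to stdout, the joined message starts with the separator. A minimal Python sketch of that, assuming a stand-in `MountError` class and a hypothetical helper `build_mount_error` (neither is vdsm's actual definition), with the exit code and stderr text taken from the log above:

```python
class MountError(RuntimeError):
    """Stand-in for vdsm's MountError; the real class may differ."""


def build_mount_error(rc, out, err):
    # Mirrors the raise shown in the traceback: stdout and stderr joined with ';'.
    return MountError(rc, ";".join((out, err)))


# Values from the log: exit code 32, empty stdout, stderr from mount.nfs.
error = build_mount_error(32, "", "mount.nfs: Operation not permitted\n")
print(error.args)
```

The exit code and the joined text end up as the two elements of `error.args`, which matches the tuple printed in the vdsm log.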

And on engine:

2013-08-27 14:35:10,102 ERROR [org.ovirt.engine.core.bll.storage.POSIXFSStorageHelper] (pool-5-thread-50) The connection with details lion.qa.lab:/export/elad/elad5 failed because of error code 477 and error message is: problem while trying to mount target
2013-08-27 14:35:10,105 ERROR [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-5-thread-50) Transaction rolled-back for command: org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand.

Even though ConnectStorageServer failed, the engine proceeds with ActivateStorageDomain:

2013-08-27 14:35:25,196 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (pool-5-thread-50) [6404f91d] START, ActivateStorageDomainVDSCommand( storagePoolId = 7a93c0d1-1316-40e2-b946-3180c3415007, ignoreFailoverLimit = false, storageDomainId = 66ae8355-db6a-4b17-a0a5-71d462946344), log id: 4b5ace97

The activation ends successfully and the domain is reported as 'Active'. This happens because the master domain is active.

Expected results:
The engine should fail the flow and not send ActivateStorageDomain to the host.
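
The behavior requested here can be sketched as a simple gate: attempt the connection first, and send the activation only if it succeeded. This is an illustrative Python sketch under stated assumptions, not ovirt-engine code (the engine itself is written in Java); `activate_domain`, `connect_storage_server`, `activate_storage_domain`, and `ActivationAborted` are hypothetical names.

```python
class ActivationAborted(Exception):
    """Hypothetical error raised when the connect step fails."""


def activate_domain(connect_storage_server, activate_storage_domain):
    """Run the activation flow, stopping if the host cannot connect.

    Both arguments are hypothetical callables standing in for the
    ConnectStorageServer and ActivateStorageDomain calls to the host.
    """
    if not connect_storage_server():
        # Fail the flow here so the domain is never reported as active.
        raise ActivationAborted("ConnectStorageServer failed; activation not sent")
    return activate_storage_domain()


# Usage: a failing connect must prevent the activate call entirely.
calls = []


def failing_connect():
    calls.append("connect")
    return False


def activate():
    calls.append("activate")
    return "Active"


try:
    activate_domain(failing_connect, activate)
except ActivationAborted as exc:
    print("flow failed:", exc)
```

With this gate, a failed connect leaves only the connect attempt recorded in `calls`, so the inaccessible domain could not be reported as active through this path.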

Additional info:
Comment 1 Elad 2013-08-27 19:41:35 EDT
***End on engine = And on engine***
Comment 2 Ayal Baron 2013-09-15 11:54:21 EDT
The engine ignores connectStorageServer failures in most (all?) cases, because in many cases the following operation can still succeed, and it is not worth trying to identify ahead of time which would succeed and which would not.
Also, once we get rid of the pool there will be no 'activate' operation so this is doubly not interesting.
