Description of problem:
In a replica 3 gluster volume deployment with hosts (h1, h2, h3), the storage domain was mounted with h1:vol, with backup-volfile-servers set to h2:h3. However, when the h1 server is down, validation of the gluster volume fails (as it uses the primary server to fetch the vol file) and the mount does not succeed even though the backup volfile servers were online.

jsonrpc.Executor/3::ERROR::2016-01-29 17:45:10,180::hsm::2473::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2470, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 221, in connect
    self.validate()
  File "/usr/share/vdsm/storage/storageServer.py", line 346, in validate
    replicaCount = self.volinfo['replicaCount']
  File "/usr/share/vdsm/storage/storageServer.py", line 333, in volinfo
    self._volinfo = self._get_gluster_volinfo()
  File "/usr/share/vdsm/storage/storageServer.py", line 371, in _get_gluster_volinfo
    return volinfo[self._volname]
KeyError: u'engine'

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a storage domain using a gluster replica 3 volume, and provide the backup-volfile-servers option. For instance, h1:/vol1 and, in mount options, backup-volfile-servers=h2:h3
2. Bring one of the hypervisor nodes down, and bring the h1 server down.
3. Activate the hypervisor node, which will try to mount the gluster storage domain, resulting in an error.

Expected results:
The storage domain should be accessible via the backup-volfile-servers.

Additional info:
Related, but a different scenario: if glusterd is not running on the primary server used for the mount, the following error is thrown.

jsonrpc.Executor/7::ERROR::2016-02-04 15:12:33,219::hsm::2473::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2470, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 221, in connect
    self.validate()
  File "/usr/share/vdsm/storage/storageServer.py", line 346, in validate
    replicaCount = self.volinfo['replicaCount']
  File "/usr/share/vdsm/storage/storageServer.py", line 333, in volinfo
    self._volinfo = self._get_gluster_volinfo()
  File "/usr/share/vdsm/storage/storageServer.py", line 370, in _get_gluster_volinfo
    self._volfileserver)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in glusterVolumeInfo
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
GlusterCmdExecFailedException: Command execution failed
error: Connection failed. Please check if gluster daemon is operational.
return code: 1
We simply fuse mount the Gluster servers; if that fails, we fail to connect. If we should not fail in this case, Gluster needs to be able to fuse mount even when the primary is down.
The failure occurs because we fetch the volume info from the primary gluster server before doing the mount. If that server is down, we fail before ever attempting the mount.
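A minimal sketch of the kind of fallback that avoids this pre-validation failure (this is an illustration, not the actual vdsm patch; the names `get_volinfo`, `fetch`, and `GlusterCmdFailed` are hypothetical): instead of querying only the primary volfile server, try each configured server in order and only fail if none of them answers.

```python
# Hypothetical sketch: fall back to backup volfile servers when fetching
# volume info, rather than failing as soon as the primary is unreachable.

class GlusterCmdFailed(Exception):
    """Stand-in for vdsm's GlusterCmdExecFailedException."""


def get_volinfo(volname, servers, fetch):
    """Return info for volname, trying each server in order.

    `fetch` is a callable (server, volname) -> dict mapping volume names
    to their info; it raises GlusterCmdFailed when the server is down.
    """
    last_err = None
    for server in servers:
        try:
            info = fetch(server, volname)
            if volname in info:
                return info[volname]
        except GlusterCmdFailed as e:
            # Primary (or earlier backup) unreachable: try the next server.
            last_err = e
    raise last_err or GlusterCmdFailed("no server returned %r" % volname)
```

With servers ["h1", "h2", "h3"], a fetch that raises for h1 but answers from h2 lets validation succeed, matching the expected behaviour in the bug description.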
Removing needinfo as Ala has answered the question. It's not a gluster mount issue but a pre-validation issue, as mentioned in Comment 3.
The patch needs to be backported to the 3.6 branch - returning status to POST.
Bugs moved prematurely to ON_QA since they didn't have a target release. Note that only bugs with a set target release will move to ON_QA.
Gluster domain remains accessible after the scenario described in the description:
1. Create a storage domain using a gluster replica 3 volume, and provide the backup-volfile-servers option. For instance, h1:/vol1 and, in mount options, backup-volfile-servers=h2:h3
2. Bring one of the hypervisor nodes down, and bring the h1 server down.
3. Activate the hypervisor node.

Both hosts have access to the domain even though the primary Gluster server is down.

Verified using:
rhevm-3.6.5-0.1.el6.noarch
vdsm-4.17.25-0.el7ev.noarch
glusterfs-devel-3.7.8-4.el7.x86_64
glusterfs-rdma-3.7.8-4.el7.x86_64
glusterfs-fuse-3.7.8-4.el7.x86_64
glusterfs-server-3.7.8-4.el7.x86_64
python-gluster-3.7.8-4.el7.noarch
glusterfs-ganesha-3.7.8-4.el7.x86_64
glusterfs-debuginfo-3.7.8-4.el7.x86_64
glusterfs-client-xlators-3.7.8-4.el7.x86_64
glusterfs-extra-xlators-3.7.8-4.el7.x86_64
glusterfs-geo-replication-3.7.8-4.el7.x86_64
glusterfs-libs-3.7.8-4.el7.x86_64
glusterfs-3.7.8-4.el7.x86_64
nfs-ganesha-gluster-2.3.0-1.el7.x86_64
glusterfs-resource-agents-3.7.8-4.el7.noarch
glusterfs-cli-3.7.8-4.el7.x86_64
glusterfs-api-devel-3.7.8-4.el7.x86_64
glusterfs-api-3.7.8-4.el7.x86_64