Description of problem:
=======================
I have a 3-node setup with about 6 volumes as below (the name represents the volume type, and all volumes are distributed-*):

cross3
distrep
ecvol
ecx
rep2
rep3

I mounted one of the volumes, rep2, using a FUSE mount on one of the nodes itself, say n1, and restarted glusterd on the same node. I saw that the volume gets unmounted:

[2017-03-21 13:01:43.825447] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on 10.70.35.192:24007 failed (No data available)
[2017-03-21 13:01:43.825516] E [glusterfsd-mgmt.c:2102:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.35.192 (No data available)
[2017-03-21 13:01:43.825558] I [glusterfsd-mgmt.c:2120:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-03-21 13:01:43.825867] W [glusterfsd.c:1329:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_notify+0xd3) [0x7f3641eb19f3] -->/usr/sbin/glusterfs(+0x10a9f) [0x7f36425e7a9f] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f36425e0dfb] ) 0-: received signum (1), shutting down
[2017-03-21 13:01:43.825922] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting '/mnt/rep2'.
[root@dhcp35-192 glusterfs]#

Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp35-192 glusterfs]# rpm -qa|grep gluster
glusterfs-geo-replication-3.10.0-1.el7.x86_64
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
python2-glusterfs-api-1.1-1.el7.noarch
glusterfs-extra-xlators-3.10.0-1.el7.x86_64
python2-gluster-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
[root@dhcp35-192 glusterfs]#

#wget -e robots=off -A rpm -r -np -nd https://buildlogs.centos.org/centos/7/storage/x86_64/gluster-3.10/

How reproducible:
=================
2/2
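The reproduction steps described above can be sketched as follows (a hedged sketch; the hostname n1, volume name rep2, and mount point /mnt/rep2 are taken from the report, and the exact commands may differ from what the reporter ran):

```shell
# On node n1, which is part of the trusted pool hosting the rep2 volume:
mount -t glusterfs n1:/rep2 /mnt/rep2   # FUSE-mount the volume locally
systemctl restart glusterd              # restart the management daemon on the same node

# Expected: the mount survives the glusterd restart.
# Observed: the FUSE client logs "Exhausted all volfile servers",
# receives signum (1), and unmounts /mnt/rep2.
grep rep2 /proc/mounts                  # empty once the bug triggers
```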
I don't know exactly what made it into 3.10.0, but this looks like something that was already fixed: https://review.gluster.org/#/c/16886/. Please verify whether you have that patch in the version you're using.
*** Bug 1434617 has been marked as a duplicate of this bug. ***
That is a very partial fix(?) that prevents the client from shutting down. The rest of the problem is that clients should receive the complete list of volume member servers after the initial volfile retrieval and be able to connect to any of them. If the volume is changed, e.g. with an add-brick, replace-brick, or remove-brick, the list of known servers should also be updated. I suggest the volume members, as opposed to the peers, because there's no guarantee the client will have network access to all of the peers, nor should that be a requirement. There may be a good reason for a peer group to allow access to one volume from one network but, for management purposes, not allow access to a different volume hosted by the same peer group.
Just to be clear, are you saying that's a feature that should be added, or a feature that used to exist but has regressed?
IMHO, it's a bug that I've been forgetting to file for years. It's critical because if the mount server fails and is replaced with a new one, the clients will never connect to a glusterd again unless remounted.
Isn't this why we have the backup-volfile-server option in place?
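For reference, that option is supplied at mount time or in /etc/fstab. A hedged sketch (hostnames n1/n2/n3 and volume rep2 are taken from the report; the option spelling has varied between releases, e.g. `backupvolfile-server` in older clients vs. `backup-volfile-servers` in newer ones, so check the mount.glusterfs man page for your version):

```shell
# Fetch the volfile from n1, falling back to n2 and n3 if n1 is unreachable.
# Note this only helps at initial mount time; it does not make the
# management connection dynamic, which is the gap being discussed here.
mount -t glusterfs -o backup-volfile-servers=n2:n3 n1:/rep2 /mnt/rep2

# Equivalent /etc/fstab entry:
# n1:/rep2  /mnt/rep2  glusterfs  defaults,_netdev,backup-volfile-servers=n2:n3  0 0
```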
In a cloud environment, or on Kubernetes, you don't necessarily have control over which nodes are going to die or be replaced. The management connection really needs to be dynamic.
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result this bug is being closed. If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, request that it be reopened and the Version field be marked appropriately.