Description of problem:
=======================
I have a 3-node setup with about 6 volumes as below (the name represents the volume type, and all volumes are distributed-*):

cross3
distrep
ecvol
ecx
rep2
rep3

I mounted one of the volumes, rep2, using a FUSE mount on one of the nodes itself, say n1, and restarted glusterd on the same node. I saw that the volume gets unmounted:

[2017-03-21 13:01:43.825447] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on 10.70.35.192:24007 failed (No data available)
[2017-03-21 13:01:43.825516] E [glusterfsd-mgmt.c:2102:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.35.192 (No data available)
[2017-03-21 13:01:43.825558] I [glusterfsd-mgmt.c:2120:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-03-21 13:01:43.825867] W [glusterfsd.c:1329:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_notify+0xd3) [0x7f3641eb19f3] -->/usr/sbin/glusterfs(+0x10a9f) [0x7f36425e7a9f] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f36425e0dfb] ) 0-: received signum (1), shutting down
[2017-03-21 13:01:43.825922] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting '/mnt/rep2'.
[root@dhcp35-192 glusterfs]#

Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp35-192 glusterfs]# rpm -qa|grep gluster
glusterfs-geo-replication-3.10.0-1.el7.x86_64
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
python2-glusterfs-api-1.1-1.el7.noarch
glusterfs-extra-xlators-3.10.0-1.el7.x86_64
python2-gluster-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
[root@dhcp35-192 glusterfs]#

#wget -e robots=off -A rpm -r -np -nd https://buildlogs.centos.org/centos/7/storage/x86_64/gluster-3.10/

How reproducible:
=================
2/2
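The reproduction steps described above can be sketched as follows (a hedged sketch; the hostname n1, volume name rep2, and mount point /mnt/rep2 are taken from the report, and the exact commands may differ from what the reporter ran):

```shell
# On node n1, which is part of the trusted pool hosting the rep2 volume:
mount -t glusterfs n1:/rep2 /mnt/rep2   # FUSE-mount the volume locally
systemctl restart glusterd              # restart the management daemon on the same node

# Expected: the mount survives the glusterd restart.
# Observed: the FUSE client logs "Exhausted all volfile servers",
# receives signum (1), and unmounts /mnt/rep2.
grep rep2 /proc/mounts                  # empty once the bug triggers
```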
I don't know exactly what made it into 3.10.0, but this looks like something that was already fixed: https://review.gluster.org/#/c/16886/. Please verify whether you have that patch in the version you're using.
*** Bug 1434617 has been marked as a duplicate of this bug. ***
That is a very partial fix(?) that prevents the client from shutting down. The rest of the problem is that clients should receive the complete list of volume member servers after the initial volfile retrieval and be able to connect to any of them. If the volume is changed, e.g. with an add-brick, replace-brick, or remove-brick, the list of known servers should also be updated. I suggest the volume members, as opposed to the peers, because there's no guarantee the client will have network access to all of the peers, nor should that be a requirement. There may be a good reason for a peer group to allow access to one volume from one network but, for management purposes, not allow access to a different volume hosted by the same peer group.
Just to be clear, are you saying that's a feature that should be added, or a feature that used to exist but has regressed?
IMHO, it's a bug that I've been forgetting to file for years. It's critical because if the mount server fails and is replaced with a new one, the clients will never connect to a glusterd again unless remounted.
Isn't this why we have the backup-volfile-server option in place?
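For reference, that option is supplied at mount time or in /etc/fstab. A hedged sketch (hostnames n1/n2/n3 and volume rep2 are taken from the report; the option spelling has varied between releases, e.g. `backupvolfile-server` in older clients vs. `backup-volfile-servers` in newer ones, so check the mount.glusterfs man page for your version):

```shell
# Fetch the volfile from n1, falling back to n2 and n3 if n1 is unreachable.
# Note this only helps at initial mount time; it does not make the
# management connection dynamic, which is the gap being discussed here.
mount -t glusterfs -o backup-volfile-servers=n2:n3 n1:/rep2 /mnt/rep2

# Equivalent /etc/fstab entry:
# n1:/rep2  /mnt/rep2  glusterfs  defaults,_netdev,backup-volfile-servers=n2:n3  0 0
```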
In a cloud environment, or on Kubernetes, you don't necessarily have control over which nodes are going to die or be replaced. The management connection really needs to be dynamic.
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result this bug is being closed. If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, request that it be reopened and the Version field be marked appropriately.