Bug 1434617

Summary: mounts fail to remain connected if the mount server is brought down
Product: [Community] GlusterFS
Reporter: Joe Julian <joe>
Component: core
Assignee: bugs <bugs>
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Version: 3.10
CC: bugs, jeff, joe
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-03-22 00:50:14 UTC
Type: Bug

Description Joe Julian 2017-03-21 23:37:26 UTC
Description of problem:
If a volume is FUSE-mounted using a specific server's hostname and that host is then taken down, the mount fails.

Version-Release number of selected component (if applicable):
3.10.0

How reproducible:
always

Steps to Reproduce:
1. create a replica 3 volume
2. on a client (not server1): mount -t glusterfs server1:myvol /mnt
3. ls /mnt # works correctly
4. on server1: pkill -f gluster
5. back on the client: ls /mnt # no files

Actual results:
The mount fails with the following errors:
[2017-03-21 22:29:00.467883] E [glusterfsd-mgmt.c:2102:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gluster-test01 (No data available)
[2017-03-21 22:29:00.467907] I [glusterfsd-mgmt.c:2120:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

Expected results:
The mount should stay connected to the other two servers and remain operational.

Additional info:
As a workaround, clients continue to function if you use round-robin DNS (rrdns) or specify backupvolfile-server(s) at mount time.
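
For illustration, the backupvolfile-server workaround could look roughly like the sketch below. It only prints the mount command rather than running it. The helper name build_mount_cmd is hypothetical, server2 and server3 are placeholder hostnames for the other two replicas, and the plural, colon-separated backup-volfile-servers spelling is an assumption about the installed mount.glusterfs (older releases may only accept the single-host backupvolfile-server form), so check the mount helper shipped with your version.

```shell
#!/bin/sh
# Sketch only: builds and prints a glusterfs mount command with backup
# volfile servers. It does not mount anything.
# Assumption: mount.glusterfs accepts "backup-volfile-servers" as a
# colon-separated host list; hostnames and volume name are placeholders.

# Usage: build_mount_cmd PRIMARY VOLUME MOUNTPOINT [BACKUP_HOST...]
build_mount_cmd() {
    primary=$1
    volume=$2
    mountpoint=$3
    shift 3

    # Join any remaining arguments with ':' to form the backup list.
    backups=""
    for host in "$@"; do
        backups="${backups:+$backups:}$host"
    done

    if [ -n "$backups" ]; then
        printf 'mount -t glusterfs -o backup-volfile-servers=%s %s:/%s %s\n' \
            "$backups" "$primary" "$volume" "$mountpoint"
    else
        # No backups given: plain mount, as in the reproduction steps.
        printf 'mount -t glusterfs %s:/%s %s\n' \
            "$primary" "$volume" "$mountpoint"
    fi
}

# Example for the replica 3 volume from the steps to reproduce.
build_mount_cmd server1 myvol /mnt server2 server3
```

With the backup servers listed, the client has other hosts to ask for the volfile when server1 goes away, which is what avoids the "Exhausted all volfile servers" failure above.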

The clients should get the complete list of volume member servers and be able to connect to any of them after the initial volfile retrieval. If the volume is changed, such as with an add-brick, replace-brick, or remove-brick, the list of known servers should also be updated.

Comment 1 Jeff Darcy 2017-03-22 00:44:50 UTC
Possible dup of https://bugzilla.redhat.com/show_bug.cgi?id=1434412 (which itself was a dup)?

Comment 2 Joe Julian 2017-03-22 00:50:14 UTC

*** This bug has been marked as a duplicate of bug 1434412 ***