Bug 846619 - Client doesn't reconnect after server comes back online
Product: GlusterFS
Classification: Community
Component: unclassified
Hardware: x86_64 Linux
Priority: high
Severity: urgent
Assigned To: Vijay Bellur
Reported: 2012-08-08 05:05 EDT by Martin Erlingsson
Modified: 2014-12-14 14:40 EST

Doc Type: Bug Fix
Last Closed: 2014-12-14 14:40:29 EST
Type: Bug

Attachments
extracts from log files, cli output, statedumps (922.16 KB, text/plain)
2012-08-08 05:05 EDT, Martin Erlingsson

Description Martin Erlingsson 2012-08-08 05:05:10 EDT
Created attachment 602973
extracts from log files, cli output, statedumps

Description of problem:
The client doesn't reconnect after a server comes back online, although the logs and volume status say it does. A manual remount is required.
We have one client and two servers, all running Debian Squeeze, with glusterfs-3.3.0 compiled from source on each of them.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0 (compiled from source)

How reproducible:
100% reproducible in our lab setup

Steps to Reproduce:
1. Run ls -lR in the mounted gluster folder on the client to start a directory listing.
2. Pull the network cable from server2 for one minute. The directory listing stalls until the gluster timeout expires, then resumes using only server1.
3. Reconnect the network cable to server2.
4. Create a new file in the mounted gluster folder on the client. Run ls -lR in the mounted gluster folder on the client and in the exported folder on each server.

Actual results:
The new file shows up only on server1. The glusterfsd process uses ~10% CPU time on server1 but ~0% on server2. Redundancy is lost.

Expected results:
The client should properly connect to server2 when it comes back online. The new file should show up on both servers. The glusterfsd process should be using CPU time on both servers. Redundancy should be re-acquired.
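
The check in step 4 (does the new file exist in the exported folder on both servers?) can be scripted instead of comparing ls -lR output by eye. The following is only a minimal sketch, not part of the original report: the helper names relative_files and compare_listings are hypothetical, and skipping a ".glusterfs" directory is an assumption about brick-internal metadata that may need adjusting for a given setup.

```python
import os


def relative_files(root, skip_dirs=(".glusterfs",)):
    """Return the set of file paths under root, relative to root.

    skip_dirs is an assumption: directories holding brick-internal
    metadata that should not be compared. Adjust it as needed.
    """
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_listings(listing_a, listing_b):
    """Return (only_in_a, only_in_b) as sorted lists of relative paths."""
    a, b = set(listing_a), set(listing_b)
    return sorted(a - b), sorted(b - a)
```

One way to use it would be to run relative_files on the exported folder of each server (for example /data/exp, the path quoted in the client log), save the results, and pass both to compare_listings. In the state described under "Actual results", the new file would be expected to appear in the list for server1 only.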

Additional info:
Although the client does not connect properly, its log states that it connects to the reconnected server: "Connected to, attached to remote volume '/data/exp'.". Both servers show all bricks connected when running gluster volume status or gluster volume info.
The same problem occurs when I unplug server2 during a dbench run on the gluster mount. The same problem occurs whichever of server1 and server2 I unplug.
When the client/servers are in this state I can manually unmount and mount the filesystem again (umount /storage/asd and then mount -a) to get a proper connection to both servers. If I instead unplug the remaining server from the network, I get "Transport endpoint is not connected" when trying to use the mounted gluster folder.
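
The manual workaround above (umount /storage/asd followed by mount -a) can be wrapped in a small script. This is a hedged sketch under stated assumptions, not a fix for the bug: the function names are hypothetical, it assumes the gluster mount is listed in /etc/fstab (which is what mount -a relies on), and it needs root privileges. The probe only detects the fully disconnected case ("Transport endpoint is not connected", errno ENOTCONN); it does not detect the degraded state described in this report, in which the mount stays usable with one server.

```python
import errno
import os
import subprocess


def is_disconnected(path):
    """Return True if listing path fails with ENOTCONN
    ("Transport endpoint is not connected")."""
    try:
        os.listdir(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return True
        raise  # any other error is unrelated to this check
    return False


def remount(mount_point, run=subprocess.run):
    """Unmount mount_point, then run `mount -a` to mount it again.

    Assumes mount_point has an entry in /etc/fstab. The `run` parameter
    exists only so the commands can be inspected without executing them.
    """
    commands = [["umount", mount_point], ["mount", "-a"]]
    for command in commands:
        run(command, check=True)  # raises CalledProcessError on failure
    return commands
```

For example, remount("/storage/asd") would issue the same two commands as the manual workaround, stopping at the first one that fails.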
The client and the servers are virtual machines in VMware vSphere. I unplug the network by de-selecting "Connected" in the network settings of the virtual machine.

Attaching file: 
gluster-connect-bug-2012-08-08.txt with extracts from log files, gluster commands output, and volume statedumps from the servers.
Comment 1 Niels de Vos 2014-11-27 09:53:46 EST
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will be closed automatically.
