Bug 846619 - Client doesn't reconnect after server comes back online
Summary: Client doesn't reconnect after server comes back online
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: GlusterFS
Classification: Community
Component: unclassified
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Vijay Bellur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-08-08 09:05 UTC by Martin Erlingsson
Modified: 2014-12-14 19:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-14 19:40:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments:
extracts from log files, cli output, statedumps (922.16 KB, text/plain)
2012-08-08 09:05 UTC, Martin Erlingsson

Description Martin Erlingsson 2012-08-08 09:05:10 UTC
Created attachment 602973
extracts from log files, cli output, statedumps

Description of problem:
The client doesn't reconnect after a server comes back online, although the client log and the volume status both report that it does. A manual remount is required.
We have one client and two servers. They are all running Debian Squeeze and we compiled glusterfs-3.3.0 from source on each one of them.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0

How reproducible:
100% reproducible in our lab setup

Steps to Reproduce:
1. Run ls -lR in the mounted gluster folder on the client to start a directory listing.
2. Pull the network cable from server2 for one minute. The directory listing waits for the gluster timeout and then resumes with only one server.
3. Reconnect the network cable to server2.
4. Create a new file in the mounted gluster folder on the client. Run ls -lR in the mounted gluster folder on the client and in the exported folder on the servers.
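
The check in step 4 can be sketched as a small Python helper. This is only an illustration, not part of the original report: the function name missing_replicas and the file name probe.txt are made up, and it assumes the exported brick folders can be read as local paths (for example by running it on each server with a single path). The demo at the bottom uses temporary directories in place of real bricks.

```python
import os
import tempfile


def missing_replicas(filename, brick_dirs):
    """Return the brick directories that do NOT contain `filename`.

    An empty list means the file was found in every directory given.
    `brick_dirs` are assumed to be locally readable paths (hypothetical setup).
    """
    return [d for d in brick_dirs
            if not os.path.exists(os.path.join(d, filename))]


if __name__ == "__main__":
    # Stand-ins for the exported folders on server1 and server2.
    with tempfile.TemporaryDirectory() as brick1, \
            tempfile.TemporaryDirectory() as brick2:
        # Simulate the failure described here: the file lands on one brick only.
        open(os.path.join(brick1, "probe.txt"), "w").close()
        lacking = missing_replicas("probe.txt", [brick1, brick2])
        print("bricks missing the file:", lacking)
```

A non-empty result after creating a file through the client mount would correspond to the "Actual results" described in this report.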
  
Actual results:
The new file only shows up on server1. The glusterfsd process uses ~10% CPU time on server1 but ~0% on server2. Redundancy is lost.

Expected results:
The client should properly connect to server2 when it comes back online. The new file should show up on both servers. The glusterfsd process should be using CPU time on both servers. Redundancy should be re-acquired.

Additional info:
Despite not connecting properly, the client log states that the client connects to the reconnected server: "Connected to 10.128.196.182:24010, attached to remote volume '/data/exp'.". Both servers show all bricks connected when running gluster volume status or gluster volume info.
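
For anyone trying to correlate the log with the actual behaviour, the "Connected to" message quoted above can be picked out of the client log with a short Python sketch. The regular expression is written against the single message quoted in this report, so the exact pattern and the helper name parse_connected are assumptions; other GlusterFS versions may word the message differently.

```python
import re

# Pattern derived only from the one log message quoted in this report.
CONNECTED_RE = re.compile(
    r"Connected to (?P<host>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+), "
    r"attached to remote volume '(?P<volume>[^']+)'"
)


def parse_connected(line):
    """Return (host, port, volume) if `line` holds a connect message, else None."""
    match = CONNECTED_RE.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("volume")


if __name__ == "__main__":
    sample = ("Connected to 10.128.196.182:24010, "
              "attached to remote volume '/data/exp'.")
    print(parse_connected(sample))
```

Since the log reports a connection even when writes reach only one server, a match here says nothing about whether data is actually being replicated; it only confirms what the log claims.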
Same problem occurs when I unplug server2 during a dbench on gluster mount. Same problem occurs whichever of server1 and server2 I unplug.
When the client/servers are in this state I can manually unmount and mount the filesystem again (umount /storage/asd and then mount -a) to get a proper connection to both servers. If I instead unplug the remaining server from the network, I get "Transport endpoint is not connected" when trying to use the mounted gluster folder.
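
The manual remount workaround above could be scripted roughly as follows. This is a sketch under assumptions: the helper names remount_commands and remount are made up, the mount point /storage/asd is the one from this report, mount -a only works if the volume is listed in /etc/fstab, and both commands need root. The runner is passed in as a parameter so the command sequence can be inspected without touching a real mount.

```python
import subprocess


def remount_commands(mount_point):
    """Return the two commands used as the workaround, in order."""
    return [["umount", mount_point], ["mount", "-a"]]


def remount(mount_point, run=subprocess.run):
    """Unmount `mount_point`, then remount everything listed in /etc/fstab.

    `run` defaults to subprocess.run; check=True makes a failing command
    raise instead of silently continuing to the next one.
    """
    for cmd in remount_commands(mount_point):
        run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    remount("/storage/asd", run=lambda cmd, check: print(" ".join(cmd)))
```

This only automates the workaround; it does not address why the client stops writing to the reconnected server.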
The client and the servers are virtual machines in VMware vSphere. I unplug the network by de-selecting "Connected" in the network settings of the virtual machine.

Attaching file: 
gluster-connect-bug-2012-08-08.txt with extracts from log files, gluster commands output, and volume statedumps from the servers.

Comment 1 Niels de Vos 2014-11-27 14:53:46 UTC
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will get automatically closed.

