Bug 858191 - if some operations in server_setvolume fail, then put the connection back by removing it from the list
Product: GlusterFS
Classification: Community
Component: protocol
Unspecified Unspecified
Priority: medium, Severity: unspecified
Assigned To: Raghavendra Bhat
Reported: 2012-09-18 05:49 EDT by Raghavendra Bhat
Modified: 2013-07-24 13:42 EDT (History)
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:42:09 EDT
Type: Bug

Description Raghavendra Bhat 2012-09-18 05:49:26 EDT
Description of problem:

Suppose there is a cluster of 4 peers (say A, B, C, D) with a volume that has brick processes running on all the peers. Now kill all the gluster processes running on the first 3 machines (A, B, C), remove /var/lib/glusterd, and create a fresh cluster among A, B, C. Create a volume and start it.

Now on machine D there will be processes (nfs, glustershd, mounted clients) that keep trying to connect to the brick processes of the other 3 machines. If one (or more) of the new bricks on those 3 machines is listening on the same port that the clients from D are trying to connect to, then server_setvolume tries to establish the connection with it. We get the connection object (taken from the list of connections if present, otherwise a new connection object is created and added to the list). But the bound_xl that the clients from D are asking for will be different, hence the connection won't happen. The connection object that was created, however, is not removed from the list of connections and remains there forever.

When a statedump is issued, all the connection objects present in the list are traversed and dumped. The bound_xl of the connection that was created during the failed setvolume will be NULL, so the process might segfault trying to access it.

To fix it: if something fails after getting the connection in server_setvolume, do server_connection_put on that connection object.
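The get/put pairing described above can be sketched with a small self-contained model. This is not the GlusterFS code: conn_t, conn_get, conn_put, toy_setvolume, conn_count and count_unbound are hypothetical stand-ins for the server connection object, the connection lookup, server_connection_put, server_setvolume and the statedump traversal. The model assumes a simple reference count decides when an entry leaves the list.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a server connection object (hypothetical, not the
 * real GlusterFS structure). bound_xl stays NULL until setvolume succeeds. */
typedef struct conn {
    char         id[64];
    const char  *bound_xl;
    int          ref;
    struct conn *next;
} conn_t;

static conn_t *conn_list = NULL;

/* Find a connection by id, or create one and add it to the list.
 * Either way the caller ends up holding one reference. */
static conn_t *conn_get(const char *id)
{
    for (conn_t *c = conn_list; c; c = c->next) {
        if (strcmp(c->id, id) == 0) {
            c->ref++;
            return c;
        }
    }
    conn_t *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    snprintf(c->id, sizeof(c->id), "%s", id);
    c->ref = 1;
    c->next = conn_list;
    conn_list = c;
    return c;
}

/* Drop one reference; on the last one, unlink from the list and free. */
static void conn_put(conn_t *conn)
{
    if (!conn || --conn->ref > 0)
        return;
    for (conn_t **pp = &conn_list; *pp; pp = &(*pp)->next) {
        if (*pp == conn) {
            *pp = conn->next;
            break;
        }
    }
    free(conn);
}

/* Toy setvolume: the client asks for xlator `wanted`, the brick exports
 * `exported`. The caller must keep `exported` alive (a string literal here). */
static int toy_setvolume(const char *client_id, const char *wanted,
                         const char *exported)
{
    conn_t *conn = conn_get(client_id);
    if (!conn)
        return -1;
    if (strcmp(wanted, exported) != 0) {
        /* Failure after conn_get: put the connection back so that no
         * entry with a NULL bound_xl is left behind in the list. */
        conn_put(conn);
        return -1;
    }
    conn->bound_xl = exported;
    return 0;
}

/* Number of connections currently in the list. */
static int conn_count(void)
{
    int n = 0;
    for (conn_t *c = conn_list; c; c = c->next)
        n++;
    return n;
}

/* Number of listed connections a dump traversal would find unbound. */
static int count_unbound(void)
{
    int n = 0;
    for (conn_t *c = conn_list; c; c = c->next)
        if (c->bound_xl == NULL)
            n++;
    return n;
}
```

In this model, a call such as toy_setvolume("client-D", "old-volume-server", "new-volume-server") fails and leaves the list unchanged, so a later traversal never meets an entry whose bound_xl is NULL. Without the conn_put on the failure path, that entry would stay in the list indefinitely, which is the situation the description reports.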

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
Comment 1 Raghavendra Bhat 2012-09-28 08:48:35 EDT
http://review.gluster.org/#change,3953 fixes the issue.
