Bug 858191 - if some operations in server_setvolume fails, then do put the connection back by removing it from the list
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: Unspecified
OS: Unspecified
medium
unspecified
Target Milestone: ---
Assignee: Raghavendra Bhat
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-09-18 09:49 UTC by Raghavendra Bhat
Modified: 2013-07-24 17:42 UTC (History)

Fixed In Version: glusterfs-3.4.0
Clone Of:
Environment:
Last Closed: 2013-07-24 17:42:09 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra Bhat 2012-09-18 09:49:26 UTC
Description of problem:

Suppose there is a cluster of 4 peers (say A, B, C, D) with a volume whose brick processes are running on all the peers. Now kill all the gluster processes running on the first 3 machines (A, B, C), remove /var/lib/glusterd, and create a fresh cluster among A, B, C. Create a volume and start it.

Now on machine D there will be processes (nfs, glustershd, mounted clients) that are still trying to connect to the brick processes of the other 3 machines. If one or more of the bricks on those 3 machines is running on the same port that the clients from D are trying to connect to, then in server_setvolume we try to establish the connection with it. We get the connection object (taken from the list of connections if present, otherwise newly created and added to the list). But the bound_xl that the clients from D are trying to connect to will be different, hence the connection won't happen. The connection object that has been created, however, is not removed from the list of connections and remains there forever.

When a statedump is issued, all the connection objects present in the list are traversed and dumped. The bound_xl of the connection that was created during the failed setvolume will be NULL, and thus the process might segfault trying to access it.

To fix it: if something fails after getting the connection in server_setvolume, do server_connection_put on that connection object so that it is removed from the list.
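
A minimal sketch of the idea, not the actual GlusterFS code: the struct and the functions conn_get, conn_put, conn_count and setvolume below are hypothetical, simplified stand-ins (the real server_setvolume and server_connection_put have different signatures and do much more). It only illustrates that every failure path after obtaining the connection should release it, so that no entry with a NULL bound_xl is left in the list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified connection object (not the GlusterFS struct). */
struct conn {
    char         id[64];
    int          refcount;
    const char  *bound_xl;   /* stays NULL until setvolume succeeds */
    struct conn *next;
};

static struct conn *conn_list = NULL;

/* Return the connection for this id, creating and listing it if absent. */
static struct conn *conn_get(const char *id)
{
    struct conn *c;

    for (c = conn_list; c != NULL; c = c->next) {
        if (strcmp(c->id, id) == 0) {
            c->refcount++;
            return c;
        }
    }
    c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    strncpy(c->id, id, sizeof(c->id) - 1);  /* calloc keeps it terminated */
    c->refcount = 1;
    c->next = conn_list;
    conn_list = c;
    return c;
}

/* Drop one reference; unlink and free the object when none remain. */
static void conn_put(struct conn *c)
{
    struct conn **pp;

    assert(c != NULL && c->refcount > 0);
    if (--c->refcount > 0)
        return;
    for (pp = &conn_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == c) {
            *pp = c->next;
            break;
        }
    }
    free(c);
}

/* Number of connection objects currently in the list. */
static int conn_count(void)
{
    int n = 0;
    struct conn *c;

    for (c = conn_list; c != NULL; c = c->next)
        n++;
    return n;
}

/* Hypothetical setvolume: 0 on success, -1 on failure. */
static int setvolume(const char *client_id, const char *wanted_xl,
                     const char *served_xl)
{
    struct conn *c = conn_get(client_id);

    if (c == NULL)
        return -1;
    if (strcmp(wanted_xl, served_xl) != 0) {
        /* The fix: put the connection back instead of leaking it. */
        conn_put(c);
        return -1;
    }
    c->bound_xl = served_xl;
    return 0;
}
```

With this pattern, a client from D asking for a translator that the new brick does not serve leaves the list unchanged, so a later traversal of the list (as statedump does) only sees connections whose bound_xl has been set.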


Comment 1 Raghavendra Bhat 2012-09-28 12:48:35 UTC
http://review.gluster.org/#change,3953 fixes the issue.

