Red Hat Bugzilla – Bug 858191
If an operation in server_setvolume fails, put the connection back by removing it from the list
Last modified: 2013-07-24 13:42:09 EDT
Description of problem:
Suppose there is a cluster of 4 peers (say A, B, C, D) with a volume that has brick processes running on all of the peers. Now kill all the gluster processes running on the first 3 machines (A, B, C), remove /var/lib/glusterd, and create a fresh cluster among A, B, C. Create a volume and start it.
Now on machine D there will still be processes (nfs, glustershd, mounted clients) trying to connect to the brick processes of the other 3 machines. If one or more of the bricks on those 3 machines is running on the same port that the clients from D are trying to connect to, then server_setvolume tries to establish the connection. We get the connection object (taken from the list of connections if present; otherwise a new connection object is created and added to the list). But the bound_xl that the clients from D are trying to connect to is different, so the connection does not succeed. The connection object that was created, however, is not removed from the list of connections and remains there forever.
When a statedump is issued, all the connection objects present in the list are traversed and dumped. The bound_xl of the connection that was created during the failed setvolume is NULL, so the process might segfault trying to access it.
To fix it: if something fails after getting the connection in server_setvolume, call server_connection_put on that connection object.
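The failure path described above can be sketched with a small stand-alone model. This is not the GlusterFS code: struct conn, conn_get, conn_put, model_setvolume, count_conns and count_unbound are hypothetical names invented for illustration, and the list handling is deliberately simplified (single-threaded, no locking). It only shows the pattern of releasing the connection when setvolume fails after the connection has been obtained, so that no entry with a NULL bound_xl stays in the list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a server connection object (hypothetical,
 * not the real GlusterFS structure). */
struct conn {
    char         id[64];
    const char  *bound_xl;  /* stays NULL until setvolume succeeds */
    int          refs;
    struct conn *next;
};

static struct conn *conn_list = NULL;

/* Return the connection for this id, creating it and adding it to the
 * list if it is not present. Takes one reference. */
static struct conn *conn_get(const char *id)
{
    struct conn *c;

    for (c = conn_list; c != NULL; c = c->next) {
        if (strcmp(c->id, id) == 0) {
            c->refs++;
            return c;
        }
    }
    c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    strncpy(c->id, id, sizeof(c->id) - 1);  /* calloc keeps it terminated */
    c->refs = 1;
    c->next = conn_list;
    conn_list = c;
    return c;
}

/* Drop one reference; on the last one, unlink from the list and free. */
static void conn_put(struct conn *c)
{
    struct conn **pp;

    assert(c != NULL && c->refs > 0);
    if (--c->refs > 0)
        return;
    for (pp = &conn_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == c) {
            *pp = c->next;
            break;
        }
    }
    free(c);
}

/* Model of the setvolume handshake. Returns 0 on success, -1 on failure.
 * served_xl must outlive the connection (string literals in this sketch). */
static int model_setvolume(const char *client_id, const char *requested_xl,
                           const char *served_xl)
{
    struct conn *c = conn_get(client_id);

    if (c == NULL)
        return -1;
    if (strcmp(requested_xl, served_xl) != 0) {
        /* The fix: without this put, the entry would stay in the list
         * forever with bound_xl == NULL. */
        conn_put(c);
        return -1;
    }
    c->bound_xl = served_xl;
    return 0;
}

/* Number of connections currently in the list. */
static int count_conns(void)
{
    int n = 0;
    for (struct conn *c = conn_list; c != NULL; c = c->next)
        n++;
    return n;
}

/* Number of listed connections a dump could trip over (NULL bound_xl). */
static int count_unbound(void)
{
    int n = 0;
    for (struct conn *c = conn_list; c != NULL; c = c->next)
        if (c->bound_xl == NULL)
            n++;
    return n;
}
```

In this model, a call such as model_setvolume("client-D", "old-vol", "new-vol") fails and leaves the list empty, so a later traversal of the list never sees an entry whose bound_xl is NULL.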
Version-Release number of selected component (if applicable):
Steps to Reproduce:
http://review.gluster.org/#change,3953 fixes the issue.