Description of problem:
After a brick daemon dies, glusterd loses track of new/future brick listen ports. Two different error scenarios can happen:
a) A replacement brick from the same node where a brick daemon previously died will not be healed.
b) A new volume created using a brick from the same server where a brick daemon previously died will not be replicated (by the client).

Version-Release number of selected component (if applicable):
3.7.4

How reproducible:
Every time.

Steps to Reproduce (both scenario a+b):
1a. Create a distributed-replicated 1x2 volume
2a. kill -9 <brick-pid>
3a. stop + delete volume
4a. replace-brick with another brick on the same node where <brick-pid> died (healing works)
5a. kill -9 <replacement-brick-pid>
6a. replace-brick with yet another brick (healing fails because the wrong port is used to connect to the new brick)
7a. grep "Connection refused" /var/log/glusterfs/glustershd.log

1b. Create a distributed-replicated 1x2 volume
2b. kill -9 <brick-pid>
3b. stop + delete volume
4b. Create a new 1x2 volume using the same (cleaned) bricks as in 1b
5b. mount it.
6b. On the client, grep "Connection refused" /var/log/glusterfs/<volname>.log

Actual results:
a.
# grep "Connection refused" /var/log/glusterfs/glustershd.log
[2015-09-18 00:55:24.717023] E [socket.c:2278:socket_connect_finish] 0-voltest-client-0: connection to 192.168.1.3:49152 failed (Connection refused)

b.
# grep "Connection refused" /var/log/glusterfs/voltest.log
[2015-09-18 00:44:59.117344] E [socket.c:2278:socket_connect_finish] 4-voltest-client-0: connection to 192.168.1.3:49152 failed (Connection refused)

Expected results:

Additional info:
Restarting glusterd after the brick daemon is killed will prevent the "Connection refused" in both a) and b).
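When checking steps 7a and 6b across several nodes, it can help to pull out which host and port each refused connection was aimed at, instead of reading the grep output by eye. The following is a minimal sketch that assumes the log lines have exactly the shape shown under "Actual results" and that the host is an IPv4 address or hostname; the names RefusedConnection and parse_refused are made up for this example and are not part of GlusterFS.

```python
import re
from typing import NamedTuple, Optional


class RefusedConnection(NamedTuple):
    """One 'Connection refused' entry (hypothetical helper type)."""
    timestamp: str
    translator: str
    host: str
    port: int


# Pattern modelled only on the two log lines quoted in this report.
_PATTERN = re.compile(
    r"^\[(?P<ts>[^\]]+)\] E \[socket\.c:\d+:socket_connect_finish\] "
    r"(?P<xl>\S+): connection to (?P<host>[^:\s]+):(?P<port>\d+) "
    r"failed \(Connection refused\)"
)


def parse_refused(line: str) -> Optional[RefusedConnection]:
    """Return the parsed entry, or None if the line has a different shape."""
    m = _PATTERN.match(line.strip())
    if m is None:
        return None
    return RefusedConnection(m["ts"], m["xl"], m["host"], int(m["port"]))


if __name__ == "__main__":
    sample = (
        "[2015-09-18 00:55:24.717023] E [socket.c:2278:socket_connect_finish] "
        "0-voltest-client-0: connection to 192.168.1.3:49152 failed "
        "(Connection refused)"
    )
    print(parse_refused(sample))
```

The extracted port can then be compared with the port the running brick process is actually listening on for that node.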
Request AFR team to check this.
Scenario b is reproducible. We will keep you posted once we have the RCA. Thanks for filing the bug.
REVIEW: http://review.gluster.org/12189 (glusterd: Use GF_PMAP_PORT_BRICKSERVER in pmap_registry_remove from brick disconnects) posted (#1) for review on master by Atin Mukherjee (amukherj)
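For readers unfamiliar with the symptom, here is a toy model of how a port registry can hand out a dead port when the entry of a killed brick is never removed. It is only a sketch under stated assumptions: ToyPortMap, its methods, and the brick path /bricks/b1 are hypothetical, and the code is not glusterd's pmap implementation. The base port 49152 is taken from the log lines in the description.

```python
BASE_PORT = 49152  # port seen in the "Connection refused" log lines


class ToyPortMap:
    """Hypothetical in-memory registry: port -> brick path."""

    def __init__(self):
        self.ports = {}

    def register(self, brick):
        # Assumption: a new brick gets the lowest port not yet in the map.
        port = BASE_PORT
        while port in self.ports:
            port += 1
        self.ports[port] = brick
        return port

    def remove(self, brick):
        # Drop every entry that belongs to this brick path.
        for port, owner in list(self.ports.items()):
            if owner == brick:
                del self.ports[port]

    def lookup(self, brick):
        # Assumption: the search returns the lowest matching port.
        for port in sorted(self.ports):
            if self.ports[port] == brick:
                return port
        return None


if __name__ == "__main__":
    pm = ToyPortMap()
    pm.register("/bricks/b1")             # first brick process
    # The brick is killed with -9 and its entry is never removed.
    live_port = pm.register("/bricks/b1")  # new process on the same path
    print("live:", live_port, "lookup:", pm.lookup("/bricks/b1"))
```

In this model the lookup keeps returning the first, now dead, port until the stale entry is removed or the registry is rebuilt, which is consistent with the observation that restarting glusterd avoids the error.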
(In reply to Vijay Bellur from comment #3)
> REVIEW: http://review.gluster.org/12189 (glusterd: Use
> GF_PMAP_PORT_BRICKSERVER in pmap_registry_remove from brick disconnects)
> posted (#1) for review on master by Atin Mukherjee (amukherj)

This patch is posted in mainline, moving the state to Assigned.
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.