+++ This bug was initially created as a clone of Bug #1465559 +++

Description of problem:
On a brick multiplexing test setup, if all the gluster processes are killed and glusterd is then started, there is a race window in which one brick fails to attach to one of the running brick instances. While retrying, it brings up a new brick process, which deletes the socket file still in use by another brick process, causing that brick to disconnect as well.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Often, not always

--- Additional comment from Worker Ant on 2017-06-27 11:54:26 EDT ---

REVIEW: https://review.gluster.org/17640 (glusterd: mark brickinfo to started on successful attach) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-06-27 11:55:46 EDT ---

REVIEW: https://review.gluster.org/17640 (glusterd: mark brickinfo to started on successful attach) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-06-28 02:19:05 EDT ---

COMMIT: https://review.gluster.org/17640 committed in master by Atin Mukherjee (amukherj)

------

commit 24d09edf4b13d72a8707c801939921de0d32d4dd
Author: Atin Mukherjee <amukherj>
Date:   Tue Jun 27 21:09:49 2017 +0530

    glusterd: mark brickinfo to started on successful attach

    brickinfo's port & status should be filled up only when attach brick
    is successful.

    Change-Id: I68b181be37cb94d176f0f4692e8d9dac5493181c
    BUG: 1465559
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17640
    Reviewed-by: Jeff Darcy <jeff.us>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    CentOS-regression: Gluster Build System <jenkins.org>
upstream patch : https://review.gluster.org/17640
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/110561
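To illustrate the ordering change described in the commit message ("brickinfo's port & status should be filled up only when attach brick is successful"), here is a minimal, self-contained C sketch. The struct, enum, and the send_attach_req_sketch() helper are simplified stand-ins invented for this example, not glusterd's actual types or API; only the ordering idea reflects the fix.

#include <stdio.h>

typedef enum { BRICK_STOPPED = 0, BRICK_STARTED = 1 } brick_status_sketch_t;

typedef struct {
    int port;
    brick_status_sketch_t status;
} brickinfo_sketch_t;

/* Stand-in for the request that asks an already-running brick process
 * to attach another brick; returns 0 on success, non-zero on failure. */
static int send_attach_req_sketch(brickinfo_sketch_t *brick,
                                  brickinfo_sketch_t *other)
{
    (void)brick;
    (void)other;
    return 0; /* pretend the attach succeeded */
}

static int attach_brick_sketch(brickinfo_sketch_t *brick,
                               brickinfo_sketch_t *other)
{
    /* Pre-fix ordering: port and status were recorded before the attach
     * outcome was known, so a failed attach left glusterd believing the
     * brick was started; the retry then spawned a fresh process and
     * removed a socket file still in use by another brick process.
     *
     * Post-fix ordering (sketched here): update the brick's port and
     * status only after the attach has actually succeeded. */
    if (send_attach_req_sketch(brick, other) != 0)
        return -1;

    brick->port = other->port;
    brick->status = BRICK_STARTED;
    return 0;
}

int main(void)
{
    brickinfo_sketch_t running = { .port = 49153, .status = BRICK_STARTED };
    brickinfo_sketch_t waiting = { .port = 0, .status = BRICK_STOPPED };

    if (attach_brick_sketch(&waiting, &running) == 0)
        printf("attached: port=%d status=%d\n", waiting.port, waiting.status);
    return 0;
}

The sketch only captures why deferring the state update closes the race window; the actual change lives in glusterd's brick attach path, per the patch linked above.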
[root@dhcp35-45 ~]# ps -ef|grep glusterfsd
root     28114     1  5 18:11 ?        00:00:01 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id vol_1.10.70.35.45.rhs-brick1-vol_1 -p /var/lib/glusterd/vols/vol_1/run/10.70.35.45-rhs-brick1-vol_1.pid -S /var/run/gluster/f3678cbb26724c43ffe643412d02da45.socket --brick-name /rhs/brick1/vol_1 -l /var/log/glusterfs/bricks/rhs-brick1-vol_1.log --xlator-option *-posix.glusterd-uuid=0205c280-0aab-4e0b-ab74-313a58795083 --brick-port 49153 --xlator-option vol_1-server.listen-port=49153
root     28158     1  0 18:11 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id vol_10.10.70.35.45.rhs-brick10-vol_10 -p /var/lib/glusterd/vols/vol_10/run/10.70.35.45-rhs-brick10-vol_10.pid -S /var/run/gluster/10bd4a1b912fe38eb41bfa64aff017c9.socket --brick-name /rhs/brick10/vol_10 -l /var/log/glusterfs/bricks/rhs-brick10-vol_10.log --xlator-option *-posix.glusterd-uuid=0205c280-0aab-4e0b-ab74-313a58795083 --brick-port 49154 --xlator-option vol_10-server.listen-port=49154
root     29006 13218  0 18:11 pts/1    00:00:00 grep --color=auto glusterfsd
If the scenario above does not match the original problem, kindly suggest another way to verify this bug.
on_qa validation: I am not seeing the problem of bricks failing to connect to their socket files: all bricks show online in volume status, and I am able to run I/O on random volumes, meaning volume status no longer reports bricks as N/A, which was the symptom discussed with Dev. I ran volume start/stop in a loop about 50 times and did not hit the problem. Based on this, I am moving the bug to VERIFIED.

Test version: 3.8.4-34
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774