Bug 1466608 - multiple brick processes seen on gluster(fs)d restart in brick multiplexing
Summary: multiple brick processes seen on gluster(fs)d restart in brick multiplexing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Atin Mukherjee
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: brick-multiplexing
Depends On: 1465559
Blocks: 1417151
 
Reported: 2017-06-30 04:22 UTC by Atin Mukherjee
Modified: 2018-01-18 09:28 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.8.4-32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1465559
Environment:
Last Closed: 2017-09-21 05:02:13 UTC
Embargoed:


Links
Red Hat Product Errata RHBA-2017:2774 (normal priority, SHIPPED_LIVE): glusterfs bug fix and enhancement update. Last Updated: 2017-09-21 08:16:29 UTC

Description Atin Mukherjee 2017-06-30 04:22:43 UTC
+++ This bug was initially created as a clone of Bug #1465559 +++

Description of problem:

On a brick multiplexing test setup, if all the gluster processes are killed and glusterd is then started, there is a race window in which one of the bricks fails to attach to an already running brick process. While falling back to spawning a new process for that brick, it deletes the socket file that is still in use by one of the other brick processes, so that brick gets disconnected as well (see the reproduction sketch below).

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Often, not always
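
A rough reproduction sketch for the scenario described above (a minimal sketch, assuming a node hosting several multiplexed bricks; the commands are standard gluster/system tools, but whether the race is hit depends on timing):

    # Enable brick multiplexing (cluster-wide option) and have a few volumes running.
    gluster volume set all cluster.brick-multiplex on

    # Kill every gluster process on the node, then bring back only glusterd.
    pkill gluster
    systemctl start glusterd

    # In the race window a brick that fails to attach spawns its own glusterfsd,
    # so extra brick processes show up and some bricks lose their socket connection.
    ps -ef | grep '[g]lusterfsd'
    gluster volume status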

--- Additional comment from Worker Ant on 2017-06-27 11:54:26 EDT ---

REVIEW: https://review.gluster.org/17640 (glusterd: mark brickinfo to started on successful attach) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-06-27 11:55:46 EDT ---

REVIEW: https://review.gluster.org/17640 (glusterd: mark brickinfo to started on successful attach) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-06-28 02:19:05 EDT ---

COMMIT: https://review.gluster.org/17640 committed in master by Atin Mukherjee (amukherj) 
------
commit 24d09edf4b13d72a8707c801939921de0d32d4dd
Author: Atin Mukherjee <amukherj>
Date:   Tue Jun 27 21:09:49 2017 +0530

    glusterd: mark brickinfo to started on successful attach
    
    brickinfo's port & status should be filled up only when attach brick is
    successful.
    
    Change-Id: I68b181be37cb94d176f0f4692e8d9dac5493181c
    BUG: 1465559
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17640
    Reviewed-by: Jeff Darcy <jeff.us>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 2 Atin Mukherjee 2017-06-30 04:23:37 UTC
upstream patch : https://review.gluster.org/17640

Comment 4 Atin Mukherjee 2017-06-30 05:50:23 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/110561

Comment 10 Nag Pavan Chilakam 2017-07-18 12:42:30 UTC
[root@dhcp35-45 ~]# ps -ef|grep glusterfsd
root     28114     1  5 18:11 ?        00:00:01 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id vol_1.10.70.35.45.rhs-brick1-vol_1 -p /var/lib/glusterd/vols/vol_1/run/10.70.35.45-rhs-brick1-vol_1.pid -S /var/run/gluster/f3678cbb26724c43ffe643412d02da45.socket --brick-name /rhs/brick1/vol_1 -l /var/log/glusterfs/bricks/rhs-brick1-vol_1.log --xlator-option *-posix.glusterd-uuid=0205c280-0aab-4e0b-ab74-313a58795083 --brick-port 49153 --xlator-option vol_1-server.listen-port=49153
root     28158     1  0 18:11 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id vol_10.10.70.35.45.rhs-brick10-vol_10 -p /var/lib/glusterd/vols/vol_10/run/10.70.35.45-rhs-brick10-vol_10.pid -S /var/run/gluster/10bd4a1b912fe38eb41bfa64aff017c9.socket --brick-name /rhs/brick10/vol_10 -l /var/log/glusterfs/bricks/rhs-brick10-vol_10.log --xlator-option *-posix.glusterd-uuid=0205c280-0aab-4e0b-ab74-313a58795083 --brick-port 49154 --xlator-option vol_10-server.listen-port=49154
root     29006 13218  0 18:11 pts/1    00:00:00 grep --color=auto glusterfsd

Comment 12 Nag Pavan Chilakam 2017-07-18 12:44:30 UTC
If the above is not the same problem, kindly suggest another way to verify this bug.

Comment 18 Nag Pavan Chilakam 2017-07-21 07:22:43 UTC
on_qa validation:
I am no longer seeing the problem of bricks failing to connect to their socket file: all bricks show as Online in volume status (none are reported as N/A, which was the symptom discussed with Dev), and I am able to run I/O on some random volumes.

I ran volume start/stop in a loop about 50 times and did not hit the problem (see the loop sketch below).

Based on this, I am moving the bug to Verified.

Test version: 3.8.4-34
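
For reference, the start/stop loop used for this validation could look roughly like the sketch below (illustrative only; vol_1 is an example volume name and the N/A check simply mirrors the symptom described above, not the exact script that was run):

    # Illustrative validation loop; vol_1 is an example volume name.
    for i in $(seq 1 50); do
        gluster --mode=script volume stop vol_1
        gluster --mode=script volume start vol_1
        sleep 5   # give the bricks a moment to (re)attach
        # Flag the run if volume status reports any brick port as N/A.
        if gluster volume status vol_1 | grep -q ' N/A '; then
            echo "brick offline in iteration $i"
            break
        fi
    done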

Comment 20 errata-xmlrpc 2017-09-21 05:02:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

