+++ This bug was initially created as a clone of Bug #1465559 +++

Description of problem:
On a brick multiplexing test setup, if all the gluster processes are killed and glusterd is then started, there is a race window in which one brick fails to attach to one of the running brick instances. While retrying, it brings up a new brick process, which deletes the socket file still in use by another brick process, causing that brick to disconnect as well.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Often, not always

--- Additional comment from Worker Ant on 2017-06-27 11:54:26 EDT ---

REVIEW: https://review.gluster.org/17640 (glusterd: mark brickinfo to started on successful attach) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-06-27 11:55:46 EDT ---

REVIEW: https://review.gluster.org/17640 (glusterd: mark brickinfo to started on successful attach) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-06-28 02:19:05 EDT ---

COMMIT: https://review.gluster.org/17640 committed in master by Atin Mukherjee (amukherj)

------

commit 24d09edf4b13d72a8707c801939921de0d32d4dd
Author: Atin Mukherjee <amukherj>
Date:   Tue Jun 27 21:09:49 2017 +0530

    glusterd: mark brickinfo to started on successful attach

    brickinfo's port & status should be filled up only when attach brick
    is successful.

    Change-Id: I68b181be37cb94d176f0f4692e8d9dac5493181c
    BUG: 1465559
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17640
    Reviewed-by: Jeff Darcy <jeff.us>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    CentOS-regression: Gluster Build System <jenkins.org>
upstream patch : https://review.gluster.org/17640
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/110561
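To illustrate the ordering change described in the commit message ("brickinfo's port & status should be filled up only when attach brick is successful"), here is a minimal, self-contained C sketch. The struct, enum, and the send_attach_req_sketch() helper are simplified stand-ins invented for this example, not glusterd's actual types or API; only the ordering idea reflects the fix.

#include <stdio.h>

typedef enum { BRICK_STOPPED = 0, BRICK_STARTED = 1 } brick_status_sketch_t;

typedef struct {
    int port;
    brick_status_sketch_t status;
} brickinfo_sketch_t;

/* Stand-in for the request that asks an already-running brick process
 * to attach another brick; returns 0 on success, non-zero on failure. */
static int send_attach_req_sketch(brickinfo_sketch_t *brick,
                                  brickinfo_sketch_t *other)
{
    (void)brick;
    (void)other;
    return 0; /* pretend the attach succeeded */
}

static int attach_brick_sketch(brickinfo_sketch_t *brick,
                               brickinfo_sketch_t *other)
{
    /* Pre-fix ordering: port and status were recorded before the attach
     * outcome was known, so a failed attach left glusterd believing the
     * brick was started; the retry then spawned a fresh process and
     * removed a socket file still in use by another brick process.
     *
     * Post-fix ordering (sketched here): update the brick's port and
     * status only after the attach has actually succeeded. */
    if (send_attach_req_sketch(brick, other) != 0)
        return -1;

    brick->port = other->port;
    brick->status = BRICK_STARTED;
    return 0;
}

int main(void)
{
    brickinfo_sketch_t running = { .port = 49153, .status = BRICK_STARTED };
    brickinfo_sketch_t waiting = { .port = 0, .status = BRICK_STOPPED };

    if (attach_brick_sketch(&waiting, &running) == 0)
        printf("attached: port=%d status=%d\n", waiting.port, waiting.status);
    return 0;
}

The sketch only captures why deferring the state update closes the race window; the actual change lives in glusterd's brick attach path, per the patch linked above.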
[root@dhcp35-45 ~]# ps -ef|grep glusterfsd
root     28114     1  5 18:11 ?        00:00:01 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id vol_1.10.70.35.45.rhs-brick1-vol_1 -p /var/lib/glusterd/vols/vol_1/run/10.70.35.45-rhs-brick1-vol_1.pid -S /var/run/gluster/f3678cbb26724c43ffe643412d02da45.socket --brick-name /rhs/brick1/vol_1 -l /var/log/glusterfs/bricks/rhs-brick1-vol_1.log --xlator-option *-posix.glusterd-uuid=0205c280-0aab-4e0b-ab74-313a58795083 --brick-port 49153 --xlator-option vol_1-server.listen-port=49153
root     28158     1  0 18:11 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id vol_10.10.70.35.45.rhs-brick10-vol_10 -p /var/lib/glusterd/vols/vol_10/run/10.70.35.45-rhs-brick10-vol_10.pid -S /var/run/gluster/10bd4a1b912fe38eb41bfa64aff017c9.socket --brick-name /rhs/brick10/vol_10 -l /var/log/glusterfs/bricks/rhs-brick10-vol_10.log --xlator-option *-posix.glusterd-uuid=0205c280-0aab-4e0b-ab74-313a58795083 --brick-port 49154 --xlator-option vol_10-server.listen-port=49154
root     29006 13218  0 18:11 pts/1    00:00:00 grep --color=auto glusterfsd
If the scenario above does not match the original problem, kindly suggest another way to verify this bug.
on_qa validation: I am not seeing the problem of bricks failing to connect to their socket files: all bricks show online in volume status, and I am able to run I/O on random volumes, meaning volume status no longer reports bricks as N/A, which was the symptom discussed with Dev. I ran volume start/stop in a loop about 50 times and did not hit the problem. Based on this, I am moving the bug to VERIFIED.

Test version: 3.8.4-34
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774