Bug 1465559 - multiple brick processes seen on gluster(fs)d restart in brick multiplexing
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Atin Mukherjee
Keywords: Triaged
Depends On:
Blocks: 1466608 1473327
Reported: 2017-06-27 11:45 EDT by Atin Mukherjee
Modified: 2017-09-05 13:35 EDT
CC List: 1 user

See Also:
Fixed In Version: glusterfs-3.12.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1466608
Environment:
Last Closed: 2017-09-05 13:35:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Atin Mukherjee 2017-06-27 11:45:39 EDT
Description of problem:

On a brick multiplexing test setup, if all the gluster processes are killed and glusterd is then started, there is a race window in which one of the bricks fails to attach to an already-running brick process. In that case glusterd tries to bring up a new brick process instead, which deletes the socket file already in use by another brick process, so that brick gets disconnected as well.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Often, not always
Comment 1 Worker Ant 2017-06-27 11:54:26 EDT
REVIEW: https://review.gluster.org/17640 (glusterd: mark brickinfo to started on successful attach) posted (#1) for review on master by Atin Mukherjee (amukherj@redhat.com)
Comment 2 Worker Ant 2017-06-27 11:55:46 EDT
REVIEW: https://review.gluster.org/17640 (glusterd: mark brickinfo to started on successful attach) posted (#2) for review on master by Atin Mukherjee (amukherj@redhat.com)
Comment 3 Worker Ant 2017-06-28 02:19:05 EDT
COMMIT: https://review.gluster.org/17640 committed in master by Atin Mukherjee (amukherj@redhat.com) 
------
commit 24d09edf4b13d72a8707c801939921de0d32d4dd
Author: Atin Mukherjee <amukherj@redhat.com>
Date:   Tue Jun 27 21:09:49 2017 +0530

    glusterd: mark brickinfo to started on successful attach
    
    brickinfo's port & status should be filled in only when the brick
    attach is successful.
    
    Change-Id: I68b181be37cb94d176f0f4692e8d9dac5493181c
    BUG: 1465559
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17640
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
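
In other words, the commit makes a successful attach a precondition for updating the brick's runtime state. The following is a minimal, self-contained C sketch of that pattern, assuming simplified stand-in types and helpers (brickinfo_t, attach_brick, send_attach_req_stub); it is not the actual glusterd code, only an illustration of deferring the port/status update until the attach succeeds.

#include <stdio.h>

typedef enum {
    GF_BRICK_STOPPED,
    GF_BRICK_STARTING,
    GF_BRICK_STARTED
} brick_status_t;

typedef struct {
    int port;
    brick_status_t status;
} brickinfo_t;

/* Stand-in for sending the attach request to an already running brick
 * process; returns 0 on success, -1 on failure. */
static int send_attach_req_stub(brickinfo_t *brickinfo, int target_port)
{
    (void)brickinfo;
    (void)target_port;
    return -1;   /* simulate the failing attach seen in this bug */
}

static int attach_brick(brickinfo_t *brickinfo, int target_port)
{
    int ret = send_attach_req_stub(brickinfo, target_port);

    if (ret == 0) {
        /* Only a successful attach marks the brick as started. */
        brickinfo->port = target_port;
        brickinfo->status = GF_BRICK_STARTED;
    }

    /* On failure brickinfo is left untouched, so retry/fallback logic
     * does not see a brick that looks started but never attached. */
    return ret;
}

int main(void)
{
    brickinfo_t brick = { .port = 0, .status = GF_BRICK_STOPPED };

    if (attach_brick(&brick, 49152) != 0)
        printf("attach failed, brick status is still %d\n", brick.status);

    return 0;
}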
Comment 4 Atin Mukherjee 2017-07-20 08:27:23 EDT
Found another race: when glusterd is restarted, glusterd_brick_start () is called multiple times due to friend handshaking. In one instance, when one of the bricks was being attached to the existing brick process, send_attach_req failed because the first brick itself was not yet up. We then did a synclock_unlock () followed by a sleep of 1 sec; before the same thread woke up, another thread tried to start the same brick and assumed that it had to start a fresh brick process.
Comment 5 Worker Ant 2017-07-20 08:51:56 EDT
REVIEW: https://review.gluster.org/17840 (glusterd: fix brick start race) posted (#1) for review on master by Atin Mukherjee (amukherj@redhat.com)
Comment 6 Worker Ant 2017-07-20 17:50:23 EDT
COMMIT: https://review.gluster.org/17840 committed in master by Jeff Darcy (jeff@pl.atyp.us) 
------
commit d095c02eb9796ca2ec2a24931c28f057c403f834
Author: Atin Mukherjee <amukherj@redhat.com>
Date:   Thu Jul 20 18:11:14 2017 +0530

    glusterd: fix brick start race
    
    Problem:
    
    Another race: when glusterd is restarted, glusterd_brick_start () is
    called multiple times due to friend handshaking. In one instance, when
    one of the bricks was being attached to the existing brick process,
    send_attach_req failed as the first brick itself was still not up. We
    then did a synclock_unlock () followed by a sleep of 1 sec; before the
    same thread woke up, another thread tried to start the same brick and
    assumed that it had to start a fresh brick process.
    
    Solution:
    
    1. If the brick is in the starting phase (brickinfo->status ==
       GF_BRICK_STARTING), there is no need to reattempt starting it.
    2. While initiating the attach_req, set brickinfo->status to
       GF_BRICK_STARTING.
    
    Change-Id: Ib007b6199ec36fdab4214a1d37f99d7f65ef64da
    BUG: 1465559
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/17840
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
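
Below is a minimal, self-contained C sketch of the two-part fix described in the commit message, assuming simplified stand-in types and helpers (brickinfo_t, brick_start, send_attach_req_stub); only the GF_BRICK_STARTING/GF_BRICK_STARTED states mirror the real status values, and this is not the actual glusterd_brick_start() code.

#include <stdio.h>

typedef enum {
    GF_BRICK_STOPPED,
    GF_BRICK_STARTING,
    GF_BRICK_STARTED
} brick_status_t;

typedef struct {
    brick_status_t status;
} brickinfo_t;

/* Stand-in for the attach RPC; in the real race it can fail while the
 * target brick process is still coming up. */
static int send_attach_req_stub(brickinfo_t *brickinfo)
{
    (void)brickinfo;
    return 0;
}

static int brick_start(brickinfo_t *brickinfo)
{
    /* (1) If the brick is already starting (or started), do not
     * reattempt; this stops a second thread from deciding to spawn a
     * fresh brick process during the retry window. */
    if (brickinfo->status == GF_BRICK_STARTING ||
        brickinfo->status == GF_BRICK_STARTED)
        return 0;

    /* (2) Move to GF_BRICK_STARTING *before* initiating the attach
     * request, closing the window between the unlock/sleep and the
     * retry. */
    brickinfo->status = GF_BRICK_STARTING;

    if (send_attach_req_stub(brickinfo) == 0) {
        brickinfo->status = GF_BRICK_STARTED;
        return 0;
    }

    /* Failure is left to the caller's retry logic; the status stays
     * GF_BRICK_STARTING so concurrent callers still skip the start. */
    return -1;
}

int main(void)
{
    brickinfo_t brick = { .status = GF_BRICK_STOPPED };

    brick_start(&brick);   /* first caller performs the attach */
    brick_start(&brick);   /* a concurrent second call is a no-op */

    printf("final brick status: %d\n", brick.status);
    return 0;
}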
Comment 7 Shyamsundar 2017-09-05 13:35:13 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/
