Bug 1449004 - [Brick Multiplexing] : Bricks for multiple volumes going down after glusterd restart and not coming back up after volume start force
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard: brick-multiplexing
Depends On: 1444596
Blocks: 1449003
Reported: 2017-05-09 03:58 UTC by Atin Mukherjee
Modified: 2017-05-30 18:51 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.11.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1444596
Environment:
Last Closed: 2017-05-30 18:51:45 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Worker Ant 2017-05-09 03:59:52 UTC
REVIEW: https://review.gluster.org/17211 (glusterd: cleanup pidfile on pmap signout) posted (#1) for review on release-3.11 by Atin Mukherjee (amukherj)

Comment 2 Worker Ant 2017-05-09 04:01:26 UTC
REVIEW: https://review.gluster.org/17212 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#1) for review on release-3.11 by MOHIT AGRAWAL (moagrawa)

Comment 3 Worker Ant 2017-05-10 14:05:56 UTC
COMMIT: https://review.gluster.org/17212 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 7287b46042f805d646d7e117c243a1a4fdc61788
Author: Mohit Agrawal <moagrawa>
Date:   Mon May 8 19:29:22 2017 +0530

    glusterd: socketfile & pidfile related fixes for brick multiplexing feature
    
    Problem: With brick multiplexing on, after glusterd is restarted the CLI
             does not show the pid of every brick process in all volumes.
    
    Solution: With brick-mux on, all local brick processes communicate through one
              UNIX socket, but the current code (glusterd_brick_start) tries to
              communicate over a separate UNIX socket for each volume, whose path is
              populated from the brick-name and vol-name. Because the multiplexing
              design opens only one UNIX socket, this raises a poller error and the
              correct status of the brick process cannot be fetched through the CLI.
              To resolve the problem, write a new function, glusterd_set_socket_filepath_for_mux,
              that is called by glusterd_brick_start to validate the existence of the socket path.
              To avoid continuous EPOLLERR errors in the logs, update the socket_connect code.
    
    Test:     To reproduce the issue, follow the steps below:
              1) Create two distributed volumes (dist1 and dist2)
              2) Set cluster.brick-multiplex to on
              3) Kill glusterd
              4) Run the command gluster v status
              After applying the patch, it shows the correct pid for all volumes.
    
    > BUG: 1444596
    > Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
    > Signed-off-by: Mohit Agrawal <moagrawa>
    > Reviewed-on: https://review.gluster.org/17101
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Prashanth Pai <ppai>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Atin Mukherjee <amukherj>
    > (cherry picked from commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6)
    
    Change-Id: Ia95b9d36e50566b293a8d6350f8316dafc27033b
    BUG: 1449004
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/17212
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Prashanth Pai <ppai>
    CentOS-regression: Gluster Build System <jenkins.org>
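
The socket-path decision described in the commit message above can be illustrated with a small sketch. This is not the glusterd code: choose_socket_path and its parameters are hypothetical names, and the existence check uses access() as a stand-in for whatever validation glusterd_set_socket_filepath_for_mux actually performs. The idea shown is only that, with multiplexing on, an already existing shared socket is preferred over the path derived from the brick and volume names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a glusterd function).
 * Returns `shared` when multiplexing is enabled and that path exists on
 * disk; otherwise returns the per-brick path. access() only checks that
 * something exists at the path, not that it is a live UNIX socket.
 */
static const char *choose_socket_path(int mux_enabled,
                                      const char *shared,
                                      const char *per_brick)
{
    assert(per_brick != NULL);

    if (mux_enabled && shared != NULL && access(shared, F_OK) == 0)
        return shared;

    return per_brick;
}
```

A caller would pass the socket path of the brick process that is already running as `shared`; if that path is missing, it falls back to the per-brick path, which matches the non-multiplexed behaviour.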

Comment 4 Worker Ant 2017-05-10 14:06:22 UTC
COMMIT: https://review.gluster.org/17211 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 25e24c5ab7202d43afa837cf5159e14fe078cc73
Author: Atin Mukherjee <amukherj>
Date:   Wed May 3 12:17:30 2017 +0530

    glusterd: cleanup pidfile on pmap signout
    
    This patch ensures
    1. brick pidfile is cleaned up on pmap signout
    2. pmap signout event is sent for all the bricks when a brick process
    shuts down.
    
    >Reviewed-on: https://review.gluster.org/17168
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Jeff Darcy <jeff.us>
    >(cherry picked from commit 3d35e21ffb15713237116d85711e9cd1dda1688a)
    
    Change-Id: I7606a60775b484651d4b9743b6037b40323931a2
    BUG: 1449004
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17211
    Reviewed-by: Prashanth Pai <ppai>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
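
The two points of the commit above (remove the pidfile on signout, and do so for every brick served by the exiting process) can be sketched as follows. All names here (pidfile_write, pidfile_cleanup, signout_all) are hypothetical and the code does not reproduce glusterd's pmap handling; it only shows an idempotent unlink applied to each brick's pidfile.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `pid` into the pidfile at `path`. Returns 0 on success, -1 on error. */
static int pidfile_write(const char *path, pid_t pid)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = fprintf(fp, "%ld\n", (long)pid) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Remove a pidfile. A file that is already gone counts as success. */
static int pidfile_cleanup(const char *path)
{
    assert(path != NULL);

    if (unlink(path) == 0 || errno == ENOENT)
        return 0;
    return -1;
}

/*
 * Hypothetical signout handler: when one multiplexed process shuts down,
 * clean the pidfile of every brick it was serving.
 * Returns the number of pidfiles that could not be removed.
 */
static int signout_all(const char *const pidfiles[], size_t count)
{
    int failures = 0;

    for (size_t i = 0; i < count; i++) {
        if (pidfile_cleanup(pidfiles[i]) != 0)
            failures++;
    }
    return failures;
}
```

Treating ENOENT as success keeps the cleanup safe to repeat, which matters if more than one signout event arrives for the same brick.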

Comment 5 Worker Ant 2017-05-12 04:08:39 UTC
REVIEW: https://review.gluster.org/17260 (posix: Send SIGKILL in 2nd attempt) posted (#1) for review on release-3.11 by Atin Mukherjee (amukherj)

Comment 6 Worker Ant 2017-05-12 13:33:32 UTC
COMMIT: https://review.gluster.org/17260 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 76776317825cee0cec715bd5558d091bdbc0fcdf
Author: Atin Mukherjee <amukherj>
Date:   Tue May 9 07:05:18 2017 +0530

    posix: Send SIGKILL in 2nd attempt
    
    Commit 21c7f7ba changed the signal used for the 2nd attempt to terminate
    the brick process (made when SIGTERM fails) from SIGKILL to SIGTERM.
    This patch restores SIGKILL for the 2nd attempt.
    
    >Reviewed-on: https://review.gluster.org/17208
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    >Smoke: Gluster Build System <jenkins.org>
    >(cherry picked from commit 4f4ad03e0c4739d3fe1b0640ab8b4e1ffc985374)
    
    Change-Id: I856df607b7109a215f2a2a4827ba3ea42d8a9729
    BUG: 1449004
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17260
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: MOHIT AGRAWAL <moagrawa>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
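
The escalation that the commit above restores (SIGTERM first, SIGKILL in the 2nd attempt) can be sketched like this. It is a minimal illustration, not the GlusterFS code: the function names, the one-second grace period, and the assumption that the target is a child of the caller (so waitpid can reap it) are all choices made for this example. spawn_sleeper exists only to provide a process to terminate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll up to `attempts` times, 100 ms apart. Returns 1 once the child is reaped. */
static int wait_for_exit(pid_t pid, int attempts)
{
    struct timespec delay = { 0, 100L * 1000L * 1000L };

    for (int i = 0; i < attempts; i++) {
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 1;
        nanosleep(&delay, NULL);
    }
    return 0;
}

/*
 * Send SIGTERM; if the child is still alive after about one second,
 * send SIGKILL. Returns the last signal sent, or -1 on error.
 */
static int terminate_with_escalation(pid_t pid)
{
    assert(pid > 0);

    if (kill(pid, SIGTERM) != 0)
        return -1;
    if (wait_for_exit(pid, 10))
        return SIGTERM;

    /* 2nd attempt: SIGKILL cannot be caught or ignored. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    return SIGKILL;
}

/* Demo helper: fork a child that sleeps forever, optionally ignoring SIGTERM. */
static pid_t spawn_sleeper(int ignore_sigterm)
{
    struct sigaction ign, old;

    memset(&ign, 0, sizeof(ign));
    memset(&old, 0, sizeof(old));
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);

    /* Set SIG_IGN before fork so the child inherits it without a race. */
    if (ignore_sigterm)
        sigaction(SIGTERM, &ign, &old);

    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }

    if (ignore_sigterm)
        sigaction(SIGTERM, &old, NULL);
    return pid;
}
```

Sending SIGTERM again in the 2nd attempt would leave a process that ignores or blocks SIGTERM running indefinitely, which is why the 2nd attempt needs SIGKILL.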

Comment 7 Shyamsundar 2017-05-30 18:51:45 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

