Bug 1444596 - [Brick Multiplexing] : Bricks for multiple volumes going down after glusterd restart and not coming back up after volume start force
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard: brick-multiplexing
Depends On:
Blocks: 1443972 1449002 1449003 1449004
 
Reported: 2017-04-23 03:17 UTC by Mohit Agrawal
Modified: 2017-10-26 14:37 UTC
9 users

Fixed In Version: glusterfs-3.12.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1443972
: 1449002 1449004
Environment:
Last Closed: 2017-09-05 17:27:45 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Worker Ant 2017-04-23 03:45:48 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: cli is not showing correct status after restart glusted while mux is on) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 2 Worker Ant 2017-04-23 04:20:04 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: cli is not showing correct status after restart glusted while mux is on) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 3 Worker Ant 2017-04-28 10:31:58 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 4 Worker Ant 2017-04-28 10:38:08 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#4) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 5 Worker Ant 2017-04-29 04:30:47 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#5) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 6 Worker Ant 2017-04-29 04:44:16 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#6) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 7 Worker Ant 2017-04-29 04:56:36 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#7) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 8 Worker Ant 2017-04-29 15:10:21 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#8) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 9 Worker Ant 2017-04-30 01:32:31 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#9) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 10 Worker Ant 2017-04-30 01:58:09 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#10) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 11 Worker Ant 2017-04-30 08:01:11 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#11) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 12 Worker Ant 2017-04-30 09:15:51 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#12) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 13 Worker Ant 2017-04-30 12:12:01 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#13) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 14 Worker Ant 2017-04-30 13:06:26 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#14) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 15 Worker Ant 2017-04-30 13:20:21 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#15) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 16 Worker Ant 2017-05-01 03:59:00 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#16) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 17 Worker Ant 2017-05-01 07:31:21 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#17) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 18 Worker Ant 2017-05-01 12:05:49 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#18) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 19 Worker Ant 2017-05-02 04:03:39 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#19) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 20 Worker Ant 2017-05-02 04:57:52 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#20) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 21 Worker Ant 2017-05-02 07:15:24 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#21) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 22 Worker Ant 2017-05-02 09:06:51 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#22) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 23 Worker Ant 2017-05-02 11:02:21 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#23) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 24 Worker Ant 2017-05-02 16:20:30 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#24) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 25 Worker Ant 2017-05-03 10:42:02 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: cli is not showing correct status after restart glusted while mux is on) posted (#25) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 26 Worker Ant 2017-05-03 18:42:02 UTC
REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusted while mux is on) posted (#27) for review on master by Atin Mukherjee (amukherj)

Comment 27 Worker Ant 2017-05-03 18:42:07 UTC
REVIEW: https://review.gluster.org/17168 (glusterd: cleanup pidfile on pmap signout) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 28 Worker Ant 2017-05-03 18:58:28 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick-multiplexing) posted (#28) for review on master by Atin Mukherjee (amukherj)

Comment 29 Worker Ant 2017-05-03 19:02:30 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick-multiplexing) posted (#29) for review on master by Atin Mukherjee (amukherj)

Comment 30 Worker Ant 2017-05-04 04:16:56 UTC
REVIEW: https://review.gluster.org/17168 (glusterd: cleanup pidfile on pmap signout) posted (#2) for review on master by Atin Mukherjee (amukherj)

Comment 31 Worker Ant 2017-05-04 04:17:01 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick-multiplexing) posted (#30) for review on master by Atin Mukherjee (amukherj)

Comment 32 Worker Ant 2017-05-04 09:55:41 UTC
REVIEW: https://review.gluster.org/17168 (glusterd: cleanup pidfile on pmap signout) posted (#3) for review on master by Atin Mukherjee (amukherj)

Comment 33 Worker Ant 2017-05-04 09:55:46 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick-multiplexing feature) posted (#31) for review on master by Atin Mukherjee (amukherj)

Comment 34 Worker Ant 2017-05-07 17:17:40 UTC
REVIEW: https://review.gluster.org/17168 (glusterd: cleanup pidfile on pmap signout) posted (#4) for review on master by Atin Mukherjee (amukherj)

Comment 35 Worker Ant 2017-05-07 17:17:45 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#32) for review on master by Atin Mukherjee (amukherj)

Comment 36 Worker Ant 2017-05-08 13:13:44 UTC
COMMIT: https://review.gluster.org/17168 committed in master by Jeff Darcy (jeff.us) 
------
commit 3d35e21ffb15713237116d85711e9cd1dda1688a
Author: Atin Mukherjee <amukherj>
Date:   Wed May 3 12:17:30 2017 +0530

    glusterd: cleanup pidfile on pmap signout
    
    This patch ensures that:
    1. the brick pidfile is cleaned up on pmap signout
    2. a pmap signout event is sent for all the bricks when a brick process
    shuts down.
    
    Change-Id: I7606a60775b484651d4b9743b6037b40323931a2
    BUG: 1444596
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17168
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>

Comment 37 Worker Ant 2017-05-08 14:00:47 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#33) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 38 Worker Ant 2017-05-08 14:18:35 UTC
REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#34) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 39 Worker Ant 2017-05-09 01:30:05 UTC
COMMIT: https://review.gluster.org/17101 committed in master by Atin Mukherjee (amukherj) 
------
commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6
Author: Mohit Agrawal <moagrawa>
Date:   Mon May 8 19:29:22 2017 +0530

    glusterd: socketfile & pidfile related fixes for brick multiplexing feature
    
    Problem: While brick multiplexing is on, after restarting glusterd the
             CLI does not show the pid of all brick processes in all volumes.
    
    Solution: While brick-mux is on, all local brick processes communicate
              through one UNIX socket, but the current code (glusterd_brick_start)
              tries to communicate with a separate UNIX socket for each volume,
              populated from the brick name and volume name. Because of the
              multiplexing design only one UNIX socket is opened, so glusterd
              throws a poller error and cannot fetch the correct status of the
              brick process through the CLI. To resolve the problem, add a new
              function glusterd_set_socket_filepath_for_mux, called by
              glusterd_brick_start, to validate the existence of the socketpath.
              To avoid continuous EPOLLERR errors in the logs, update the
              socket_connect code.
    
    Test:     To reproduce the issue, follow the steps below:
              1) Create two distributed volumes (dist1 and dist2)
              2) Set cluster.brick-multiplex to on
              3) Kill glusterd
              4) Run gluster v status
              After applying the patch, the correct pid is shown for all volumes.
    
    BUG: 1444596
    Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/17101
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

Comment 40 Worker Ant 2017-05-09 01:37:56 UTC
REVIEW: https://review.gluster.org/17208 (posix: Send SIGKILL in 2nd attempt) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 41 Worker Ant 2017-05-09 08:02:09 UTC
REVIEW: https://review.gluster.org/17208 (posix: Send SIGKILL in 2nd attempt) posted (#2) for review on master by Atin Mukherjee (amukherj)

Comment 42 Worker Ant 2017-05-09 09:04:52 UTC
COMMIT: https://review.gluster.org/17208 committed in master by Atin Mukherjee (amukherj) 
------
commit 4f4ad03e0c4739d3fe1b0640ab8b4e1ffc985374
Author: Atin Mukherjee <amukherj>
Date:   Tue May 9 07:05:18 2017 +0530

    posix: Send SIGKILL in 2nd attempt
    
    Commit 21c7f7ba changed the signal for the 2nd attempt to terminate the
    brick process from SIGKILL to SIGTERM, so a brick that survives the
    first SIGTERM is never forcibly killed. This patch restores SIGKILL for
    the 2nd attempt.
    
    Change-Id: I856df607b7109a215f2a2a4827ba3ea42d8a9729
    BUG: 1444596
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17208
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>

Comment 43 Shyamsundar 2017-09-05 17:27:45 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

