Bug 1449933

Summary: Brick Multiplexing: resetting a brick brings down other bricks with the same PID
Product: [Community] GlusterFS
Reporter: Samikshan Bairagya <sbairagy>
Component: glusterd
Assignee: Samikshan Bairagya <sbairagy>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 3.11
CC: amukherj, bugs, ksandha, rhinduja, rhs-bugs, storage-qa-internal
Keywords: Triaged
Hardware: All
OS: Linux
Whiteboard: brick-multiplexing
Fixed In Version: glusterfs-3.11.0
Clone Of: 1446172
Last Closed: 2017-05-30 18:52:18 UTC
Type: Bug
Bug Depends On: 1446172
Bug Blocks: 1443843

Description Samikshan Bairagya 2017-05-11 08:00:18 UTC
+++ This bug was initially created as a clone of Bug #1446172 +++

+++ This bug was initially created as a clone of Bug #1443843 +++

Description of problem:
Resetting a single brick brings down the other bricks that share the same PID.

Version-Release number of selected component (if applicable):
3.8.4-22

How reproducible:
100% 

Steps to Reproduce:
1. Start a reset-brick operation on one brick:

[root@K1 ~]# gluster v reset-brick testvol 10.70.47.60:/bricks/brick0/b3 start
volume reset-brick: success: reset-brick start operation successful

2. Check the volume status:

[root@K1 b3]# gluster v status testvol
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.60:/bricks/brick0/b3         N/A       N/A        N       N/A  
Brick 10.70.46.218:/bricks/brick0/b2        49152     0          Y       374  
Brick 10.70.47.61:/bricks/brick0/b3         49152     0          Y       24892
Brick 10.70.47.60:/bricks/brick1/b3         N/A       N/A        N       N/A  
Brick 10.70.46.218:/bricks/brick1/b2        49152     0          Y       374  
Brick 10.70.47.61:/bricks/brick1/b3         49152     0          Y       24892
Brick 10.70.46.218:/bricks/brick2/b2        49152     0          Y       374  
Brick 10.70.47.61:/bricks/brick2/b3         49152     0          Y       24892
Brick 10.70.47.60:/bricks/brick2/b3         49153     0          Y       1629 
NFS Server on localhost                     2049      0          Y       1653 
Self-heal Daemon on localhost               N/A       N/A        Y       1662 
NFS Server on 10.70.46.218                  2049      0          Y       698  
Self-heal Daemon on 10.70.46.218            N/A       N/A        Y       707  
NFS Server on 10.70.47.61                   2049      0          Y       25123
Self-heal Daemon on 10.70.47.61             N/A       N/A        Y       25132
 
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : e686e9ea-ad3d-4135-933d-2836075c16d7
Status               : completed           
 

3. Observe that the other brick served by the same process as the reset brick is also offline (both N/A bricks on 10.70.47.60 in the status output above).

Actual results:
Two bricks go down with the same PID.

Expected results:
Only the brick being reset should go down; the other bricks attached to the same process should be unaffected.
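
For anyone reproducing this, the shared process can be confirmed independently of the status table; a minimal sketch, assuming the same testvol layout as in step 2 (the awk field positions follow the status output shown there):

[root@K1 ~]# gluster v status testvol | awk '/^Brick / {print $NF, $2}' | sort
# bricks printing the same PID are attached to a single glusterfsd process
[root@K1 ~]# pgrep -ax glusterfsd
# with brick-mux on, expect one glusterfsd per node rather than one per brick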

--- Additional comment from Worker Ant on 2017-04-27 07:53:23 EDT ---

REVIEW: https://review.gluster.org/17128 (glusterd: Make reset-brick work correctly if brick-mux is on) posted (#1) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Worker Ant on 2017-05-09 00:57:49 EDT ---

REVIEW: https://review.gluster.org/17128 (glusterd: Make reset-brick work correctly if brick-mux is on) posted (#2) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Worker Ant on 2017-05-10 00:45:06 EDT ---

REVIEW: https://review.gluster.org/17128 (glusterd: Make reset-brick work correctly if brick-mux is on) posted (#3) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Worker Ant on 2017-05-10 01:36:33 EDT ---

REVIEW: https://review.gluster.org/17128 (glusterd: Make reset-brick work correctly if brick-mux is on) posted (#4) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Worker Ant on 2017-05-10 08:09:30 EDT ---

REVIEW: https://review.gluster.org/17128 (glusterd: Make reset-brick work correctly if brick-mux is on) posted (#5) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Worker Ant on 2017-05-10 14:58:24 EDT ---

COMMIT: https://review.gluster.org/17128 committed in master by Jeff Darcy (jeff.us) 
------
commit 74383e3ec6f8244b3de9bf14016452498c1ddcf0
Author: Samikshan Bairagya <samikshan>
Date:   Mon Apr 24 22:00:17 2017 +0530

    glusterd: Make reset-brick work correctly if brick-mux is on
    
    Reset brick currently kills off the corresponding brick process.
    However, with brick multiplexing enabled, stopping the brick
    process would render all bricks attached to it unavailable. To
    handle this correctly, we need to make sure that the brick process
    is terminated only if brick-multiplexing is disabled. Otherwise,
    we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
    brick process to detach the brick that is to be reset.
    
    Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
    BUG: 1446172
    Signed-off-by: Samikshan Bairagya <samikshan>
    Reviewed-on: https://review.gluster.org/17128
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
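
The fixed behaviour can be exercised end to end with the CLI; below is a minimal sketch in the spirit of the project's shell regression tests. The volume name, hostnames, and brick paths (mvol, h1-h3, /b/b1, /b/b2) are illustrative, not taken from the patch:

# share one brick process per node
gluster volume set all cluster.brick-multiplex on

gluster volume create mvol replica 3 h1:/b/b1 h2:/b/b1 h3:/b/b1 \
                                     h1:/b/b2 h2:/b/b2 h3:/b/b2 force
gluster volume start mvol

# reset one brick; with the fix, only this brick is detached via
# GLUSTERD_BRICK_TERMINATE instead of the whole process being killed
gluster volume reset-brick mvol h1:/b/b1 start
gluster volume status mvol    # h1:/b/b2 must still show Online = Y
gluster volume reset-brick mvol h1:/b/b1 h1:/b/b1 commit force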

Comment 1 Worker Ant 2017-05-11 08:14:55 UTC
REVIEW: https://review.gluster.org/17245 (glusterd: Make reset-brick work correctly if brick-mux is on) posted (#1) for review on release-3.11 by Samikshan Bairagya (samikshan)

Comment 2 Worker Ant 2017-05-16 00:29:52 UTC
COMMIT: https://review.gluster.org/17245 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit cec4c8fc25e34459c23693f2928dcaefb9a68c69
Author: Samikshan Bairagya <samikshan>
Date:   Mon Apr 24 22:00:17 2017 +0530

    glusterd: Make reset-brick work correctly if brick-mux is on
    
    Reset brick currently kills off the corresponding brick process.
    However, with brick multiplexing enabled, stopping the brick
    process would render all bricks attached to it unavailable. To
    handle this correctly, we need to make sure that the brick process
    is terminated only if brick-multiplexing is disabled. Otherwise,
    we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective
    brick process to detach the brick that is to be reset.
    
    > Reviewed-on: https://review.gluster.org/17128
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Atin Mukherjee <amukherj>
    
    (cherry picked from commit 74383e3ec6f8244b3de9bf14016452498c1ddcf0)
    
    Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58
    BUG: 1449933
    Signed-off-by: Samikshan Bairagya <samikshan>
    Reviewed-on: https://review.gluster.org/17245
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

Comment 3 Shyamsundar 2017-05-30 18:52:18 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/