Bug 1437494 - Brick Multiplexing: Volume status still shows the PID even after killing the process
Summary: Brick Multiplexing: Volume status still shows the PID even after killing the process
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On: 1434448
Blocks: 1438051
 
Reported: 2017-03-30 12:08 UTC by Atin Mukherjee
Modified: 2017-05-30 18:48 UTC
CC List: 4 users

Fixed In Version: glusterfs-3.11.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1434448
Clones: 1438051
Environment:
Last Closed: 2017-05-30 18:48:52 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Atin Mukherjee 2017-03-30 12:08:35 UTC
+++ This bug was initially created as a clone of Bug #1434448 +++

Description of problem:
==================
After enabling brick multiplexing, I killed the brick process (which is shared by all bricks of all volumes on that node) on one of the nodes.
The process gets killed, and all bricks then show their online status as N and their port number as N/A.
However, volume status still shows the old PID of the killed process.
This PID should also be shown as N/A.

[root@dhcp35-215 bricks]# gluster v status|grep 215
(before killing the brick process; grep'ing only for the bricks on this local node)

Brick 10.70.35.215:/rhs/brick3/cross3       49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/cross3       49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick1/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick2/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick1/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick2/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/rep2         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/rep2         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/rep3         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/rep3         49152     0          Y       13072
[root@dhcp35-215 bricks]# kill -9 13072
[root@dhcp35-215 bricks]# gluster v status|grep 215
(after killing the brick process)
Brick 10.70.35.215:/rhs/brick3/cross3       N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/cross3       N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick1/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick2/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick1/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick2/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/rep2         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/rep2         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/rep3         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/rep3         N/A       N/A        N       13072



[root@dhcp35-215 bricks]# ps -ef|grep 13072
root      2258 21234  0 19:35 pts/0    00:00:00 grep --color=auto 13072
[root@dhcp35-215 bricks]# 


Version-Release number of selected component (if applicable):
============
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
python2-gluster-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
glusterfs-geo-replication-3.10.0-1.el7.x86_64
glusterfs-extra-xlators-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64



How reproducible:
=======
always

Steps to Reproduce:
1. Enable the brick multiplexing feature.
2. Create one or more volumes and start them.
3. Notice that all bricks hosted on the same node share the same PID.
4. Select a node and kill that PID.
5. Issue gluster volume status.

Actual results:
====
volume status still shows the old PID against each brick even though the process has been killed

Expected results:
================
The PID must be shown as N/A

--- Additional comment from Jeff Darcy on 2017-03-21 11:16:58 EDT ---

I would say that killing a process is an invalid test, but this probably needs to be fixed anyway.

Comment 1 Worker Ant 2017-03-30 12:24:48 UTC
REVIEW: https://review.gluster.org/16971 (glusterd: reset pid to -1 if brick is not online) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 2 Worker Ant 2017-03-31 13:06:25 UTC
COMMIT: https://review.gluster.org/16971 committed in master by Jeff Darcy (jeff.us) 
------
commit e325479cf222d2f25dbc0a4c6b80bfe5a7f09f43
Author: Atin Mukherjee <amukherj>
Date:   Thu Mar 30 14:47:45 2017 +0530

    glusterd: reset pid to -1 if brick is not online
    
    While populating brick details in gluster volume status response payload
    if a brick is not online then pid should be reset back to -1 so that
    volume status output doesn't show up the pid which was not cleaned up
    especially with brick multiplexing where multiple bricks belong to same
    process.
    
    Change-Id: Iba346da9a8cb5b5f5dd38031d4c5ef2097808387
    BUG: 1437494
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/16971
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Gaurav Yadav <gyadav>
    Reviewed-by: Prashanth Pai <ppai>
    Reviewed-by: Jeff Darcy <jeff.us>
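
For illustration only, and not the actual glusterd source: a minimal, self-contained C sketch of the pattern the commit message describes, in which the PID of a brick that is no longer online is reset to -1 so that status output can render it as N/A instead of a stale value. The type and function names below (brick_status_t, is_process_running, populate_brick_status) are hypothetical placeholders, not glusterd symbols.

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical, simplified view of one brick's entry in a status reply. */
typedef struct {
    const char *path;   /* brick path, e.g. /rhs/brick3/cross3 */
    pid_t       pid;    /* PID recorded in the brick's pidfile */
    bool        online; /* whether the brick process is alive  */
} brick_status_t;

/* Returns true if a process with the given PID exists. kill(pid, 0)
 * delivers no signal; it only performs the existence/permission check. */
static bool is_process_running(pid_t pid)
{
    if (pid <= 0)
        return false;
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Mirrors the idea of the fix: if the brick is not online, reset the
 * stale PID to -1 so the status output shows N/A rather than the PID
 * of the killed multiplexed process. */
static void populate_brick_status(brick_status_t *brick)
{
    brick->online = is_process_running(brick->pid);
    if (!brick->online)
        brick->pid = -1;
}

int main(void)
{
    /* Stale entry left behind after the multiplexed brick process
     * (PID 13072 in this report) was killed with SIGKILL. */
    brick_status_t brick = { "/rhs/brick3/cross3", 13072, true };

    populate_brick_status(&brick);

    if (brick.pid > 0)
        printf("%-35s Online: Y  PID: %ld\n", brick.path, (long)brick.pid);
    else
        printf("%-35s Online: N  PID: N/A\n", brick.path);

    return 0;
}

In the actual fix, the equivalent reset happens while glusterd populates the brick details in the volume status response payload, as described in the commit message above.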

Comment 3 Shyamsundar 2017-05-30 18:48:52 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

