Bug 1438051 - Brick Multiplexing:Volume status still shows the PID even after killing the process
Summary: Brick Multiplexing:Volume status still shows the PID even after killing the process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Atin Mukherjee
QA Contact: Bala Konda Reddy M
URL:
Whiteboard: brick-multiplexing
Depends On: 1434448 1437494
Blocks: 1417151
 
Reported: 2017-03-31 18:18 UTC by Atin Mukherjee
Modified: 2017-09-21 04:35 UTC
CC: 8 users

Fixed In Version: glusterfs-3.8.4-21
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1437494
Environment:
Last Closed: 2017-09-21 04:35:56 UTC
Embargoed:


Links
System ID: Red Hat Product Errata RHBA-2017:2774
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: glusterfs bug fix and enhancement update
Last Updated: 2017-09-21 08:16:29 UTC

Description Atin Mukherjee 2017-03-31 18:18:11 UTC
+++ This bug was initially created as a clone of Bug #1437494 +++

+++ This bug was initially created as a clone of Bug #1434448 +++

Description of problem:
==================
After enabling brick multiplexing, I killed the brick process (which is shared by all bricks of all volumes on that node) on one of the nodes.
The process gets killed, and all bricks show the online status as N and the port number as N/A.
However, volume status still shows the old PID of the killed process.
This PID should also be shown as N/A.

[root@dhcp35-215 bricks]# gluster v status|grep 215
(before killing the brick process; grep'ing only for bricks on this local node)

Brick 10.70.35.215:/rhs/brick3/cross3       49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/cross3       49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick1/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick2/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick1/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick2/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/rep2         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/rep2         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/rep3         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/rep3         49152     0          Y       13072
[root@dhcp35-215 bricks]# kill -9 13072
[root@dhcp35-215 bricks]# gluster v status|grep 215
(after killing the brick process)
Brick 10.70.35.215:/rhs/brick3/cross3       N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/cross3       N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick1/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick2/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick1/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick2/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/rep2         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/rep2         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/rep3         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/rep3         N/A       N/A        N       13072



[root@dhcp35-215 bricks]# ps -ef|grep 13072
root      2258 21234  0 19:35 pts/0    00:00:00 grep --color=auto 13072
[root@dhcp35-215 bricks]# 


Version-Release number of selected component (if applicable):
============
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
python2-gluster-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
glusterfs-geo-replication-3.10.0-1.el7.x86_64
glusterfs-extra-xlators-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64



How reproducible:
=======
always

Steps to Reproduce:
1. Enable the brick multiplexing feature.
2. Create one or more volumes and start them.
3. Notice that all bricks hosted on the same node share the same PID.
4. Select a node and kill that PID.
5. Issue gluster volume status (a command sketch follows below).
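
A rough command sequence for these steps, as a sketch only: the volume names, replica layout, node names, and brick paths below are placeholders, not taken from this report; cluster.brick-multiplex is the volume option that enables brick multiplexing.

    # enable brick multiplexing cluster-wide
    gluster volume set all cluster.brick-multiplex on

    # create and start a couple of volumes (illustrative layout)
    gluster volume create vol1 replica 3 node{1,2,3}:/bricks/brick1/vol1
    gluster volume create vol2 replica 3 node{1,2,3}:/bricks/brick2/vol2
    gluster volume start vol1
    gluster volume start vol2

    # on one node, confirm all bricks report the same PID, then kill it
    gluster volume status | grep <node-ip>
    kill -9 <brick-pid>

    # re-check: port shows N/A, online shows N, but the stale PID is still listed
    gluster volume status | grep <node-ip>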

Actual results:
====
Volume status still shows the old PID against each brick even though the process has been killed.

Expected results:
================
The PID should be shown as N/A.

--- Additional comment from Jeff Darcy on 2017-03-21 11:16:58 EDT ---

I would say that killing a process is an invalid test, but this probably needs to be fixed anyway.

--- Additional comment from Worker Ant on 2017-03-30 08:24:48 EDT ---

REVIEW: https://review.gluster.org/16971 (glusterd: reset pid to -1 if brick is not online) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-03-31 09:06:25 EDT ---

COMMIT: https://review.gluster.org/16971 committed in master by Jeff Darcy (jeff.us) 
------
commit e325479cf222d2f25dbc0a4c6b80bfe5a7f09f43
Author: Atin Mukherjee <amukherj>
Date:   Thu Mar 30 14:47:45 2017 +0530

    glusterd: reset pid to -1 if brick is not online
    
    While populating brick details in gluster volume status response payload
    if a brick is not online then pid should be reset back to -1 so that
    volume status output doesn't show up the pid which was not cleaned up
    especially with brick multiplexing where multiple bricks belong to same
    process.
    
    Change-Id: Iba346da9a8cb5b5f5dd38031d4c5ef2097808387
    BUG: 1437494
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/16971
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Gaurav Yadav <gyadav>
    Reviewed-by: Prashanth Pai <ppai>
    Reviewed-by: Jeff Darcy <jeff.us>

Comment 2 Atin Mukherjee 2017-03-31 18:19:16 UTC
upstream patch : https://review.gluster.org/#/c/16971/

Comment 4 Atin Mukherjee 2017-04-03 10:55:51 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/102296

Comment 7 Bala Konda Reddy M 2017-04-11 13:50:08 UTC
Build Version: 3.8.4-21

Created a couple of volumes after enabling brick multiplexing, then killed the brick process on one node. In gluster volume status, the PID is shown as 'N/A' and the online status as 'N' for the killed brick process, as expected. Hence marking the bug as verified.
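
For reference, a quick way to spot-check this on the node where the brick process was killed; this is a sketch only, and the awk column numbers are an assumption based on the volume status layout shown earlier in this report (Online is the 5th field and PID the 6th on each Brick line):

    # print host:path, Online flag, and PID for this node's bricks
    gluster volume status | awk '/^Brick 10.70.35.215/ {print $2, $5, $6}'
    # with the fix, every printed line should end with: N N/A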

Comment 9 errata-xmlrpc 2017-09-21 04:35:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

