+++ This bug was initially created as a clone of Bug #1437494 +++
+++ This bug was initially created as a clone of Bug #1434448 +++

Description of problem:
==================
After enabling brick multiplexing, I killed the brick process (which is shared by all bricks of all volumes on that node) on one of the nodes. The process gets killed and every brick then shows its online status as "N" and its port as "N/A"; however, the output still shows the old PID of the killed process. This PID should also be shown as N/A.

Before killing the brick process (grep'ing only for bricks on this local node):

[root@dhcp35-215 bricks]# gluster v status|grep 215
Brick 10.70.35.215:/rhs/brick3/cross3       49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/cross3       49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick1/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick2/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/ecvol        49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick1/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick2/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/ecx          49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/rep2         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/rep2         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick3/rep3         49152     0          Y       13072
Brick 10.70.35.215:/rhs/brick4/rep3         49152     0          Y       13072

[root@dhcp35-215 bricks]# kill -9 13072

After killing the brick process:

[root@dhcp35-215 bricks]# gluster v status|grep 215
Brick 10.70.35.215:/rhs/brick3/cross3       N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/cross3       N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick1/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick2/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/ecvol        N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick1/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick2/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/ecx          N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/rep2         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/rep2         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick3/rep3         N/A       N/A        N       13072
Brick 10.70.35.215:/rhs/brick4/rep3         N/A       N/A        N       13072

[root@dhcp35-215 bricks]# ps -ef|grep 13072
root      2258 21234  0 19:35 pts/0    00:00:00 grep --color=auto 13072
[root@dhcp35-215 bricks]#

Version-Release number of selected component (if applicable):
============
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
python2-gluster-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
glusterfs-geo-replication-3.10.0-1.el7.x86_64
glusterfs-extra-xlators-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64

How reproducible:
=======
Always

Steps to Reproduce:
1. Enable the brick multiplexing feature.
2. Create one or more volumes and start them.
3. Note that all bricks hosted on the same node have the same PID.
4. Select a node and kill that PID.
5. Issue "gluster volume status".

Actual results:
====
Volume status still shows the old PID against each brick even though the process has been killed.

Expected results:
================
The PID must be shown as N/A.

--- Additional comment from Jeff Darcy on 2017-03-21 11:16:58 EDT ---

I would say that killing a process is an invalid test, but this probably needs to be fixed anyway.
--- Additional comment from Worker Ant on 2017-03-30 08:24:48 EDT ---

REVIEW: https://review.gluster.org/16971 (glusterd: reset pid to -1 if brick is not online) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-03-31 09:06:25 EDT ---

COMMIT: https://review.gluster.org/16971 committed in master by Jeff Darcy (jeff.us)
------
commit e325479cf222d2f25dbc0a4c6b80bfe5a7f09f43
Author: Atin Mukherjee <amukherj>
Date:   Thu Mar 30 14:47:45 2017 +0530

    glusterd: reset pid to -1 if brick is not online

    While populating brick details in the gluster volume status response
    payload, if a brick is not online then its pid should be reset back to
    -1 so that the volume status output doesn't show a pid which was not
    cleaned up, especially with brick multiplexing where multiple bricks
    belong to the same process.

    Change-Id: Iba346da9a8cb5b5f5dd38031d4c5ef2097808387
    BUG: 1437494
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/16971
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Gaurav Yadav <gyadav>
    Reviewed-by: Prashanth Pai <ppai>
    Reviewed-by: Jeff Darcy <jeff.us>
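For illustration, below is a minimal, self-contained C sketch of the idea behind the fix. It is not the actual glusterd code: the structure, field, and function names are hypothetical stand-ins for glusterd's per-brick handling in the volume status path.

#include <stdio.h>
#include <stdbool.h>

/* Hypothetical stand-in for glusterd's per-brick record; the real structure
 * in glusterd has different names and many more fields. */
struct brick_status {
    const char *path;   /* brick path as printed by "gluster volume status" */
    bool        online; /* is the (possibly multiplexed) brick process alive? */
    int         pid;    /* last PID recorded for the hosting process */
};

/* Core idea of the fix: when building the status payload, never report a
 * stale PID for an offline brick -- reset it to -1, which the CLI can then
 * render as "N/A". */
static int brick_pid_for_status(const struct brick_status *b)
{
    return b->online ? b->pid : -1;
}

int main(void)
{
    struct brick_status bricks[] = {
        { "10.70.35.215:/rhs/brick1/ecvol", true,  13072 },
        { "10.70.35.215:/rhs/brick2/ecvol", false, 13072 }, /* process killed */
    };

    for (size_t i = 0; i < sizeof(bricks) / sizeof(bricks[0]); i++) {
        int pid = brick_pid_for_status(&bricks[i]);
        if (pid > 0)
            printf("Brick %-35s Online: Y  PID: %d\n", bricks[i].path, pid);
        else
            printf("Brick %-35s Online: N  PID: N/A\n", bricks[i].path);
    }
    return 0;
}

With brick multiplexing every brick on the node points at the same hosting process, so once that process is gone the recorded PID is stale for all of them; resetting it to -1 in the response is what lets the CLI print N/A instead of the dead PID.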
upstream patch : https://review.gluster.org/#/c/16971/
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/102296
Build Version: 3.8.4-21

Created a couple of volumes after enabling brick multiplexing, then killed the brick process on one node. In "gluster volume status" the PID is shown as 'N/A' and the online status as 'N' for the killed brick process, as expected. Hence marking the bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774