Bug 1530448

Summary: glustershd fails to start on a volume force start after a brick is down
Product: [Community] GlusterFS
Reporter: Atin Mukherjee <amukherj>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.12
CC: amukherj, bmekala, bugs, nchilaka, rhs-bugs, storage-qa-internal, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: brick-multiplexing
Fixed In Version: glusterfs-3.12.5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1530281
Clones: 1530449 (view as bug list)
Environment:
Last Closed: 2018-02-01 04:43:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1530217, 1530281, 1530325    
Bug Blocks: 1530449, 1530450    

Comment 1 Worker Ant 2018-01-03 04:51:29 UTC
REVIEW: https://review.gluster.org/19123 (glusterd: Nullify pmap entry for bricks belonging to same port) posted (#1) for review on release-3.12 by Atin Mukherjee

Comment 2 Atin Mukherjee 2018-01-03 04:52:20 UTC
Description of problem:
======================
glustershd fails to start on one of the nodes when we do a volume force start to bring a brick online.

Version-Release number of selected component (if applicable):
===========
mainline

How reproducible:
=================
3/5

Steps to Reproduce (a scripted sketch follows this list):
1. Create a brick-multiplexing setup.
2. Create about 30 1x3 volumes.
3. Start the volumes.
4. Pump I/O to the base volume and to another volume (an extra EC volume was created for this).
5. Kill a brick, say b1.
6. Issue a volume force start on any volume other than the base volume (one higher in the ascending order, say vol15 or vol20).
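
A minimal shell sketch of the steps above, assuming a 3-node trusted storage pool (n1, n2, n3); the hostnames, brick paths, and volume names are illustrative and not taken from the original report:

#!/bin/bash
# Reproduction sketch only: hostnames (n1/n2/n3), brick paths (/bricks/...) and
# volume names (vol1..vol30, ecvol) are assumptions, not from the original report.

# 1. Brick-multiplexing setup.
gluster volume set all cluster.brick-multiplex on

# 2./3. Create and start ~30 1x3 (replica 3) volumes.
for i in $(seq 1 30); do
    gluster volume create vol$i replica 3 \
        n1:/bricks/vol$i/b1 n2:/bricks/vol$i/b2 n3:/bricks/vol$i/b3 force
    gluster volume start vol$i
done

# Extra EC (disperse) volume for the second I/O stream.
gluster volume create ecvol disperse 3 redundancy 1 \
    n1:/bricks/ecvol/b1 n2:/bricks/ecvol/b2 n3:/bricks/ecvol/b3 force
gluster volume start ecvol

# 4. Mount two volumes on a client and pump I/O.
mkdir -p /mnt/vol1 /mnt/ecvol
mount -t glusterfs n1:/vol1  /mnt/vol1
mount -t glusterfs n1:/ecvol /mnt/ecvol
dd if=/dev/urandom of=/mnt/vol1/f1  bs=1M count=1024 &
dd if=/dev/urandom of=/mnt/ecvol/f1 bs=1M count=1024 &

# 5. Kill brick b1 of vol1 on n1, using the PID shown in the last column of
#    volume status (with multiplexing this process may host several bricks).
BRICK_PID=$(gluster volume status vol1 | awk '/n1:\/bricks\/vol1\/b1/ {print $NF}')
kill -9 "$BRICK_PID"

# 6. Force-start a different, higher-numbered volume.
gluster volume start vol15 force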



Actual results:
=========
glustershd fails to start on one of the volumes; a quick way to confirm the symptom is sketched below.
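
A few hedged checks, run on each node, to confirm whether the self-heal daemon came up; the volume name vol15 is carried over from the illustrative script above:

# Self-heal Daemon rows in volume status should show Online: Y on every node.
gluster volume status vol15 | grep "Self-heal Daemon"

# The shd process itself; no output on the affected node means it never started.
pgrep -af glustershd

# glusterd's log usually records why the daemon could not be spawned.
grep -i glustershd /var/log/glusterfs/glusterd.log | tail -n 20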

--- Additional comment from Worker Ant on 2018-01-02 09:59:25 EST ---

REVIEW: https://review.gluster.org/19119 (glusterd: Nullify pmap entry for bricks belonging to same port) posted (#1) for review on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2018-01-02 20:23:23 EST ---

COMMIT: https://review.gluster.org/19119 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd: Nullify pmap entry for bricks belonging to same port

Commit 30e0b86 tried to address all the stale port issues glusterd had
when a brick is abruptly killed. In the brick multiplexing case, because
of a bug, the portmap entry was not getting removed. This patch
addresses that.

Change-Id: Ib020b967a9b92f1abae9cab9492f0cacec59aaa1
BUG: 1530281
Signed-off-by: Atin Mukherjee <amukherj>

Comment 3 Worker Ant 2018-01-10 06:52:00 UTC
COMMIT: https://review.gluster.org/19123 committed in release-3.12 by "Atin Mukherjee" <amukherj> with a commit message- glusterd: Nullify pmap entry for bricks belonging to same port

Commit 30e0b86 tried to address all the stale port issues glusterd had
when a brick is abruptly killed. In the brick multiplexing case, because
of a bug, the portmap entry was not getting removed. This patch
addresses that.

> mainline patch: https://review.gluster.org/#/c/19119/

Change-Id: Ib020b967a9b92f1abae9cab9492f0cacec59aaa1
BUG: 1530448
Signed-off-by: Atin Mukherjee <amukherj>

Comment 4 Jiffin 2018-02-01 04:43:43 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.5, please open a new bug report.

glusterfs-3.12.5 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-devel/2018-February/054356.html
[2] https://www.gluster.org/pipermail/gluster-users/