Bug 1421590

Summary: Bricks take up new ports upon volume restart after add-brick op with brick mux enabled
Product: [Community] GlusterFS
Component: glusterd
Version: mainline
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Samikshan Bairagya <sbairagy>
Assignee: Samikshan Bairagya <sbairagy>
CC: amukherj, bugs, jdarcy, sbairagy
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Whiteboard: brick-multiplexing-testing
Fixed In Version: glusterfs-3.11.0
Clone Of:
: 1427461
Last Closed: 2017-05-30 18:42:26 UTC
Type: Bug
Bug Blocks: 1427461

Description Samikshan Bairagya 2017-02-13 08:31:12 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce and actual results:

The steps below use a one-node cluster, but the issue can also be reproduced on a multi-node cluster.

1. Enable brick multiplexing.
2. Create a volume with one brick.
3. Start the volume and check the volume status. The brick uses port 49152.
4. Add a brick to the volume and check the volume status. Both bricks use port 49152.
5. Stop the volume and then start it.
6. Check the volume status. Both bricks now use port 49153.
7. Restart the volume again and check the status. The bricks now use port 49154. On every restart, the bricks take up the next port.
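
The steps above can be sketched as a small Python helper that builds the corresponding gluster CLI invocations. This is only a sketch: the volume name, host name, and brick paths are placeholders that are not part of this report, and the function names `repro_commands` and `run` are made up for illustration. Depending on where the brick directories live, `volume create` and `add-brick` may also need a trailing `force`.

```python
import subprocess


def repro_commands(volume, host, brick1, brick2, restarts=2):
    """Return the gluster CLI invocations for the reproduction steps, in order."""
    # --mode=script suppresses the interactive confirmation on "volume stop".
    gluster = ["gluster", "--mode=script"]
    cmds = [
        # Enable brick multiplexing cluster-wide.
        gluster + ["volume", "set", "all", "cluster.brick-multiplex", "on"],
        # Create a volume with one brick, start it, and check its status.
        gluster + ["volume", "create", volume, f"{host}:{brick1}"],
        gluster + ["volume", "start", volume],
        gluster + ["volume", "status", volume],
        # Add a second brick and check the status again.
        gluster + ["volume", "add-brick", volume, f"{host}:{brick2}"],
        gluster + ["volume", "status", volume],
    ]
    # Each restart is a stop, a start, and a status check.
    for _ in range(restarts):
        cmds += [
            gluster + ["volume", "stop", volume],
            gluster + ["volume", "start", volume],
            gluster + ["volume", "status", volume],
        ]
    return cmds


def run(cmds):
    """Echo and execute each command, stopping at the first failure."""
    for argv in cmds:
        print("$", " ".join(argv))
        subprocess.run(argv, check=True)
```

On a disposable test node, calling `run(repro_commands("testvol", "node1", "/bricks/b1", "/bricks/b2"))` and comparing the port column across the successive `volume status` outputs should show whether the bricks move to a new port after each restart.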

Expected results:
After a restart, the bricks should reuse the port they were using before the restart instead of taking up a new port.

Comment 1 Atin Mukherjee 2017-02-13 08:48:38 UTC
Samikshan - just to double check, is this issue not seen if brick mux is disabled?

Comment 2 Samikshan Bairagya 2017-02-13 09:20:47 UTC
(In reply to Atin Mukherjee from comment #1)
> Samikshan - just to double check, is this issue not seen if brick mux is
> disabled?

Correct, it is not seen. I tested this with brick mux disabled and the issue did not occur.

Comment 3 Jeff Darcy 2017-02-13 14:53:07 UTC
We're likely to encounter many of these "grey area" bugs which are not addressed by any existing requirements or tests.  Since fixing them is already likely to become a bottleneck, and manual testing is likely to make that even worse, it would be very helpful if other developers could provide the missing tests.  Any suggestions for how best to do that?

Comment 4 Worker Ant 2017-02-20 13:14:38 UTC
REVIEW: https://review.gluster.org/16689 (core: Clean up pmap registry up correctly on volume/brick stop) posted (#1) for review on master by Samikshan Bairagya (samikshan)

Comment 5 Worker Ant 2017-02-20 14:24:55 UTC
REVIEW: https://review.gluster.org/16689 (core: Clean up pmap registry up correctly on volume/brick stop) posted (#2) for review on master by Samikshan Bairagya (samikshan)

Comment 6 Worker Ant 2017-02-21 14:48:07 UTC
REVIEW: https://review.gluster.org/16689 (core: Clean up pmap registry up correctly on volume/brick stop) posted (#3) for review on master by Samikshan Bairagya (samikshan)

Comment 7 Worker Ant 2017-02-27 22:59:07 UTC
COMMIT: https://review.gluster.org/16689 committed in master by Jeff Darcy (jdarcy) 
------
commit 1e3538baab7abc29ac329c78182b62558da56d98
Author: Samikshan Bairagya <samikshan>
Date:   Mon Feb 20 18:35:01 2017 +0530

    core: Clean up pmap registry up correctly on volume/brick stop
    
    This commit changes the following:
    1. In glusterfs_handle_terminate, send out individual pmap signout
    requests to glusterd for every brick.
    2. Add another parameter to glusterfs_mgmt_pmap_signout function to
    pass the brickname that needs to be removed from the pmap registry.
    3. Make sure pmap_registry_search doesn't break out from the loop
    iterating over the list of bricks per port if the first brick entry
    corresponding to a port is whitespaced out.
    4. Make sure the pmap registry entries are removed for other
    daemons like snapd.
    
    Change-Id: I69949874435b02699e5708dab811777ccb297174
    BUG: 1421590
    Signed-off-by: Samikshan Bairagya <samikshan>
    Reviewed-on: https://review.gluster.org/16689
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Gaurav Yadav <gyadav>
    Reviewed-by: Jeff Darcy <jdarcy>
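
To illustrate points 1 and 3 of the commit message above, here is a toy Python model of a port-map registry. It is not glusterd's code: the class `ToyPortmap`, the function `restart`, and the rule that a port becomes reusable only once every brick registered on it has signed out are all assumptions made for this sketch, based on one plausible reading of the commit message.

```python
class ToyPortmap:
    """Toy stand-in for a port-map registry; not glusterd's implementation."""

    def __init__(self, base_port=49152):
        self.base_port = base_port
        # port -> list of brick names; None marks a blanked-out entry.
        self.ports = {}

    def allocate(self, bricks):
        """Register all bricks of one multiplexed process on the lowest free port."""
        port = self.base_port
        # A port counts as in use while any non-blank entry remains on it.
        while any(self.ports.get(port, [])):
            port += 1
        self.ports[port] = list(bricks)
        return port

    def search(self, brick):
        """Return the port a brick is registered on, or None if it is not found."""
        for port, names in sorted(self.ports.items()):
            for name in names:
                if name is None:
                    continue  # a blanked-out first entry must not end the scan
                if name == brick:
                    return port
        return None

    def signout(self, brick):
        """Blank out one brick's entry on the port where it is registered."""
        port = self.search(brick)
        if port is not None:
            names = self.ports[port]
            names[names.index(brick)] = None


def restart(pmap, bricks, signout_every_brick):
    """Stop and start a volume; return the port it is given after the restart."""
    # Assumed old behaviour: only one brick signs out when the process stops.
    to_remove = bricks if signout_every_brick else bricks[:1]
    for brick in to_remove:
        pmap.signout(brick)
    return pmap.allocate(bricks)
```

In this model, signing out only one of the two multiplexed bricks leaves a stale entry on the old port, so each restart moves on to the next port, which resembles the 49152, 49153, 49154 progression in the description. Signing out every brick individually frees the port, and the restart reuses it.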

Comment 8 Shyamsundar 2017-05-30 18:42:26 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/