Bug 1425556

Summary: glusterd log is flooded with stale disconnect rpc messages
Product: [Community] GlusterFS Reporter: Atin Mukherjee <amukherj>
Component: glusterd Assignee: Atin Mukherjee <amukherj>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: low Docs Contact:
Priority: low    
Version: 3.10 CC: bugs, gyadav, srangana
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: brick-multiplexing-testing
Fixed In Version: glusterfs-3.10.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1421724 Environment:
Last Closed: 2017-02-27 15:30:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1421724    
Bug Blocks: 1416031    

Description Atin Mukherjee 2017-02-21 17:29:23 UTC
+++ This bug was initially created as a clone of Bug #1421724 +++

Description of problem:

The glusterd log is flooded with the following warning message, repeated for every brick roughly every three seconds:

<snip>

[2017-02-13 14:27:59.706023] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b6
[2017-02-13 14:27:59.706442] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b7
[2017-02-13 14:27:59.706920] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b8
[2017-02-13 14:28:02.707970] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b6
[2017-02-13 14:28:02.708449] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b7
[2017-02-13 14:28:02.708937] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b8
[2017-02-13 14:28:05.709992] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b6
[2017-02-13 14:28:05.710565] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b7
[2017-02-13 14:28:05.711114] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b8
[2017-02-13 14:28:08.710356] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b6
[2017-02-13 14:28:08.710733] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b7
[2017-02-13 14:28:08.711139] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b8
[2017-02-13 14:28:11.711076] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b6
[2017-02-13 14:28:11.711179] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b7
[2017-02-13 14:28:11.711269] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b8

</snip>
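
To gauge how heavy the flood is, the warnings can be counted per brick and the gap between repeats measured. The following is a minimal sketch, assuming log lines in exactly the format shown in the snippet above; the function name `summarize_stale_disconnects` and the `SAMPLE` lines are illustrative only and are not part of glusterd.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the warning format shown in the snippet above (assumed stable).
STALE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] W "
    r"\[glusterd-handler\.c:\d+:__glusterd_brick_rpc_notify\] "
    r"0-management: got disconnect from stale rpc on (?P<brick>\S+)$"
)


def summarize_stale_disconnects(lines):
    """Return (warnings per brick, histogram of per-brick gaps in whole seconds)."""
    counts = Counter()
    gaps = Counter()
    last_seen = {}
    for line in lines:
        match = STALE_RE.match(line.strip())
        if match is None:
            continue  # ignore unrelated log lines
        stamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
        brick = match["brick"]
        counts[brick] += 1
        if brick in last_seen:
            gaps[round((stamp - last_seen[brick]).total_seconds())] += 1
        last_seen[brick] = stamp
    return counts, gaps


# First six lines of the snippet above, used as sample input.
SAMPLE = [
    "[2017-02-13 14:27:59.706023] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b6",
    "[2017-02-13 14:27:59.706442] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b7",
    "[2017-02-13 14:27:59.706920] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b8",
    "[2017-02-13 14:28:02.707970] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b6",
    "[2017-02-13 14:28:02.708449] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b7",
    "[2017-02-13 14:28:02.708937] W [glusterd-handler.c:5684:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /tmp/b8",
]

if __name__ == "__main__":
    per_brick, per_gap = summarize_stale_disconnects(SAMPLE)
    print(dict(per_brick))
    print(dict(per_gap))
```

Passing an open log file instead of `SAMPLE` works the same way, since the function only iterates over lines.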

Refer to the "Steps to Reproduce" section below for the reproducer.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Always

Steps to Reproduce:
1. Create and start a volume while brick multiplexing is off.
2. Turn on brick multiplexing.
3. Stop and start the volume.
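
The steps above can be sketched as a sequence of gluster CLI calls. This is a hedged sketch, not the reporter's script: the volume name `testvol` and the brick `host1:/tmp/b6` are hypothetical placeholders, and it assumes brick multiplexing is toggled through the cluster-wide `cluster.brick-multiplex` option. By default the commands are only printed, not executed.

```python
import subprocess


def repro_commands(volume, bricks):
    """Build the gluster CLI calls for the three steps; nothing runs here."""
    gluster = ["gluster", "--mode=script"]  # script mode skips confirmation prompts
    return [
        # Step 1: create and start a volume while brick multiplexing is off.
        gluster + ["volume", "create", volume] + list(bricks) + ["force"],
        gluster + ["volume", "start", volume],
        # Step 2: turn on brick multiplexing (cluster-wide option).
        gluster + ["volume", "set", "all", "cluster.brick-multiplex", "on"],
        # Step 3: stop and start the volume.
        gluster + ["volume", "stop", volume],
        gluster + ["volume", "start", volume],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical volume and brick names; adjust for a real test cluster.
    run(repro_commands("testvol", ["host1:/tmp/b6"]))
```

After the final start, the glusterd log can be checked for the "got disconnect from stale rpc" warning.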

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2017-02-21 08:04:32 EST ---

REVIEW: https://review.gluster.org/16699 (glusterd: unref brickinfo object on volume stop) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-02-21 11:56:12 EST ---

REVIEW: https://review.gluster.org/16702 (glusterd : Fix for flooding of logs when multiplex is enable) posted (#1) for review on master by Gaurav Yadav (gyadav)

--- Additional comment from Worker Ant on 2017-02-21 12:25:13 EST ---

COMMIT: https://review.gluster.org/16699 committed in master by Atin Mukherjee (amukherj) 
------
commit 9cdfbdced23cd43b8738636a3ed906c8d4267d67
Author: Atin Mukherjee <amukherj>
Date:   Tue Feb 21 18:33:14 2017 +0530

    glusterd: unref brickinfo object on volume stop
    
    If brick multiplexing is enabled, on a volume stop glusterd was not
    unrefing the brickinfo rpc object which lead to a flood of stale rpc
    logs.
    
    Change-Id: I18fedcd6921042ef2e945605466194b7b53fe2f7
    BUG: 1421724
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/16699
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jdarcy>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
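
The effect described in the commit message can be illustrated with a toy model. This is not glusterd code: the `ToyRpc` class and `simulate` function are hypothetical, and the model assumes that an rpc object which still holds a reference keeps retrying its connection and logs one warning per failed retry.

```python
class ToyRpc:
    """Toy ref-counted connection object; a stand-in, not glusterd's rpc type."""

    def __init__(self, brick):
        self.brick = brick
        self.refs = 1      # reference held on behalf of the brick
        self.alive = True  # while alive, reconnect attempts keep firing

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            self.alive = False  # last reference dropped: retries stop


def simulate(bricks, ticks, unref_on_stop):
    """Return the warnings logged after a volume stop, over `ticks` retry rounds."""
    rpcs = [ToyRpc(brick) for brick in bricks]
    if unref_on_stop:  # the behaviour the fix introduces on volume stop
        for rpc in rpcs:
            rpc.unref()
    log = []
    for _ in range(ticks):
        for rpc in rpcs:
            if rpc.alive:  # a leaked object keeps retrying and warning
                log.append(f"got disconnect from stale rpc on {rpc.brick}")
    return log


if __name__ == "__main__":
    bricks = ["/tmp/b6", "/tmp/b7", "/tmp/b8"]
    print(len(simulate(bricks, ticks=5, unref_on_stop=False)))
    print(len(simulate(bricks, ticks=5, unref_on_stop=True)))
```

With the reference leaked, every retry round adds one warning per brick; once the reference is dropped on stop, the model logs nothing.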

Comment 1 Worker Ant 2017-02-21 17:31:12 UTC
REVIEW: https://review.gluster.org/16703 (glusterd: unref brickinfo object on volume stop) posted (#1) for review on release-3.10 by Atin Mukherjee (amukherj)

Comment 2 Worker Ant 2017-02-21 21:39:56 UTC
COMMIT: https://review.gluster.org/16703 committed in release-3.10 by Shyamsundar Ranganathan (srangana) 
------
commit eebd57015150c971395d47cc1e6216c2acd4ec17
Author: Atin Mukherjee <amukherj>
Date:   Tue Feb 21 18:33:14 2017 +0530

    glusterd: unref brickinfo object on volume stop
    
    If brick multiplexing is enabled, on a volume stop glusterd was not
    unrefing the brickinfo rpc object which lead to a flood of stale rpc
    logs.
    
    >Reviewed-on: https://review.gluster.org/16699
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >Reviewed-by: Jeff Darcy <jdarcy>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Samikshan Bairagya <samikshan>
    >(cherry picked from commit 9cdfbdced23cd43b8738636a3ed906c8d4267d67)
    
    Change-Id: I18fedcd6921042ef2e945605466194b7b53fe2f7
    BUG: 1425556
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/16703
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 3 Shyamsundar 2017-02-27 15:30:12 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-devel/2017-February/052173.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 4 Shyamsundar 2017-03-06 17:46:49 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/