Bug 1484885

Summary: [rpc]: EPOLLERR - disconnecting now messages every 3 secs after completing rebalance
Product: [Community] GlusterFS
Reporter: Atin Mukherjee <amukherj>
Component: rpc
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Version: 3.12
CC: amukherj, bugs, mchangir, nbalacha, rhinduja, rhs-bugs, srangana
Keywords: Triaged
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.12.0
Clone Of: 1484225
Last Closed: 2017-09-05 17:40:02 UTC
Type: Bug
Bug Depends On: 1484225
Bug Blocks: 1473826

Comment 1 Atin Mukherjee 2017-08-24 13:17:30 UTC
Description of problem:
=======================

After a rebalance completes (triggered by remove-brick or add-brick), glusterd logs the following info message every 3 seconds:

[root@dhcp37-64 ~]# tailf /var/log/glusterfs/glusterd.log
[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:04.765471] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:07.766176] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:10.766886] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

In about a day the log has accumulated around 23k of these lines, and the count keeps growing. Eventually this would fill the system's /var partition.

[root@dhcp37-64 ~]# grep -ri "EPOLLERR - disconnecting now" /var/log/glusterfs/glusterd.log | wc -l 
23263
[root@dhcp37-64 ~]# 
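
The cadence and growth rate can be cross-checked from the log itself. The following is a minimal sketch, not part of the original report: it parses the timestamps of the six lines quoted above, computes the mean gap between them, and projects the number of lines per day. The function names are hypothetical and chosen only for this illustration. At one line every 3 seconds the projection is 28,800 lines per day, so the 23,263 lines counted above correspond to roughly 19.4 hours of logging, which agrees with "about a day".

```python
import re
from datetime import datetime

# Matches the timestamp prefix and the message text quoted in this report.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\].*EPOLLERR - disconnecting now"
)


def epollerr_times(lines):
    """Return the timestamps of all EPOLLERR disconnect messages in `lines`."""
    times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def mean_interval_seconds(times):
    """Mean gap between consecutive messages, or None with fewer than two."""
    if len(times) < 2:
        return None
    span = (times[-1] - times[0]).total_seconds()
    return span / (len(times) - 1)


def lines_per_day(interval_seconds):
    """Projected number of messages per day at the given interval."""
    return 86400 / interval_seconds


# The six lines quoted in the description above.
SAMPLE = [
    "[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:55:04.765471] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:55:07.766176] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:55:10.766886] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
]

if __name__ == "__main__":
    times = epollerr_times(SAMPLE)
    interval = mean_interval_seconds(times)
    print(f"messages: {len(times)}, mean interval: {interval:.3f} s")
    print(f"projected lines per day: {lines_per_day(interval):.0f}")
```

To run it against a real log, pass the lines of /var/log/glusterfs/glusterd.log to epollerr_times() instead of SAMPLE.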

Version-Release number of selected component (if applicable):
=============================================================
mainline


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create a 3x2 volume and write data to it.
2. Run remove-brick start to shrink it to 2x2.
3. Once the rebalance has completed, run remove-brick commit.
4. Monitor the glusterd log file.
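
The steps above can be sketched as gluster CLI invocations. This is only an illustration and was not part of the original report: the volume name, host names and brick paths are hypothetical, the helper merely builds the argument lists rather than running them, and the step of mounting the volume and writing data is left out. It assumes a replica 2 layout in which consecutive bricks form a replica pair, so removing the last two bricks takes the volume from 3x2 to 2x2.

```python
def reproduce_commands(volume, bricks):
    """Build gluster CLI argument lists for the reproduction steps.

    `bricks` must list six bricks; with "replica 2", consecutive bricks
    form a replica pair, giving a 3x2 volume.
    """
    if len(bricks) != 6:
        raise ValueError("a 3x2 volume needs exactly six bricks")
    removed = bricks[-2:]  # the last replica pair; removing it leaves 2x2
    return [
        ["gluster", "volume", "create", volume, "replica", "2", *bricks],
        ["gluster", "volume", "start", volume],
        # Step 2: start the remove-brick operation, which triggers a rebalance.
        ["gluster", "volume", "remove-brick", volume, *removed, "start"],
        # Poll this until the rebalance is reported as completed.
        ["gluster", "volume", "remove-brick", volume, *removed, "status"],
        # Step 3: commit once the rebalance has completed.
        ["gluster", "volume", "remove-brick", volume, *removed, "commit"],
    ]


if __name__ == "__main__":
    # Hypothetical hosts and brick paths, for illustration only.
    example_bricks = [f"server{i}:/bricks/b{i}" for i in range(1, 7)]
    for argv in reproduce_commands("testvol", example_bricks):
        print(" ".join(argv))
```

After the commit, step 4 amounts to watching /var/log/glusterfs/glusterd.log for the repeating message.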

Actual results:
===============

The EPOLLERR message is logged every 3 seconds.

Comment 2 Worker Ant 2017-08-24 13:18:27 UTC
REVIEW: https://review.gluster.org/18117 (glusterd: disable rpc_clnt_t after relalance process disconnection) posted (#1) for review on release-3.12 by Atin Mukherjee (amukherj)

Comment 3 Worker Ant 2017-08-25 19:00:54 UTC
COMMIT: https://review.gluster.org/18117 committed in release-3.12 by Shyamsundar Ranganathan (srangana) 
------
commit bace1dd564f401f904dac6b965299f77228e4b1d
Author: Milind Changire <mchangir>
Date:   Thu Aug 24 12:39:47 2017 +0530

    glusterd: disable rpc_clnt_t after relalance process disconnection
    
    Problem:
    glusterd continues to connect to rebalance process even after
    the socket connection has disconnected.
    
    Solution:
    rpc_clnt_disable() disables the rpc_clnt_t object and disarms
    all relevant timers and drops refs to the rpc_clnt_t object
    and the transport as well.
    
    >Reviewed-on: https://review.gluster.org/18114
    >Reviewed-by: MOHIT AGRAWAL <moagrawa>
    >Tested-by: Atin Mukherjee <amukherj>
    >Reviewed-by: Atin Mukherjee <amukherj>
    >Smoke: Gluster Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >(cherry picked from commit a894d44427649e99d4344a241dc2f9d584a9a691)
    
    Change-Id: I981d6f1cc0087037f1927062c2770a4d5026a619
    BUG: 1484885
    Signed-off-by: Milind Changire <mchangir>
    Reviewed-on: https://review.gluster.org/18117
    Tested-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
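
The behaviour described in the commit message above can be pictured with a toy model. The sketch below is not GlusterFS code: ToyRpcClient and its methods are hypothetical names, time is simulated with integer seconds, and the 3-second reconnect interval is an assumption taken from the cadence seen in the log. It shows only the timer aspect of the fix: while a reconnect timer stays armed after the peer process has exited, every attempt fails and logs again; disabling the client disarms the timer and the messages stop. The reference dropping that rpc_clnt_disable() also performs is not modelled.

```python
class ToyRpcClient:
    """Toy stand-in for an RPC client object; not GlusterFS code."""

    RECONNECT_INTERVAL = 3  # seconds; assumed from the cadence in the log

    def __init__(self):
        self.disabled = False
        self.next_reconnect = None  # simulated time of the armed timer, or None
        self.log = []

    def on_disconnect(self, now):
        # The peer went away: arm the reconnect timer unless disabled.
        if not self.disabled:
            self.next_reconnect = now + self.RECONNECT_INTERVAL

    def disable(self):
        # Analogue of the fix: mark the client disabled and disarm the timer.
        self.disabled = True
        self.next_reconnect = None

    def tick(self, now):
        # Fire the timer if it is due. The peer has exited, so the attempt
        # fails, a message is logged and the disconnect path runs again.
        if self.next_reconnect is not None and now >= self.next_reconnect:
            self.log.append((now, "EPOLLERR - disconnecting now"))
            self.on_disconnect(now)


def simulate(seconds, disable_on_disconnect):
    """Simulate `seconds` of wall time after the peer exits at t=0."""
    client = ToyRpcClient()
    client.on_disconnect(0)
    if disable_on_disconnect:
        client.disable()
    for now in range(1, seconds + 1):
        client.tick(now)
    return client.log


if __name__ == "__main__":
    print("without disable:", len(simulate(30, False)), "messages in 30 s")
    print("with disable:   ", len(simulate(30, True)), "messages in 30 s")
```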

Comment 4 Shyamsundar 2017-09-05 17:40:02 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem persists with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/