Bug 1484885 - [rpc]: EPOLLERR - disconnecting now messages every 3 secs after completing rebalance
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.12
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Atin Mukherjee
Depends On: 1484225
Blocks: glusterfs-3.12.0
Reported: 2017-08-24 13:16 UTC by Atin Mukherjee
Modified: 2017-09-05 17:40 UTC (History)

Fixed In Version: glusterfs-3.12.0
Clone Of: 1484225
Last Closed: 2017-09-05 17:40:02 UTC



Comment 1 Atin Mukherjee 2017-08-24 13:17:30 UTC
Description of problem:
=======================

After rebalance completes (following remove-brick or add-brick), the following info message is logged every 3 seconds:

[root@dhcp37-64 ~]# tailf /var/log/glusterfs/glusterd.log
[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:04.765471] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:07.766176] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:10.766886] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

In about a day the log has accumulated roughly 23k of these lines, and the count keeps growing. Eventually this would fill the system's /var partition.

[root@dhcp37-64 ~]# grep -ri "EPOLLERR - disconnecting now" /var/log/glusterfs/glusterd.log | wc -l 
23263
[root@dhcp37-64 ~]# 
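
To put the growth rate in numbers, here is a minimal sketch that parses the timestamps of the log lines quoted above and extrapolates to a full day. The helper names (epollerr_times, mean_interval_seconds, lines_per_day) are hypothetical and only for illustration; the timestamp layout is assumed from the sample lines in this report.

```python
import re
from datetime import datetime

# Matches the timestamp of a glusterd log line carrying the EPOLLERR message.
# The layout is assumed from the sample lines quoted in this report.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\].*EPOLLERR - disconnecting now"
)


def epollerr_times(lines):
    """Return the timestamps of all EPOLLERR disconnect messages."""
    times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def mean_interval_seconds(times):
    """Average gap between consecutive messages, in seconds."""
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def lines_per_day(interval_seconds):
    """Number of messages logged in 24 hours at a fixed interval."""
    return int(86400 / interval_seconds)


# First three lines from the log excerpt in this report.
sample = [
    "[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
]

times = epollerr_times(sample)
print("mean interval (s):", round(mean_interval_seconds(times), 3))
print("lines per day at 3 s:", lines_per_day(3))  # 28800
```

At one message every 3 seconds the log gains 28,800 lines per day, so the 23,263 lines counted above correspond to roughly 19 hours of logging.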

Version-Release number of selected component (if applicable):
=============================================================
mainline


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create a 3x2 volume and write data to it
2. Start a remove-brick operation to reduce it to 2x2
3. Once rebalance has completed, commit the remove-brick.
4. Monitor the glusterd log file

Actual results:
===============

The EPOLLERR message is logged every 3 seconds.

Comment 2 Worker Ant 2017-08-24 13:18:27 UTC
REVIEW: https://review.gluster.org/18117 (glusterd: disable rpc_clnt_t after relalance process disconnection) posted (#1) for review on release-3.12 by Atin Mukherjee (amukherj)

Comment 3 Worker Ant 2017-08-25 19:00:54 UTC
COMMIT: https://review.gluster.org/18117 committed in release-3.12 by Shyamsundar Ranganathan (srangana) 
------
commit bace1dd564f401f904dac6b965299f77228e4b1d
Author: Milind Changire <mchangir>
Date:   Thu Aug 24 12:39:47 2017 +0530

    glusterd: disable rpc_clnt_t after relalance process disconnection
    
    Problem:
    glusterd continues to connect to rebalance process even after
    the socket connection has disconnected.
    
    Solution:
    rpc_clnt_disable() disables the rpc_clnt_t object and disarms
    all relevant timers and drops refs to the rpc_clnt_t object
    and the transport as well.
    
    >Reviewed-on: https://review.gluster.org/18114
    >Reviewed-by: MOHIT AGRAWAL <moagrawa>
    >Tested-by: Atin Mukherjee <amukherj>
    >Reviewed-by: Atin Mukherjee <amukherj>
    >Smoke: Gluster Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >(cherry picked from commit a894d44427649e99d4344a241dc2f9d584a9a691)
    
    Change-Id: I981d6f1cc0087037f1927062c2770a4d5026a619
    BUG: 1484885
    Signed-off-by: Milind Changire <mchangir>
    Reviewed-on: https://review.gluster.org/18117
    Tested-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
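
The behaviour described in the commit message can be illustrated with a toy model. This is not GlusterFS code: ToyRpcClient, its methods, and simulate are hypothetical names, and the 3-second reconnect interval is an assumption taken from the log cadence in this report. The sketch only shows why disarming the reconnect timer on disconnect stops the repeated messages.

```python
class ToyRpcClient:
    """Toy stand-in for an RPC client with a reconnect timer (not GlusterFS code)."""

    RECONNECT_INTERVAL = 3  # seconds; assumed from the log cadence in the report

    def __init__(self):
        self.disabled = False
        self.next_reconnect = None  # simulated time of the next reconnect attempt
        self.log = []

    def on_disconnect(self, now, disable=False):
        """Handle a disconnect: either disable the client or arm a reconnect."""
        if disable:
            self.disable()
        else:
            self.next_reconnect = now + self.RECONNECT_INTERVAL

    def disable(self):
        """Disarm the reconnect timer so no further attempts are made."""
        self.disabled = True
        self.next_reconnect = None

    def tick(self, now):
        """Advance simulated time; a due reconnect to a dead peer fails and re-arms."""
        if self.disabled or self.next_reconnect is None or now < self.next_reconnect:
            return
        self.log.append((now, "EPOLLERR - disconnecting now"))
        self.next_reconnect = now + self.RECONNECT_INTERVAL


def simulate(disable_on_disconnect, seconds=30):
    """Count messages logged in `seconds` after the peer process has exited."""
    client = ToyRpcClient()
    client.on_disconnect(now=0, disable=disable_on_disconnect)
    for now in range(1, seconds + 1):
        client.tick(now)
    return len(client.log)


print("without disable:", simulate(False))  # 10 messages in 30 s
print("with disable:   ", simulate(True))   # 0 messages
```

In the model, a client that is left enabled after the peer exits logs one failure per reconnect interval indefinitely, while a client disabled at disconnect time logs nothing further.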

Comment 4 Shyamsundar 2017-09-05 17:40:02 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

