Bug 1483956 - [rpc]: EPOLLERR - disconnecting now messages every 3 secs after completing rebalance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rpc
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Milind Changire
QA Contact: Rochelle
URL:
Whiteboard:
Depends On: 1484225
Blocks: 1417151
Reported: 2017-08-22 11:32 UTC by Rahul Hinduja
Modified: 2017-09-21 05:06 UTC

Fixed In Version: glusterfs-3.8.4-42
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1484225
Environment:
Last Closed: 2017-09-21 05:06:51 UTC
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHBA-2017:2774 | 0 | normal | SHIPPED_LIVE | glusterfs bug fix and enhancement update | 2017-09-21 08:16:29 UTC

Description Rahul Hinduja 2017-08-22 11:32:02 UTC
Description of problem:
=======================

After rebalance completes (following remove-brick or add-brick), the following info messages are logged every 3 seconds:

[root@dhcp37-64 ~]# tailf /var/log/glusterfs/glusterd.log
[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:04.765471] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:07.766176] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:10.766886] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

In about a day the log has accumulated around 23k of these lines, and the count keeps increasing. Eventually this would leave the system's /var partition out of space.

[root@dhcp37-64 ~]# grep -ri "EPOLLERR - disconnecting now" /var/log/glusterfs/glusterd.log | wc -l 
23263
[root@dhcp37-64 ~]# 
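
To see how quickly the messages accumulate, the total above can be broken down per hour. This is a minimal sketch, assuming the log line format shown in the excerpt above; the function name count_per_hour is hypothetical and not part of glusterfs. At one message every 3 seconds, the expected ceiling is 86400 / 3 = 28800 lines per day, which is in line with the roughly 23k lines seen in about a day.

```shell
#!/bin/sh
# Hypothetical helper: count "EPOLLERR - disconnecting now" lines per hour.
# $1 = path to a glusterd-style log file.
count_per_hour() {
    # Timestamps look like "[2017-08-22 08:54:55.763095]".
    # Field 1 is "[YYYY-MM-DD", field 2 starts with "HH".
    grep -F 'EPOLLERR - disconnecting now' "$1" |
        awk '{ print substr($1, 2) " " substr($2, 1, 2) ":00" }' |
        sort | uniq -c
}
```

On an affected node it could be called as `count_per_hour /var/log/glusterfs/glusterd.log`.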

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.8.4-41.el7rhgs.x86_64


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create 3x2 volume and write data to it
2. Remove brick start to make it 2x2
3. Once rebalance is completed, do commit. 
4. Monitor the glusterd log file
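
The steps above can be sketched as a command sequence. This is only an illustration under stated assumptions: the volume name testvol, the hosts server1 to server6, and the brick path /bricks/b1 are hypothetical, and the report does not say which bricks were removed. By default the script only prints the commands; set DRY_RUN=0 on a disposable test cluster to execute them.

```shell
#!/bin/sh
# Hypothetical names, not taken from the report.
VOL=${VOL:-testvol}
BRICK=${BRICK:-/bricks/b1}

# Print the command in dry-run mode (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 1: create and start a 3x2 volume (six bricks, replica 2).
    run gluster volume create "$VOL" replica 2 \
        server1:"$BRICK" server2:"$BRICK" server3:"$BRICK" \
        server4:"$BRICK" server5:"$BRICK" server6:"$BRICK"
    run gluster volume start "$VOL"
    # Mount the volume and write data to it here.

    # Step 2: start removing one replica pair to shrink to 2x2.
    run gluster volume remove-brick "$VOL" \
        server5:"$BRICK" server6:"$BRICK" start

    # Step 3: check status until the rebalance shows completed, then commit.
    # The CLI may ask for confirmation before committing.
    run gluster volume remove-brick "$VOL" \
        server5:"$BRICK" server6:"$BRICK" status
    run gluster volume remove-brick "$VOL" \
        server5:"$BRICK" server6:"$BRICK" commit

    # Step 4: monitor the glusterd log file.
    run tail -f /var/log/glusterfs/glusterd.log
}
```

Calling `reproduce` with the default DRY_RUN=1 lists the commands without touching any volume.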

Actual results:
===============

The EPOLLERR message is logged every 3 seconds.

Expected results:
=================

After a successful rebalance completion, a graceful shutdown should not result in these info messages.

Also, if there is an actual error, it should be logged at level " E " instead of " I ", because most log monitoring tools use such keywords to filter error messages.
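
To illustrate the point about severity keywords, here is a minimal sketch of a level-based filter, assuming the log format from the excerpt above, where the third whitespace-separated field is the severity letter. The function name log_level_filter is hypothetical. Filtering on E skips the EPOLLERR lines entirely, because they are logged at I.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines with the given severity letter.
# $1 = severity letter (for example E or I); log lines are read from stdin.
log_level_filter() {
    # In "[date time] I [file:line:func] ...", field 3 is the severity.
    awk -v lvl="$1" '$3 == lvl'
}
```

For example, `log_level_filter E < /var/log/glusterfs/glusterd.log` would show only lines logged at error level.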

Comment 3 Atin Mukherjee 2017-08-23 05:17:48 UTC
upstream patch : https://review.gluster.org/#/c/18093/

Comment 10 Rochelle 2017-08-28 11:53:26 UTC
Verified with build : glusterfs-3.8.4-42.el6rhs.x86_64

The "EPOLLERR - disconnecting now" messages no longer appear in the log.

Moving this bug to verified.

Comment 12 errata-xmlrpc 2017-09-21 05:06:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

