Bug 1483956

Summary: [rpc]: EPOLLERR - disconnecting now messages every 3 secs after completing rebalance
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rahul Hinduja <rhinduja>
Component: rpc Assignee: Milind Changire <mchangir>
Status: CLOSED ERRATA QA Contact: Rochelle <rallan>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: amukherj, rallan, rhinduja, rhs-bugs, sanandpa
Target Milestone: ---   
Target Release: RHGS 3.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-42 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1484225 Environment:
Last Closed: 2017-09-21 05:06:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1484225    
Bug Blocks: 1417151    

Description Rahul Hinduja 2017-08-22 11:32:02 UTC
Description of problem:
=======================

After rebalance completes (remove-brick or add-brick), the following info message appears in the glusterd log every 3 seconds:

[root@dhcp37-64 ~]# tailf /var/log/glusterfs/glusterd.log
[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:04.765471] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:07.766176] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:10.766886] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

In about a day the log has accumulated around 23k of these lines, and the count keeps growing. Eventually this will leave the system's /var partition out of space.

[root@dhcp37-64 ~]# grep -ri "EPOLLERR - disconnecting now" /var/log/glusterfs/glusterd.log | wc -l 
23263
[root@dhcp37-64 ~]# 
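
To put the growth rate in numbers, here is a minimal sketch (not part of the original report) that parses the timestamps of the first three log lines quoted above, measures the gap between consecutive messages, and projects a per-day line count. The function names and the SAMPLE constant are illustrative assumptions; only the log lines themselves come from this report.

```python
from datetime import datetime

MESSAGE = "EPOLLERR - disconnecting now"

# First three log lines copied from the report above.
SAMPLE = [
    "[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
]


def parse_timestamp(line):
    """Return the datetime held in the leading [...] of a log line."""
    stamp = line[line.index("[") + 1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def epollerr_intervals(lines):
    """Seconds between consecutive EPOLLERR messages."""
    stamps = [parse_timestamp(line) for line in lines if MESSAGE in line]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


def projected_lines_per_day(lines):
    """Lines per day if the mean observed interval continues."""
    intervals = epollerr_intervals(lines)
    mean_interval = sum(intervals) / len(intervals)
    return 86400 / mean_interval


if __name__ == "__main__":
    print(epollerr_intervals(SAMPLE))
    print(round(projected_lines_per_day(SAMPLE)))
```

At one message every 3 seconds this works out to roughly 28,800 lines per day, so the 23,263 lines counted above correspond to about 19 hours of logging.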

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.8.4-41.el7rhgs.x86_64


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create a 3x2 volume and write data to it.
2. Start remove-brick to reduce it to 2x2.
3. Once the rebalance has completed, commit the remove-brick.
4. Monitor the glusterd log file.
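
The steps above can be sketched as a sequence of gluster CLI invocations. The helper below only builds and prints the command lines; it does not run them. The volume name, host names, and brick paths are hypothetical placeholders, not values from this report, and the data-writing part of step 1 (mounting the volume and creating files) is left out. It assumes the last two bricks form the replica pair being removed.

```python
def repro_commands(volume, bricks):
    """Build the gluster commands for a 3x2 -> 2x2 remove-brick run.

    `bricks` lists six host:/path entries; consecutive entries form
    replica pairs, and the last pair is the one removed.
    """
    removed = bricks[-2:]
    base = ["gluster", "volume"]
    remove = base + ["remove-brick", volume] + removed
    return [
        base + ["create", volume, "replica", "2"] + bricks,
        base + ["start", volume],
        remove + ["start"],
        remove + ["status"],   # repeat until the rebalance shows completed
        remove + ["commit"],
    ]


if __name__ == "__main__":
    # Hypothetical hosts and brick paths, for illustration only.
    example_bricks = [
        "host%d:/bricks/b%d" % (i % 2 + 1, i) for i in range(6)
    ]
    for command in repro_commands("testvol", example_bricks):
        print(" ".join(command))
```

After the commit, step 4 amounts to watching /var/log/glusterfs/glusterd.log for the repeating message.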

Actual results:
===============

The EPOLLERR message is logged every 3 seconds.

Expected results:
=================

After a successful rebalance completion, a graceful shutdown should not result in these info messages.

Also, if there is an actual error, it should be logged at level " E " instead of " I ", because most log monitoring tools filter error messages by keyword.
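
To illustrate the point about keyword filtering, here is a minimal sketch of a level-based filter such as a monitoring tool might apply. The regular expression is an assumption derived from the log lines quoted in this report, not an official description of the glusterfs log format, and the function names are hypothetical.

```python
import re

# Pattern inferred from the sample lines in this report:
# [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<source>[^\]]+)\] (?P<message>.*)$"
)


def level_of(line):
    """Return the single-letter level of a log line, or None if unparsable."""
    match = LOG_LINE.match(line)
    return match.group("level") if match else None


def error_lines(lines):
    """Keep only the lines logged at level E."""
    return [line for line in lines if level_of(line) == "E"]


if __name__ == "__main__":
    line = (
        "[2017-08-22 08:54:55.763095] I "
        "[socket.c:2465:socket_event_handler] 0-transport: "
        "EPOLLERR - disconnecting now"
    )
    print(level_of(line))
    print(error_lines([line]))
```

Because the message is logged at level I, a filter of this kind drops it even though its text mentions an error condition.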

Comment 3 Atin Mukherjee 2017-08-23 05:17:48 UTC
Upstream patch: https://review.gluster.org/#/c/18093/

Comment 10 Rochelle 2017-08-28 11:53:26 UTC
Verified with build: glusterfs-3.8.4-42.el6rhs.x86_64

The "EPOLLERR - disconnecting now" messages no longer appear in the log.

Moving this bug to verified.

Comment 12 errata-xmlrpc 2017-09-21 05:06:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774