1483956 – [rpc]: EPOLLERR - disconnecting now messages every 3 secs after completing rebalance

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	rpc
Sub Component:
Version:	rhgs-3.3
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.3.0
Assignee:	Milind Changire
QA Contact:	Rochelle
Docs Contact:
URL:
Whiteboard:
Depends On:	1484225
Blocks:	1417151

Reported:	2017-08-22 11:32 UTC by Rahul Hinduja
Modified:	2017-09-21 05:06 UTC
CC List:	5 users
Fixed In Version:	glusterfs-3.8.4-42
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1484225
Environment:
Last Closed:	2017-09-21 05:06:51 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:2774	0	normal	SHIPPED_LIVE	glusterfs bug fix and enhancement update	2017-09-21 08:16:29 UTC

Description Rahul Hinduja 2017-08-22 11:32:02 UTC

Description of problem:
=======================

After a rebalance completes (remove-brick or add-brick), the following info messages are logged every 3 seconds:

[root@dhcp37-64 ~]# tailf /var/log/glusterfs/glusterd.log
[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:04.765471] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:07.766176] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:10.766886] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

In about a day the log has accumulated around 23k of these lines, and the count keeps increasing. Eventually this would leave the system's /var partition out of space.

[root@dhcp37-64 ~]# grep -ri "EPOLLERR - disconnecting now" /var/log/glusterfs/glusterd.log | wc -l 
23263
[root@dhcp37-64 ~]# 
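
To confirm the cadence rather than only the count, the timestamps of the matching lines can be compared. The following is a minimal Python sketch, assuming the log line layout quoted above; the function names and the regular expression are illustrative and not part of glusterfs.

```python
import re
from datetime import datetime

# Matches lines of the form quoted in this report:
# [timestamp] LEVEL [file:line:function] domain: message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)
NEEDLE = "EPOLLERR - disconnecting now"


def epollerr_timestamps(lines):
    """Return the timestamps of all lines whose message contains NEEDLE."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and NEEDLE in m.group("msg"):
            stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    return stamps


def gaps_seconds(stamps):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# The six lines quoted in the description above.
SAMPLE = """\
[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:54:58.763920] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:01.764697] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:04.765471] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:07.766176] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-22 08:55:10.766886] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
"""

if __name__ == "__main__":
    stamps = epollerr_timestamps(SAMPLE.splitlines())
    print(len(stamps), "matching lines")
    print("gaps (s):", [round(g, 3) for g in gaps_seconds(stamps)])
```

To run it against the real file, pass `open("/var/log/glusterfs/glusterd.log")` to `epollerr_timestamps` in place of the sample lines.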

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.8.4-41.el7rhgs.x86_64


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create a 3x2 volume and write data to it.
2. Start a remove-brick operation to reduce the volume to 2x2.
3. Once the rebalance has completed, commit the remove-brick.
4. Monitor the glusterd log file.
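
The steps above could be scripted roughly as follows. This is a dry-run Python sketch that only prints the commands. The volume name, host names, and brick paths are hypothetical and not taken from this report, and the step that mounts the volume and writes data is left out.

```python
import shlex

# Hypothetical names, for illustration only.
VOLUME = "testvol"
BRICKS = [
    "server1:/bricks/b1", "server2:/bricks/b1",
    "server1:/bricks/b2", "server2:/bricks/b2",
    "server1:/bricks/b3", "server2:/bricks/b3",
]


def reproduction_commands(volume, bricks, replica=2):
    """Build the command lines for the reproduction steps (nothing is executed)."""
    # Removing the last replica pair turns a 3x2 volume into 2x2.
    removed = bricks[-replica:]
    return [
        ["gluster", "volume", "create", volume, "replica", str(replica), *bricks],
        ["gluster", "volume", "start", volume],
        # (mount the volume and write data here; omitted in this sketch)
        ["gluster", "volume", "remove-brick", volume, *removed, "start"],
        ["gluster", "volume", "remove-brick", volume, *removed, "status"],
        ["gluster", "volume", "remove-brick", volume, *removed, "commit"],
        ["tail", "-f", "/var/log/glusterfs/glusterd.log"],
    ]


if __name__ == "__main__":
    for argv in reproduction_commands(VOLUME, BRICKS):
        print(shlex.join(argv))
```

The `status` command would be repeated until the rebalance is reported as completed before issuing `commit`.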

Actual results:
===============

The EPOLLERR message is logged every 3 seconds.

Expected results:
=================

After a successful rebalance completes, the graceful shutdown should not produce these info messages.

Also, if this is a genuine error, it should be logged at level " E " instead of " I ", because most log-monitoring tools filter error messages by keyword.
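
The log-level concern can be illustrated with a small Python sketch, assuming a monitoring tool that selects lines by the single-letter level field. The helper names are illustrative and do not correspond to any particular monitoring tool.

```python
import re

# Extracts the single-letter level that follows the bracketed timestamp.
LEVEL_RE = re.compile(r"^\[[^\]]+\] ([A-Z]) \[")

# One of the lines quoted in this report.
REPORTED_LINE = (
    "[2017-08-22 08:54:55.763095] I [socket.c:2465:socket_event_handler] "
    "0-transport: EPOLLERR - disconnecting now"
)


def level_of(line):
    """Return the level letter of a log line, or None if it does not parse."""
    m = LEVEL_RE.match(line)
    return m.group(1) if m else None


def filter_by_level(lines, levels=("E",)):
    """Keep only the lines logged at one of the given levels."""
    return [line for line in lines if level_of(line) in levels]


def filter_by_keyword(lines, keyword):
    """Keep only the lines containing the keyword anywhere in the text."""
    return [line for line in lines if keyword in line]


if __name__ == "__main__":
    lines = [REPORTED_LINE]
    # A level-based filter for " E " misses the line, since it is logged at " I ".
    print("level filter:", len(filter_by_level(lines)))
    # A keyword filter on the message text still finds it.
    print("keyword filter:", len(filter_by_keyword(lines, "EPOLLERR")))
```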

Comment 3 Atin Mukherjee 2017-08-23 05:17:48 UTC

upstream patch : https://review.gluster.org/#/c/18093/

Comment 10 Rochelle 2017-08-28 11:53:26 UTC

Verified with build : glusterfs-3.8.4-42.el6rhs.x86_64

The "EPOLLERR - disconnecting now" messages no longer appear in the log.

Moving this bug to verified.

Comment 12 errata-xmlrpc 2017-09-21 05:06:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
