Bug 1290653 - [GlusterD]: GlusterD log is filled with error messages - " Failed to aggregate response from node/brick"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Atin Mukherjee
QA Contact: Byreddy
URL:
Whiteboard: glusterd
Depends On:
Blocks: 1268895 1290734 1299184 1310999
 
Reported: 2015-12-11 04:40 UTC by Byreddy
Modified: 2016-06-30 06:23 UTC
CC: 8 users

Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1290734
Environment:
Last Closed: 2016-06-23 04:59:02 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2016:1240 (SHIPPED_LIVE): Red Hat Gluster Storage 3.1 Update 3, last updated 2016-06-23 08:51:28 UTC

Description Byreddy 2015-12-11 04:40:26 UTC
Description of problem:
=======================
Created a VM in a RHEVM environment using a Gluster volume as storage, and observed the errors below in the glusterd logs:

<<<<<<<<<<<<<<<<<<<<GlusterD Log>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2015-12-11 04:41:40.000010] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:41:40.004306] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:42:40.657420] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:42:40.660675] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:43:37.847553] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:43:41.302213] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:44:41.960021] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
The message "I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs" repeated 3 times between [2015-12-11 04:43:37.847553] and [2015-12-11 04:44:41.963542]
[2015-12-11 04:45:42.634852] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:45:42.640099] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:46:43.297719] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:46:43.301371] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:47:43.956339] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:47:43.959903] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:51:46.542435] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:51:46.546767] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<End>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Version-Release number of selected component (if applicable):
glusterfs-3.7.5-10


How reproducible:
=================
Observed once on 3.1.2.


Steps to Reproduce:
===================
1. Have two RHGS nodes with version glusterfs-3.7.5-10.
2. Add both nodes to RHEVM.
3. Create a Distributed-Replicate volume and FUSE-mount it on a client.
4. Create a VM using the FUSE mount point as storage.
5. Check the glusterd logs on both RHGS nodes.

Actual results:
===============
Errors in glusterd logs 
[2015-12-11 04:51:46.542435] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick


Expected results:
=================
The above error should not appear in the glusterd log.


Additional info:

Comment 2 Atin Mukherjee 2015-12-11 05:34:31 UTC
I don't think this has anything to do with the RHEVM setup. I remember seeing this log on another setup, and on further analysis I found that the logging in this code path is inadequate to determine the actual reason for the failure. We need to improve the logging here; expect a patch upstream soon. However, when I set up a two-node cluster, create a volume, and run volume status, I don't see this log.

Do you have a reproducer for this?

Comment 3 SATHEESARAN 2015-12-11 06:13:03 UTC
I have seen this issue in RHGS 3.1.1 (glusterfs-3.7.1-16.el7rhgs) too.

Comment 4 Byreddy 2015-12-11 06:39:29 UTC
(In reply to Atin Mukherjee from comment #2)
> I don't think this has anything to do with RHEVM setup. I remember seeing
> this log in some set up and on further analysis I figured out that we have
> inadequate logs for this path to get to the actual reason of the failure, we
> need to improve the logging part here and you can expect a patch to be
> coming soon in upstream. However, if I try to set up a two node cluster and
> create a volume and run volume status I don't see this log. 
> 
> Do you have a reproducer for this?

Atin, 

You are right, this is not specific to RHEVM; the issue can be reproduced consistently with the steps below.
1. Have a two-node cluster.
2. Create a sample distributed volume using bricks from both nodes and start it.
3. Issue "gluster volume status all tasks" on one of the nodes.
4. Check the glusterd logs.
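The steps above can be sketched as shell commands. This is a hypothetical reproducer, not the reporter's exact session: the hostnames (rhgs1, rhgs2), the volume name, and the brick paths are assumptions, and the small wrapper falls back to echoing each command when the gluster CLI is not installed on the machine running the sketch.

```shell
# Hypothetical two-node reproducer for the "Failed to aggregate response
# from node/brick" log message. Hostnames, volume name, and brick paths
# are assumptions. run() prints each command, then executes it if
# possible; when the gluster CLI is unavailable it just notes that.
run() { echo "+ $*"; "$@" 2>/dev/null || echo "  (gluster CLI unavailable; command shown for reference)"; }

run gluster peer probe rhgs2
run gluster volume create distvol rhgs1:/bricks/b1 rhgs2:/bricks/b2
run gluster volume start distvol
run gluster volume status all tasks   # the command that triggers the error log
run grep 'Failed to aggregate' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
```

On an affected build, the final grep would show the MSGID 106108 error lines after step 3 of the reproducer.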

Comment 5 Atin Mukherjee 2015-12-11 09:50:09 UTC
An upstream patch http://review.gluster.org/#/c/12950/ is posted for review.

Comment 6 Nicolas Ecarnot 2015-12-18 09:04:09 UTC
Hi,

I'm seeing the same repeated "Failed to aggregate response from  node/brick" messages on replica-3 CentOS 7.2 nodes running Gluster 3.7.6-1, approximately every 10 seconds.

Comment 7 SATHEESARAN 2015-12-18 14:32:26 UTC
(In reply to Nicolas Ecarnot from comment #6)
> Hi,
> 
> I'm witnessing the same repeated "Failed to aggregate response
> from  node/brick" messages in a replica-3 centos 7.2 nodes with 3.7.6-1
> gluster, approx. every 10 seconds.

Nicolas,

FYI: a patch has already been sent to upstream master and is in the "needs-code-review" state. That change is tracked as bug https://bugzilla.redhat.com/show_bug.cgi?id=1290734, and the fix is expected in glusterfs-3.7.7.

This bug tracks the issue for the product "Red Hat Gluster Storage".

Thanks

Comment 9 Atin Mukherjee 2016-01-25 05:36:01 UTC
Looks good to me

Comment 11 Atin Mukherjee 2016-03-22 12:09:02 UTC
The fix is now available in the rhgs-3.1.3 branch, so I am moving the state to Modified.

Comment 13 Byreddy 2016-04-04 05:19:44 UTC
Verified this bug using the build "glusterfs-3.7.9-1".

Checked the glusterd log after executing "gluster volume status all tasks"; no "Failed to aggregate response from  node/brick" error message appears in the log.
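The verification check amounts to grepping the glusterd log after running the status command. A minimal sketch is shown below against a two-line sample excerpt taken from this report, since the real log is only present on an RHGS node; the usual log path (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) is an assumption about the default configuration.

```shell
# Write a two-line sample of the pre-fix glusterd log, then scan it the
# same way one would scan the real log after "gluster volume status all
# tasks". On a real node, point grep at the actual glusterd log instead.
cat > /tmp/glusterd-sample.log <<'EOF'
[2015-12-11 04:51:46.542435] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:51:46.546767] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
EOF

if grep -q 'Failed to aggregate response from  node/brick' /tmp/glusterd-sample.log; then
    echo "aggregation error present"   # expected on builds before the fix
else
    echo "log is clean"                # expected on glusterfs-3.7.9-1 and later
fi
```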


Moving to the Verified state based on the above details.

Comment 15 errata-xmlrpc 2016-06-23 04:59:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

