1290734 – [GlusterD]: GlusterD log is filled with error messages - " Failed to aggregate response from node/brick"

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	glusterd
Sub Component:
Version:	mainline
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Atin Mukherjee
QA Contact:
Docs Contact:
URL:
Whiteboard:	glusterd
Depends On:	1290653
Blocks:	1310999

Reported:	2015-12-11 09:44 UTC by Atin Mukherjee
Modified:	2016-06-16 13:50 UTC
CC List:	9 users
Fixed In Version:	glusterfs-3.8rc2
Clone Of:	1290653
Clones:	1310999
Environment:
Last Closed:	2016-06-16 13:50:01 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Atin Mukherjee 2015-12-11 09:44:42 UTC

+++ This bug was initially created as a clone of Bug #1290653 +++

Description of problem:
=======================
Created a VM in a RHEVM environment using a gluster volume as storage and observed the errors below in the glusterd logs.

<<<<<<<<<<<<<<<<<<<<GlusterD Log>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2015-12-11 04:41:40.000010] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:41:40.004306] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:42:40.657420] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:42:40.660675] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:43:37.847553] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:43:41.302213] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:44:41.960021] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
The message "I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs" repeated 3 times between [2015-12-11 04:43:37.847553] and [2015-12-11 04:44:41.963542]
[2015-12-11 04:45:42.634852] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:45:42.640099] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:46:43.297719] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:46:43.301371] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:47:43.956339] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:47:43.959903] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs
[2015-12-11 04:51:46.542435] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2015-12-11 04:51:46.546767] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_brs

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<End>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Version-Release number of selected component (if applicable):
mainline


How reproducible:
=================
Every time

Steps to Reproduce:
===================
1. Have two nodes
2. Create a Distributed-Replicate volume and FUSE-mount it on a client
5. gluster volume status all tasks

Actual results:
===============
Errors in the glusterd logs:
[2015-12-11 04:51:46.542435] E [MSGID: 106108] [glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick


Expected results:
=================
The above-mentioned error should not appear in the glusterd log.
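
To check the expected result against a captured glusterd log, the lines can be filtered by severity and message ID. The helper below is only a sketch: it assumes the "[timestamp] E [MSGID: n] ..." layout seen in the excerpt above, and the function names are made up for illustration, not part of GlusterFS.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if `line` is an error-severity ("E") entry carrying `msgid`.
 * Assumption: lines follow the "[timestamp] E [MSGID: n] ..." layout
 * shown in the log excerpt; any other layout is reported as no match. */
static int is_error_with_msgid(const char *line, long msgid)
{
    static const char tag[] = "] E [MSGID: ";
    const char *p = strstr(line, tag);
    if (p == NULL)
        return 0;

    const char *digits = p + sizeof(tag) - 1;
    char *end = NULL;
    long id = strtol(digits, &end, 10);

    /* Require at least one digit followed by the closing bracket. */
    return end != digits && *end == ']' && id == msgid;
}

/* Counts how many of the `n` lines are errors with the given message ID. */
static size_t count_errors(const char *const *lines, size_t n, long msgid)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)is_error_with_msgid(lines[i], msgid);
    return hits;
}
```

With a fixed build, count_errors(lines, n, 106108) over the log written during "gluster volume status all tasks" would be expected to return 0.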


Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-12-10 23:40:31 EST ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs‑3.1.z' to '?'. 

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Atin Mukherjee on 2015-12-11 00:34:31 EST ---

I don't think this has anything to do with the RHEVM setup. I remember seeing this log in some setup, and on further analysis I figured out that the logging in this path is inadequate to get to the actual reason for the failure. We need to improve the logging here, and you can expect a patch upstream soon. However, if I set up a two-node cluster, create a volume, and run volume status, I don't see this log.

Do you have a reproducer for this?

Comment 1 Vijay Bellur 2015-12-11 09:46:33 UTC

REVIEW: http://review.gluster.org/12950 (glusterd: correct ret code in glusterd_volume_status_copy_to_op_ctx_dict) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 2 Vijay Bellur 2015-12-14 05:53:49 UTC

REVIEW: http://review.gluster.org/12950 (glusterd: correct ret code in glusterd_volume_status_copy_to_op_ctx_dict) posted (#2) for review on master by Atin Mukherjee (amukherj)

Comment 3 Vijay Bellur 2015-12-25 05:04:43 UTC

COMMIT: http://review.gluster.org/12950 committed in master by Atin Mukherjee (amukherj) 
------
commit 88bf33555371ae01dd297aecf8666d7121309b80
Author: Atin Mukherjee <amukherj>
Date:   Fri Dec 11 15:15:53 2015 +0530

    glusterd: correct ret code in glusterd_volume_status_copy_to_op_ctx_dict
    
    This patch is to suppress the error log of Failed to aggregate rsp_dict where the
    above function returns a non-zero ret which is not required
    
    Change-Id: If331980291bd369690257215333cea175e2042ec
    BUG: 1290734
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/12950
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Gaurav Kumar Garg <ggarg>
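
The commit message describes a helper returning a non-zero code in a case that is not a real failure, which makes its caller log "Failed to aggregate". The diff itself is not quoted here, so the following is only a minimal sketch of that class of bug and fix. Every type and function name in it is hypothetical, and it is not the glusterd source.

```c
#include <stdio.h>

/* Hypothetical response: "count" is an optional field. */
struct rsp {
    int has_count;
    int count;
};

/* Returns 0 and stores the count, or -1 when the field is absent. */
static int rsp_get_count(const struct rsp *r, int *out)
{
    if (!r->has_count)
        return -1;
    *out = r->count;
    return 0;
}

/* Before: a missing optional field escapes as a failure code. */
static int copy_status_before(const struct rsp *r, int *total)
{
    int n = 0;
    int ret = rsp_get_count(r, &n);
    if (ret)
        goto out;           /* nothing to copy, but ret is still -1 */
    *total += n;
out:
    return ret;
}

/* After: having nothing to copy is treated as success. */
static int copy_status_after(const struct rsp *r, int *total)
{
    int n = 0;
    int ret = rsp_get_count(r, &n);
    if (ret) {
        ret = 0;            /* corrected return code */
        goto out;
    }
    *total += n;
out:
    return ret;
}

/* Caller in the style of a commit callback: it logs whenever the helper
 * reports failure. Returns the number of error lines it emitted. */
static int aggregate(int (*copy)(const struct rsp *, int *),
                     const struct rsp *rsps, int n, int *total)
{
    int errors = 0;
    for (int i = 0; i < n; i++) {
        if (copy(&rsps[i], total) != 0) {
            fprintf(stderr, "E: Failed to aggregate response from node/brick\n");
            errors++;
        }
    }
    return errors;
}
```

Passing the same responses through aggregate() with each helper yields the same total, but only the "before" variant emits the error line for a response that lacks the optional field.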

Comment 4 Niels de Vos 2016-06-16 13:50:01 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
