Description of problem:
=======================
When glusterd on the volume's mount (volfile) server is down, the volume mount log fills with the following error messages, repeated every 3 seconds:

<START>
[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:45:57.924521] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:57.924585] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:00.931708] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:00.931781] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:03.938789] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:03.938857] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
<END>

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a one- or two-node cluster.
2. Create a replica volume (7 x 2 = 14 was used here).
3. FUSE mount the volume.
4. Stop glusterd on the node from which the volume is mounted.
5. Check the volume mount log.

Actual results:
===============
Continuous error messages are logged every 3 seconds.

Expected results:
=================
There should be some rate limiting on these errors, or some other solution. Logging every 3 seconds will consume a lot of log storage if the volume's volfile server stays down for an extended period for any known reason.

Additional info:
================
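For illustration only: the flood happens because the management RPC disconnect handler logs unconditionally on every attempt of a roughly 3-second reconnect timer. The following standalone C sketch mimics that pattern (it is not the actual glusterfsd-mgmt.c code; the function name, message text, and interval are taken from the log above for readability):

#include <stdio.h>
#include <unistd.h>

/* Sketch of a disconnect handler that logs on every invocation --
 * no rate limiting, so each retry produces a new E/I message pair. */
static void on_disconnect(const char *host)
{
    fprintf(stderr,
            "E [mgmt_rpc_notify] failed to connect with remote-host: %s\n",
            host);
    fprintf(stderr, "I [mgmt_rpc_notify] Exhausted all volfile servers\n");
}

int main(void)
{
    /* Simulated reconnect loop; the real client retries on a timer. */
    for (int attempt = 0; attempt < 5; attempt++) {
        on_disconnect("10.70.43.190");
        sleep(3); /* approximate reconnect interval observed in the log */
    }
    return 0;
}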
Changing component to core since this is not relevant to FUSE per se and the behaviour can be observed on gNFS mounts too.
This issue was not present in the last GA build.
Apologies Byreddy, I completely missed comment 3; I will be moving it back to 3.2.0 for further analysis. Thanks for catching it!
Upstream mainline patch http://review.gluster.org/15732 has been posted for review.
Hi,

The (mgmt_rpc_notify) messages are logged continuously in this build because a check that previously guarded this code block in the RPC_CLNT_DISCONNECT case was removed by this patch (http://review.gluster.org/#/c/13002/). To reduce the frequency of these messages, the gf_log call can be changed to GF_LOG_OCCASIONALLY.

Regards,
Mohit Agrawal
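For context, GF_LOG_OCCASIONALLY-style logging only emits every Nth call at a given callsite instead of logging on every attempt. A minimal standalone sketch of that idea follows; the LOG_OCCASIONALLY macro, the modulus of 42, and the counter name are illustrative assumptions, not the libglusterfs implementation:

#include <stdio.h>

/* Rate-limited logging: only every 42nd call at this callsite
 * actually prints a message. */
#define LOG_OCCASIONALLY(counter, fmt, ...)                                   \
    do {                                                                      \
        if (((counter)++ % 42) == 0)                                          \
            fprintf(stderr, fmt "\n", __VA_ARGS__);                           \
    } while (0)

int main(void)
{
    static int conn_fail_count = 0; /* per-callsite counter */

    /* Simulate 100 consecutive failed reconnect attempts;
     * only attempts 0, 42, and 84 produce a log line. */
    for (int i = 0; i < 100; i++)
        LOG_OCCASIONALLY(conn_fail_count,
                         "E [mgmt_rpc_notify] failed to connect with remote-host: %s",
                         "10.70.41.198");
    return 0;
}

The effect is that a volfile server outage still gets recorded, but the mount log grows far more slowly than one message pair every 3 seconds.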
Verified this BZ using the build glusterfs-3.8.4-7. The fix is working well; the number of error messages logged while the volfile server is down is now much lower than before.

[2016-12-08 05:54:55.846722] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.70.41.198:24007 failed (No data available)
[2016-12-08 05:54:55.846894] E [glusterfsd-mgmt.c:1924:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.41.198 (No data available)
[2016-12-08 05:54:55.846919] I [glusterfsd-mgmt.c:1942:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-12-08 05:55:07.740290] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.70.41.198:24007 failed (Connection refused)
[2016-12-08 05:57:10.035103] E [glusterfsd-mgmt.c:1924:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.41.198 (Transport endpoint is not connected)
[2016-12-08 05:57:10.035203] I [glusterfsd-mgmt.c:1942:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

Moving to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html