Bug 1380655

Summary:	Continuous errors are logged in the mount log when glusterd on the volume mount server is down.
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Byreddy <bsrirama>
Component:	rpc	Assignee:	Mohit Agrawal <moagrawa>
Status:	CLOSED ERRATA	QA Contact:	Byreddy <bsrirama>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.2	CC:	amukherj, csaba, moagrawa, prasanna.kalever, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone:	---
Target Release:	RHGS 3.2.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.8.4-6	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1388877 1394108 1394109		Environment:
Last Closed:	2017-03-23 06:06:54 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1388877
Bug Blocks:	1351528, 1394108, 1394109

Description Byreddy 2016-09-30 09:06:45 UTC

Description of problem:
=======================
When glusterd on the volume mount server is down, the error messages below are logged continuously in the volume mount log, every 3 seconds.

<START>
[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:45:57.924521] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:57.924585] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:00.931708] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:00.931781] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:03.938789] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:03.938857] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
<END>
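
The 3-second cadence can be checked directly from the timestamps in the excerpt above. The following is a minimal Python sketch and not part of the original report: the helper name `connect_failure_intervals` is hypothetical, and it assumes the log prefix format shown above (`[YYYY-MM-DD HH:MM:SS.ffffff] LEVEL [...]`).

```python
import re
from datetime import datetime

# Lines copied from the mount log excerpt in this report.
LOG = """\
[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:45:57.924521] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:57.924585] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:00.931708] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:00.931781] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:03.938789] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:03.938857] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
"""

# Captures the timestamp of error-level ("E") lines only; "I" lines are skipped.
ERROR_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] E ")


def connect_failure_intervals(log_text):
    """Return the gaps, in seconds, between consecutive error lines."""
    stamps = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


intervals = connect_failure_intervals(LOG)
print(intervals)
```

For the four error lines above, this yields three gaps, each just over 3 seconds.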


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Set up a one-node or two-node cluster.
2. Create a replica volume (I used 7 x 2 = 14).
3. FUSE-mount the volume.
4. Stop glusterd on the node from which the volume is mounted.
5. Check the volume mount log.



Actual results:
===============
Error messages are logged continuously, every 3 seconds.

Expected results:
=================
The error logging should be throttled in some way, or some other solution should be provided.
A 3-second frequency will consume a lot of log storage if the volume mount server is down for any known reason.
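
To put a rough number on the storage concern, the sketch below estimates daily log growth from the two lines that are written on each retry. It is a back-of-the-envelope Python calculation and not part of the original report: the function name `log_growth_per_day` is hypothetical, and it assumes a single mount whose retries continue at a constant 3-second interval with lines of the same length as in the excerpt.

```python
# The two lines written on every retry, copied from the excerpt in this report.
ERROR_LINE = ("[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] "
              "0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 "
              "(Transport endpoint is not connected)")
INFO_LINE = ("[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] "
             "0-glusterfsd-mgmt: Exhausted all volfile servers")

RETRY_INTERVAL_SECONDS = 3          # interval observed in the report
SECONDS_PER_DAY = 24 * 60 * 60


def log_growth_per_day(lines, interval_seconds):
    """Return (line count, byte count) written per day for one mount."""
    cycles = SECONDS_PER_DAY // interval_seconds
    # +1 per line accounts for the trailing newline.
    bytes_per_cycle = sum(len(line.encode("utf-8")) + 1 for line in lines)
    return cycles * len(lines), cycles * bytes_per_cycle


lines_per_day, bytes_per_day = log_growth_per_day(
    [ERROR_LINE, INFO_LINE], RETRY_INTERVAL_SECONDS)
print(f"{lines_per_day} lines/day, about {bytes_per_day / 2**20:.1f} MiB/day per mount")
```

Under these assumptions, a single mount writes 57,600 lines per day, which amounts to several megabytes of log data per day for each affected mount.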


Additional info:

Comment 2 Ravishankar N 2016-09-30 09:17:51 UTC

Changing component to core since this is not relevant to FUSE per se and the behaviour can be observed on gNFS mounts too.

Comment 3 Byreddy 2016-10-18 06:42:04 UTC

This issue is not present in the last GA build.

Comment 5 Atin Mukherjee 2016-10-26 04:36:14 UTC

Apologies Byreddy, I completely missed comment 3. I will be moving this back to 3.2.0 for further analysis. Thanks for catching it!

Comment 6 Atin Mukherjee 2016-10-26 11:33:21 UTC

Upstream mainline patch http://review.gluster.org/15732 has been posted for review.

Comment 7 Mohit Agrawal 2016-10-27 05:20:09 UTC

Hi,
 
 The messages from mgmt_rpc_notify appear continuously in this build because this patch (http://review.gluster.org/#/c/13002/) removed a check that guarded the code block executed in the RPC_CLNT_DISCONNECT case.
 
 To reduce the frequency of the messages, change gf_log to GF_LOG_OCCASIONALLY.

Regards
Mohit Agrawal
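
The idea behind occasional logging can be sketched as a counter that lets only every Nth occurrence of a message through. The Python sketch below is an illustration only and not the GlusterFS implementation: the class name `OccasionalLogger`, the interval of 42, and the simulated message text are assumptions chosen for the example. The actual GF_LOG_OCCASIONALLY macro is defined in the GlusterFS C sources.

```python
class OccasionalLogger:
    """Emit the first occurrence of a message, then every `interval`-th one."""

    def __init__(self, emit, interval=42):  # interval is an assumed value
        self._emit = emit
        self._interval = interval
        self._count = 0

    def log(self, message):
        # Let the message through on occurrences 0, interval, 2 * interval, ...
        if self._count % self._interval == 0:
            self._emit(message)
        self._count += 1


emitted = []
logger = OccasionalLogger(emitted.append, interval=42)

# Simulate one hour of reconnect attempts at the 3-second cadence from the report.
for attempt in range(3600 // 3):
    logger.log(f"failed to connect with remote-host (attempt {attempt})")

print(len(emitted))
```

With these assumed values, 1200 reconnect attempts produce 29 log lines instead of 1200. The trade-off is that the message is throttled by occurrence count rather than by elapsed time, so the spacing between emitted lines depends on how often the event fires.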

Comment 12 Byreddy 2016-12-08 06:07:07 UTC

Verified this BZ using the build glusterfs-3.8.4-7.

The fix works: fewer error messages are now logged than before when the volfile server is down.

[2016-12-08 05:54:55.846722] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.70.41.198:24007 failed (No data available)
[2016-12-08 05:54:55.846894] E [glusterfsd-mgmt.c:1924:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.41.198 (No data available)
[2016-12-08 05:54:55.846919] I [glusterfsd-mgmt.c:1942:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-12-08 05:55:07.740290] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.70.41.198:24007 failed (Connection refused)



[2016-12-08 05:57:10.035103] E [glusterfsd-mgmt.c:1924:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.41.198 (Transport endpoint is not connected)
[2016-12-08 05:57:10.035203] I [glusterfsd-mgmt.c:1942:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers


Moving to verified state.

Comment 14 errata-xmlrpc 2017-03-23 06:06:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html