Bug 1380655

Summary: Continuous errors are logged in the mount log when glusterd on the volume mount server is down.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Byreddy <bsrirama>
Component: rpc Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED ERRATA QA Contact: Byreddy <bsrirama>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: amukherj, csaba, moagrawa, prasanna.kalever, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1388877 1394108 1394109 Environment:
Last Closed: 2017-03-23 06:06:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1388877    
Bug Blocks: 1351528, 1394108, 1394109    

Description Byreddy 2016-09-30 09:06:45 UTC
Description of problem:
=======================
When glusterd on the volume mount server is down, the following error messages are logged continuously in the volume mount log, once every 3 seconds.

<START>
[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:45:57.924521] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:57.924585] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:00.931708] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:00.931781] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:03.938789] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:03.938857] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
<END>
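
The 3-second cadence can be checked directly from the timestamps in the excerpt above. The following is a minimal Python sketch, assuming the bracketed timestamp format shown in the log; the helper name `error_intervals` is hypothetical and not part of any GlusterFS tooling.

```python
import re
from datetime import datetime

# The E-level lines copied from the mount log excerpt in this report.
LOG_EXCERPT = """\
[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:45:57.924521] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:57.924585] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:00.931708] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:00.931781] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:03.938789] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:03.938857] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
"""

# Matches the leading timestamp of an E-level (error) line only.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] E ")


def error_intervals(log_text):
    """Return the gaps, in seconds, between consecutive E-level lines."""
    stamps = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


intervals = error_intervals(LOG_EXCERPT)
print([round(gap, 3) for gap in intervals])  # [3.007, 3.007, 3.007]
```

Each gap is just over 3 seconds, matching the retry frequency described above.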


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have a one- or two-node cluster.
2. Create a replica volume (I used 7 x 2 = 14).
3. FUSE mount the volume.
4. Stop glusterd on the node from which the volume is mounted.
5. Check the volume mount log.



Actual results:
===============
Error messages are logged continuously, once every 3 seconds.

Expected results:
=================
There should be some control over how often the error is logged, or some other solution.
At a 3-second frequency, the messages will consume a lot of log storage if the volume mount server is down for any known reason.
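
The storage concern can be put into rough numbers. The sketch below is a back-of-the-envelope estimate only: it assumes the 3-second interval and the two message lines quoted in this report, one client mount, ASCII log text, and no log rotation. The function name `daily_log_growth` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
RETRY_INTERVAL_S = 3  # interval observed in the mount log

# One E line and one I line are written per failed retry (copied from the report).
ERROR_LINE = (
    "[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] "
    "0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 "
    "(Transport endpoint is not connected)"
)
INFO_LINE = (
    "[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] "
    "0-glusterfsd-mgmt: Exhausted all volfile servers"
)


def daily_log_growth(interval_s, lines):
    """Return (retries per day, bytes written per day) for one mount."""
    retries = SECONDS_PER_DAY // interval_s
    # +1 per line for the trailing newline; len() equals bytes for ASCII text.
    bytes_per_retry = sum(len(line) + 1 for line in lines)
    return retries, retries * bytes_per_retry


retries, total_bytes = daily_log_growth(RETRY_INTERVAL_S, [ERROR_LINE, INFO_LINE])
print(f"{retries} retries/day, {2 * retries} lines/day, "
      f"about {total_bytes / 2**20:.1f} MiB/day per mount")
```

The figure scales linearly with the number of client mounts that point at the stopped volfile server.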


Additional info:

Comment 2 Ravishankar N 2016-09-30 09:17:51 UTC
Changing component to core since this is not relevant to FUSE per se and the behaviour can be observed on gNFS mounts too.

Comment 3 Byreddy 2016-10-18 06:42:04 UTC
This issue is not there in the last GA build.

Comment 5 Atin Mukherjee 2016-10-26 04:36:14 UTC
Apologies Byreddy, I completely missed comment 3. I will be moving it back to 3.2.0 for further analysis, and thanks for catching it!

Comment 6 Atin Mukherjee 2016-10-26 11:33:21 UTC
upstream mainline patch http://review.gluster.org/15732 posted for review.

Comment 7 Mohit Agrawal 2016-10-27 05:20:09 UTC
Hi,
 
 The mgmt_rpc_notify messages are logged continuously in this build because this patch (http://review.gluster.org/#/c/13002/) removed a check that previously guarded the code block executed in the RPC_CLNT_DISCONNECT case.
 
 To reduce the frequency of the messages, change gf_log to GF_LOG_OCCASIONALLY.
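
 The idea behind that change can be sketched as logging only every Nth occurrence of a repeated event. The Python below illustrates that technique and is not the GlusterFS macro itself (which is C); the class name `OccasionalLogger` and the interval of 42 are assumptions made for the example.

```python
class OccasionalLogger:
    """Emit only every `every`-th call, starting with the first one."""

    def __init__(self, every=42, sink=print):
        if every < 1:
            raise ValueError("every must be >= 1")
        self.every = every  # assumed interval, for illustration only
        self.sink = sink    # callable that receives the emitted messages
        self.calls = 0

    def log(self, message):
        # Emit when the call counter is a multiple of `every`, then count the call.
        should_emit = self.calls % self.every == 0
        self.calls += 1
        if should_emit:
            self.sink(message)
        return should_emit


emitted = []
logger = OccasionalLogger(every=42, sink=emitted.append)
for attempt in range(100):  # simulate 100 failed reconnect attempts
    logger.log(f"failed to connect with remote-host (attempt {attempt})")
print(len(emitted))  # 3 (attempts 0, 42 and 84)
```

 At one retry every 3 seconds, an interval of 42 would work out to roughly one logged failure every two minutes.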

Regards
Mohit Agrawal

Comment 12 Byreddy 2016-12-08 06:07:07 UTC
Verified this BZ using the build glusterfs-3.8.4-7.

The fix works well. Fewer error messages are now logged than before when the volfile server is down.

[2016-12-08 05:54:55.846722] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.70.41.198:24007 failed (No data available)
[2016-12-08 05:54:55.846894] E [glusterfsd-mgmt.c:1924:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.41.198 (No data available)
[2016-12-08 05:54:55.846919] I [glusterfsd-mgmt.c:1942:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-12-08 05:55:07.740290] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.70.41.198:24007 failed (Connection refused)



[2016-12-08 05:57:10.035103] E [glusterfsd-mgmt.c:1924:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.41.198 (Transport endpoint is not connected)
[2016-12-08 05:57:10.035203] I [glusterfsd-mgmt.c:1942:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers


Moving to verified state.

Comment 14 errata-xmlrpc 2017-03-23 06:06:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html