Bug 1388877

Summary: Continuous errors in the mount log when glusterd on the volume mount server is down.
Product: [Community] GlusterFS
Reporter: Mohit Agrawal <moagrawa>
Component: rpc
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: mainline
CC: amukherj, bsrirama, bugs, csaba, moagrawa, prasanna.kalever, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1380655
Environment:
Last Closed: 2017-03-06 17:31:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1380655, 1394108, 1394109

Description Mohit Agrawal 2016-10-26 10:52:10 UTC
+++ This bug was initially created as a clone of Bug #1380655 +++

Description of problem:
=======================
When glusterd on the volume mount server is down, the following error messages appear in the volume mount log every 3 seconds:

<START>
[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:45:57.924521] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:57.924585] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:00.931708] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:00.931781] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:03.938789] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:03.938857] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
<END>


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have a one- or two-node cluster.
2. Create a replica volume (I used 7 x 2 = 14).
3. FUSE mount the volume.
4. Stop glusterd on the node from which the volume is mounted.
5. Check the volume mount log (example commands below).
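
For reference, a sketch of the reproduction as shell commands. The hostnames, brick paths, volume name, and mount point are placeholders, and a plain replica 2 volume is shown instead of the original 7 x 2 layout:

    gluster volume create testvol replica 2 server1:/bricks/b1 server2:/bricks/b2
    gluster volume start testvol
    mount -t glusterfs server1:/testvol /mnt/testvol   # server1 serves the volfile
    systemctl stop glusterd                            # run on server1
    tail -f /var/log/glusterfs/mnt-testvol.log         # errors repeat every 3 seconds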



Actual results:
===============
Continuous error messages are logged every 3 seconds.

Expected results:
=================
The error logging should be rate-limited, or some other solution applied.
A message every 3 seconds will consume a lot of log storage if the volume mount server stays down for any length of time.


Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-09-30 05:06:52 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Ravishankar N on 2016-09-30 05:17:51 EDT ---

Changing component to core since this is not relevant to FUSE per se and the behaviour can be observed on gNFS mounts too.

--- Additional comment from Byreddy on 2016-10-18 02:42:04 EDT ---

This issue was not present in the last GA build.

--- Additional comment from Byreddy on 2016-10-26 00:02:25 EDT ---

@Atin, any reason why we moved this bug out of 3.2.0?

As per comment 3, this issue was newly introduced in the 3.2.0 build, and it will consume a lot of volume mount log storage if the volfile server is down for any length of time.

--- Additional comment from Atin Mukherjee on 2016-10-26 00:36:14 EDT ---

Apologies Byreddy, I completely missed comment 3. I will move it back to 3.2.0 for further analysis; thanks for catching it!

Comment 1 Mohit Agrawal 2016-10-26 11:01:41 UTC
Hi,
 
 The mgmt_rpc_notify messages are logged continuously in this build because a check that previously guarded this code block in the RPC_CLNT_DISCONNECT case was removed by this patch (http://review.gluster.org/#/c/13002/).
 
 To reduce the frequency of these messages, change the gf_log call to GF_LOG_OCCASIONALLY.

Regards
Mohit Agrawal
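
GF_LOG_OCCASIONALLY keeps a per-call-site counter and emits only one out of every N messages (in GlusterFS, N is GF_UNIVERSAL_ANSWER, i.e. 42). A minimal standalone sketch of the idea; the macro body and counter name here are illustrative, not the exact libglusterfs definition:

    #include <stdio.h>

    /* One message out of every 42 gets through; the rest are dropped.
     * The real macro lives in libglusterfs and forwards to gf_log(). */
    #define LOG_OCCASIONALLY(ctr, fmt, ...)                               \
            do {                                                          \
                    if (((ctr)++ % 42) == 0)                              \
                            fprintf(stderr, fmt "\n", ##__VA_ARGS__);     \
            } while (0)

    int main(void)
    {
            static int log_ctr = 0; /* one counter per call site */
            for (int attempt = 0; attempt < 100; attempt++)
                    LOG_OCCASIONALLY(log_ctr,
                                     "failed to connect with remote-host: %s",
                                     "10.70.43.190");
            /* Emits only 3 messages (attempts 0, 42, 84) instead of 100. */
            return 0;
    }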

Comment 2 Worker Ant 2016-10-26 11:09:11 UTC
REVIEW: http://review.gluster.org/15732 (glusterd: Continuous errors are getting in mount logs while glusterd is down) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 3 Worker Ant 2016-10-27 07:25:53 UTC
REVIEW: http://review.gluster.org/15732 (glusterfsd: Continuous errors are getting in mount logs while glusterd is down) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 4 Worker Ant 2016-11-10 14:45:22 UTC
COMMIT: http://review.gluster.org/15732 committed in master by Jeff Darcy (jdarcy) 
------
commit 7874ed245bcc80658547992205f8396f4dd3c76a
Author: Mohit Agrawal <moagrawa>
Date:   Wed Oct 26 16:31:58 2016 +0530

    glusterfsd: Continuous errors are getting in mount logs while glusterd is down
    
    Problem: When glusterd is down, continuous mgmt_rpc_notify error
             messages are logged in the volume mount log every 3 seconds,
             which consumes disk space.
    
    Solution: To reduce the frequency of error messages, use GF_LOG_OCCASIONALLY.
    
    BUG: 1388877
    Change-Id: I6cf24c6ddd9ab380afd058bc0ecd556d664332b1
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: http://review.gluster.org/15732
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Talur <rtalur>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Jeff Darcy <jdarcy>
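
The resulting change at the call site is essentially a one-line substitution. A reconstructed sketch, not the verbatim diff; 'server_addr' and 'log_ctr' stand in for the actual variable names in glusterfsd-mgmt.c:

    /* Before: fires on every reconnect attempt, i.e. every 3 seconds */
    gf_log("glusterfsd-mgmt", GF_LOG_ERROR,
           "failed to connect with remote-host: %s", server_addr);

    /* After: same message, throttled by a per-call-site counter */
    static int log_ctr;
    GF_LOG_OCCASIONALLY(log_ctr, "glusterfsd-mgmt", GF_LOG_ERROR,
                        "failed to connect with remote-host: %s", server_addr);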

Comment 5 Shyamsundar 2017-03-06 17:31:48 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/