Description of problem:
=======================
When glusterd on the volume's mount (volfile) server is down, the volume mount log fills with the following error messages, repeated every 3 seconds:

<START>
[2016-09-30 08:45:54.917489] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:54.917542] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:45:57.924521] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:45:57.924585] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:00.931708] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:00.931781] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-09-30 08:46:03.938789] E [glusterfsd-mgmt.c:1922:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.43.190 (Transport endpoint is not connected)
[2016-09-30 08:46:03.938857] I [glusterfsd-mgmt.c:1939:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
<END>

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a one- or two-node cluster.
2. Create a replica volume (7 x 2 = 14 was used here).
3. FUSE mount the volume.
4. Stop glusterd on the node from which the volume is mounted.
5. Check the volume mount log.

Actual results:
===============
Continuous error messages are logged every 3 seconds.

Expected results:
=================
There should be some rate limiting on these errors, or some other solution. Logging every 3 seconds will consume a lot of log storage if the volume's volfile server stays down for an extended period for any known reason.

Additional info:
================
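For illustration only: the flood happens because the management RPC disconnect handler logs unconditionally on every attempt of a roughly 3-second reconnect timer. The following standalone C sketch mimics that pattern (it is not the actual glusterfsd-mgmt.c code; the function name, message text, and interval are taken from the log above for readability):

#include <stdio.h>
#include <unistd.h>

/* Sketch of a disconnect handler that logs on every invocation --
 * no rate limiting, so each retry produces a new E/I message pair. */
static void on_disconnect(const char *host)
{
    fprintf(stderr,
            "E [mgmt_rpc_notify] failed to connect with remote-host: %s\n",
            host);
    fprintf(stderr, "I [mgmt_rpc_notify] Exhausted all volfile servers\n");
}

int main(void)
{
    /* Simulated reconnect loop; the real client retries on a timer. */
    for (int attempt = 0; attempt < 5; attempt++) {
        on_disconnect("10.70.43.190");
        sleep(3); /* approximate reconnect interval observed in the log */
    }
    return 0;
}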
Changing component to core since this is not relevant to FUSE per se and the behaviour can be observed on gNFS mounts too.
This issue was not present in the last GA build.
Apologies Byreddy, I completely missed comment 3; I will be moving it back to 3.2.0 for further analysis. Thanks for catching it!
Upstream mainline patch http://review.gluster.org/15732 has been posted for review.
Hi,

The (mgmt_rpc_notify) messages are logged continuously in this build because a check that previously guarded this code block in the RPC_CLNT_DISCONNECT case was removed by this patch (http://review.gluster.org/#/c/13002/). To reduce the frequency of these messages, the gf_log call can be changed to GF_LOG_OCCASIONALLY.

Regards,
Mohit Agrawal
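For context, GF_LOG_OCCASIONALLY-style logging only emits every Nth call at a given callsite instead of logging on every attempt. A minimal standalone sketch of that idea follows; the LOG_OCCASIONALLY macro, the modulus of 42, and the counter name are illustrative assumptions, not the libglusterfs implementation:

#include <stdio.h>

/* Rate-limited logging: only every 42nd call at this callsite
 * actually prints a message. */
#define LOG_OCCASIONALLY(counter, fmt, ...)                                   \
    do {                                                                      \
        if (((counter)++ % 42) == 0)                                          \
            fprintf(stderr, fmt "\n", __VA_ARGS__);                           \
    } while (0)

int main(void)
{
    static int conn_fail_count = 0; /* per-callsite counter */

    /* Simulate 100 consecutive failed reconnect attempts;
     * only attempts 0, 42, and 84 produce a log line. */
    for (int i = 0; i < 100; i++)
        LOG_OCCASIONALLY(conn_fail_count,
                         "E [mgmt_rpc_notify] failed to connect with remote-host: %s",
                         "10.70.41.198");
    return 0;
}

The effect is that a volfile server outage still gets recorded, but the mount log grows far more slowly than one message pair every 3 seconds.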
Verified this BZ using the build glusterfs-3.8.4-7. The fix is working well; the number of error messages logged while the volfile server is down is now much lower than before.

[2016-12-08 05:54:55.846722] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.70.41.198:24007 failed (No data available)
[2016-12-08 05:54:55.846894] E [glusterfsd-mgmt.c:1924:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.41.198 (No data available)
[2016-12-08 05:54:55.846919] I [glusterfsd-mgmt.c:1942:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2016-12-08 05:55:07.740290] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.70.41.198:24007 failed (Connection refused)
[2016-12-08 05:57:10.035103] E [glusterfsd-mgmt.c:1924:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.70.41.198 (Transport endpoint is not connected)
[2016-12-08 05:57:10.035203] I [glusterfsd-mgmt.c:1942:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

Moving to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html