Bug 972610

Summary: Excessive logging of rpc_clnt_reconfig and socket_connect_finish in glustershd,nfs,glusterd and client logs every second
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rahul Hinduja <rhinduja>
Component: glusterdAssignee: Nagaprasad Sathyanarayana <nsathyan>
Status: CLOSED EOL QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: high    
Version: 2.1CC: nsathyan, rhs-bugs, smohan, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-03 17:16:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Rahul Hinduja 2013-06-10 08:27:51 UTC
Description of problem:
=======================

Once the bricks are killed using kill -9 <pid> all the logs like glustershd,nfs,glusterd and client logs are continuous reporting that "trying to connect to port which is refused." every second.

log snippet from glustershd:
============================

[2013-06-10 17:51:28.286470] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-2: changing port to 49153 (from 0)
[2013-06-10 17:51:28.289287] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-2: connection to 10.70.36.35:49153 failed (Connection refused)
[2013-06-10 17:51:29.289324] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-4: changing port to 49154 (from 0)
[2013-06-10 17:51:29.292086] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-4: connection to 10.70.36.35:49154 failed (Connection refused)
[2013-06-10 17:51:30.292199] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-0: changing port to 49152 (from 0)
[2013-06-10 17:51:30.294953] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-0: connection to 10.70.36.35:49152 failed (Connection refused)
[2013-06-10 17:51:31.294998] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-2: changing port to 49153 (from 0)
[2013-06-10 17:51:31.297760] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-2: connection to 10.70.36.35:49153 failed (Connection refused)
[2013-06-10 17:51:32.297983] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-4: changing port to 49154 (from 0)
[2013-06-10 17:51:32.300815] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-4: connection to 10.70.36.35:49154 failed (Connection refused)
[2013-06-10 17:51:33.300872] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-0: changing port to 49152 (from 0)
[2013-06-10 17:51:33.303700] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-0: connection to 10.70.36.35:49152 failed (Connection refused)
[2013-06-10 17:51:34.303777] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-2: changing port to 49153 (from 0)
[2013-06-10 17:51:34.306597] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-2: connection to 10.70.36.35:49153 failed (Connection refused)
[2013-06-10 17:51:35.306487] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-4: changing port to 49154 (from 0)
[2013-06-10 17:51:35.309336] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-4: connection to 10.70.36.35:49154 failed (Connection refused)
[2013-06-10 17:51:36.309363] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-0: changing port to 49152 (from 0)
[2013-06-10 17:51:36.312162] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-0: connection to 10.70.36.35:49152 failed (Connection refused)
[2013-06-10 17:51:37.312324] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-2: changing port to 49153 (from 0)
[2013-06-10 17:51:37.315151] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-2: connection to 10.70.36.35:49153 failed (Connection refused)
[2013-06-10 17:51:38.315194] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-4: changing port to 49154 (from 0)
[2013-06-10 17:51:38.317989] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-4: connection to 10.70.36.35:49154 failed (Connection refused)
[2013-06-10 17:51:39.318032] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-0: changing port to 49152 (from 0)
[2013-06-10 17:51:39.320838] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-0: connection to 10.70.36.35:49152 failed (Connection refused)
[2013-06-10 17:51:40.320882] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-2: changing port to 49153 (from 0)
[2013-06-10 17:51:40.323705] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-2: connection to 10.70.36.35:49153 failed (Connection refused)
[2013-06-10 17:51:41.323648] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-4: changing port to 49154 (from 0)
[2013-06-10 17:51:41.326430] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-4: connection to 10.70.36.35:49154 failed (Connection refused)
[2013-06-10 17:51:42.326531] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-0: changing port to 49152 (from 0)
[2013-06-10 17:51:42.329354] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-0: connection to 10.70.36.35:49152 failed (Connection refused)
[2013-06-10 17:51:43.334639] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-2: changing port to 49153 (from 0)
[2013-06-10 17:51:43.337486] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-2: connection to 10.70.36.35:49153 failed (Connection refused)
[2013-06-10 17:51:44.337410] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-4: changing port to 49154 (from 0)
[2013-06-10 17:51:44.340204] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-4: connection to 10.70.36.35:49154 failed (Connection refused)
[2013-06-10 17:51:45.340254] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-0: changing port to 49152 (from 0)
[2013-06-10 17:51:45.343081] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-0: connection to 10.70.36.35:49152 failed (Connection refused)
[2013-06-10 17:51:46.348321] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-2: changing port to 49153 (from 0)
[2013-06-10 17:51:46.351139] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-2: connection to 10.70.36.35:49153 failed (Connection refused)
[2013-06-10 17:51:47.351230] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-4: changing port to 49154 (from 0)
[2013-06-10 17:51:47.354116] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-4: connection to 10.70.36.35:49154 failed (Connection refused)
[2013-06-10 17:51:48.353935] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-vol-dis-rep-client-0: changing port to 49152 (from 0)
[2013-06-10 17:51:48.356739] E [socket.c:2158:socket_connect_finish] 0-vol-dis-rep-client-0: connection to 10.70.36.35:49152 failed (Connection refused)


Log Snippet from glusterd logs:
===============================

/run/55ed469286f16b9546e7bf7c4fb0960e.socket failed, (Connection refused)
[2013-06-10 17:52:57.344248] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/08421e3828f08de5bb681291123cdf10.socket failed, (Connection refused)
[2013-06-10 17:52:57.344281] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/4220b7da8aec58b0de58eca496035f06.socket failed, (Connection refused)
[2013-06-10 17:53:00.344844] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/55ed469286f16b9546e7bf7c4fb0960e.socket failed, (Connection refused)
[2013-06-10 17:53:00.348169] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/08421e3828f08de5bb681291123cdf10.socket failed, (Connection refused)
[2013-06-10 17:53:00.348214] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/4220b7da8aec58b0de58eca496035f06.socket failed, (Connection refused)
[2013-06-10 17:53:03.348845] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/55ed469286f16b9546e7bf7c4fb0960e.socket failed, (Connection refused)
[2013-06-10 17:53:03.351805] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/08421e3828f08de5bb681291123cdf10.socket failed, (Connection refused)
[2013-06-10 17:53:03.351838] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/4220b7da8aec58b0de58eca496035f06.socket failed, (Connection refused)
[2013-06-10 17:53:06.352430] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/55ed469286f16b9546e7bf7c4fb0960e.socket failed, (Connection refused)
[2013-06-10 17:53:06.355653] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/08421e3828f08de5bb681291123cdf10.socket failed, (Connection refused)
[2013-06-10 17:53:06.355694] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/4220b7da8aec58b0de58eca496035f06.socket failed, (Connection refused)



Version-Release number of selected component (if applicable):
=============================================================

rpm -qa | grep gluster | grep 3.4.0
glusterfs-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.9rhs-1.el6rhs.x86_64



How reproducible:
=================
1/1


Steps Carried:
==============
1. Created gluster volume 6*2 from 4 servers
2. Mounted(Fuse and NFS) on client
3. Created IO's from fuse and nfs mount.
4. Killed all the bricks from server1 using "kill -9"
5. Powered down server 3

Expected Result:
================

Attempt to connect to ports and logging into the logs should be handled gracefully.

Comment 3 Pranith Kumar K 2013-07-24 13:50:17 UTC
The issue is that the same rpc transport is used from connecting to glusterd to get portmap and then connect to the brick. The flag connect_finish_log is used to guard against repetition of the log message. But because the connection to glusterd succeeds this variable is reset which leads to allowing the same error log to appear again.

Comment 4 Nagaprasad Sathyanarayana 2014-05-06 11:43:39 UTC
Dev ack to 3.0 RHS BZs

Comment 8 Vivek Agarwal 2015-12-03 17:16:57 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release for which you requested us to review, is now End of Life. Please See https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.