Bug 1409189

Summary: Failed to set TCP_USER_TIMEOUT msgs seen in logs
Product: [Community] GlusterFS Reporter: Raghavendra G <rgowdapp>
Component: glusterd Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: low Docs Contact:
Priority: low    
Version: mainline CC: amukherj, bugs, johnzzpcrystal, rgowdapp, sasundar
Target Milestone: --- Keywords: Triaged
Target Release: --- Flags: ykaul: needinfo+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-4.1.3 (or higher) Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-08-29 03:18:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Raghavendra G 2016-12-30 05:19:38 UTC
Description of problem:

[2016-12-30 05:05:51.916161] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:51.916210] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:05:54.920471] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:54.920542] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:05:57.924559] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:57.924607] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:00.928536] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:00.928582] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:03.934574] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:03.934635] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:06.940427] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:06.940476] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:09.944536] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:09.944582] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument

These messages were seen in multiple places. In one of the tests, I had just run "halt -p" on the node hosting the peer glusterd (there were only two nodes in the cluster).

Similar messages were seen on a user setup too. Again, the cause is unknown.

Comment 1 SATHEESARAN 2017-01-02 06:09:24 UTC
I discussed this issue with Niels some time ago, and I regret not following up. I am pasting that mail conversation here in case it helps.

On Wed, Jul 22, 2015 at 11:49:56AM +0530, SATHEESARAN wrote:
> Hi Niels,
>
> I have observed warning messages in glusterd related to TCP_USER_TIMEOUT :
>
> <snip>
> [2015-07-22 11:38:41.979511] I [rpc-clnt.c:972:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2015-07-22 11:38:41.987058] W [socket.c:923:__socket_keepalive] 0-socket:
> failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
> [2015-07-22 11:38:41.987099] E [socket.c:3018:socket_connect] 0-management:
> Failed to set keep-alive: Invalid argument
> </snip>
>
> Do you know what these messages mean?


That means that __socket_keepalive() is called with a timeout (last
parameter) of -1. This value normally comes from a socket_private_t
structure and is set through "transport.tcp-user-timeout".

Most functions expect the timeout to be an unsigned int; how it ends up
as -1 is unclear to me.

Does this happen always, or only in some configuration/environment?

Thanks,
Niels
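
The failure in the log can be sketched outside GlusterFS. The helper below is a hypothetical illustration, not code from socket.c: it applies a TCP_USER_TIMEOUT value in milliseconds to a fresh TCP socket and returns the resulting errno. It assumes Linux, and it assumes the -1000 in the log is a -1 second timeout converted to milliseconds, which matches the explanation above but is not confirmed in this report.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18 /* Linux value, for older libc headers */
#endif

/*
 * Hypothetical helper (not part of GlusterFS): try to set
 * TCP_USER_TIMEOUT to timeout_ms on a new TCP socket.
 * Returns 0 on success, or the errno of the failing call.
 */
int try_user_timeout_ms(int timeout_ms)
{
    int err = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return errno;

    /* The option takes the timeout in milliseconds as an int. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                   &timeout_ms, sizeof(timeout_ms)) != 0)
        err = errno; /* save before close() can overwrite it */

    close(fd);
    return err;
}
```

Called with -1000, this is expected to return EINVAL on Linux, the same "Invalid argument" reported in the log; a non-negative value such as 42000 should be accepted. Skipping the setsockopt call whenever the configured timeout is negative would avoid the warning, though whether the eventual fix does that is not stated here.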

Comment 2 Gaurav Yadav 2017-01-24 06:08:34 UTC
Raghavendra,

Could you please provide the steps/commands needed to reproduce the issue,
so that I can debug it and find the root cause?

Comment 3 Gaurav Yadav 2017-02-07 10:49:45 UTC
Raghavendra,

Could you please provide the steps/commands needed to reproduce the issue,
so that I can debug it and find the root cause?

Comment 4 Zhou Zhengping 2017-05-10 01:40:59 UTC
Maybe it's the same problem as the following:

https://review.gluster.org/#/c/14785/

Comment 5 Atin Mukherjee 2017-08-10 04:55:21 UTC
(In reply to Zhou Zhengping from comment #4)
> Maybe it's the same problem as the following:
> 
> https://review.gluster.org/#/c/14785/

Yes, that seems to be the case. Given that the patch is now merged, I am moving this bug to MODIFIED.