Bug 1409189

Summary:	Failed to set TCP_USER_TIMEOUT msgs seen in logs
Product:	[Community] GlusterFS	Reporter:	Raghavendra G <rgowdapp>
Component:	glusterd	Assignee:	Raghavendra G <rgowdapp>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	low	Docs Contact:
Priority:	low
Version:	mainline	CC:	amukherj, bugs, johnzzpcrystal, rgowdapp, sasundar
Target Milestone:	---	Keywords:	Triaged
Target Release:	---	Flags:	ykaul: needinfo+
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-4.1.3 (or higher)	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-08-29 03:18:44 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Raghavendra G 2016-12-30 05:19:38 UTC

Description of problem:

[2016-12-30 05:05:51.916161] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:51.916210] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:05:54.920471] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:54.920542] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:05:57.924559] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:57.924607] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:00.928536] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:00.928582] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:03.934574] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:03.934635] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:06.940427] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:06.940476] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:09.944536] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:09.944582] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument

These messages were seen in multiple places. In one of the tests, I had just run "halt -p" on the node hosting the peer glusterd (the cluster had only two nodes).

Similar messages were also seen on a user setup. Again, the cause is unknown.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 SATHEESARAN 2017-01-02 06:09:24 UTC

I discussed this issue with Niels earlier and regret that I didn't follow up. Pasting that mail conversation here in case it helps:

On Wed, Jul 22, 2015 at 11:49:56AM +0530, SATHEESARAN wrote:
> Hi Niels,
>
> I have observed warning messages in glusterd related to TCP_USER_TIMEOUT :
>
> <snip>
> [2015-07-22 11:38:41.979511] I [rpc-clnt.c:972:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2015-07-22 11:38:41.987058] W [socket.c:923:__socket_keepalive] 0-socket:
> failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
> [2015-07-22 11:38:41.987099] E [socket.c:3018:socket_connect] 0-management:
> Failed to set keep-alive: Invalid argument
> </snip>
>
> Do you know what these messages mean ?

That means that __socket_keepalive() is called with a timeout (last
parameter) of -1. This value normally comes from a socket_private_t
structure and is set through "transport.tcp-user-timeout".

Most functions expect the timeout to be an unsigned int; how it ends up
as -1 is unclear to me.

Does this happen always, or only in some configuration/environment?

Thanks,
Niels

Comment 2 Gaurav Yadav 2017-01-24 06:08:34 UTC

Raghavendra,

Could you please provide the steps/commands to reproduce the issue,
so that I can debug it and find the root cause?

Comment 3 Gaurav Yadav 2017-02-07 10:49:45 UTC

Raghavendra,

Could you please provide the steps/commands to reproduce the issue,
so that I can debug it and find the root cause?

Comment 4 Zhou Zhengping 2017-05-10 01:40:59 UTC

Maybe it's the same problem as the following:

https://review.gluster.org/#/c/14785/

Comment 5 Atin Mukherjee 2017-08-10 04:55:21 UTC

(In reply to Zhou Zhengping from comment #4)
> Maybe it's the same problem as the following:
> 
> https://review.gluster.org/#/c/14785/

Yes, that seems to be the case. Given that the patch is now merged, moving this bug to MODIFIED.