Bug 1409189 - Failed to set TCP_USER_TIMEOUT msgs seen in logs
Summary: Failed to set TCP_USER_TIMEOUT msgs seen in logs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-30 05:19 UTC by Raghavendra G
Modified: 2019-12-31 07:17 UTC
CC List: 5 users

Fixed In Version: glusterfs-4.1.3 (or higher)
Clone Of:
Environment:
Last Closed: 2018-08-29 03:18:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:
ykaul: needinfo+



Description Raghavendra G 2016-12-30 05:19:38 UTC
Description of problem:

[2016-12-30 05:05:51.916161] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:51.916210] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:05:54.920471] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:54.920542] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:05:57.924559] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:05:57.924607] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:00.928536] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:00.928582] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:03.934574] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:03.934635] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:06.940427] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:06.940476] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-12-30 05:06:09.944536] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-12-30 05:06:09.944582] E [socket.c:3091:socket_connect] 0-management: Failed to set keep-alive: Invalid argument

These messages were seen in multiple places. In one of the tests, I had just run "halt -p" on the node hosting the peer glusterd (the cluster had only two nodes).

Similar messages were also seen on a user setup. Again, the cause is unknown.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 SATHEESARAN 2017-01-02 06:09:24 UTC
I discussed this issue with Niels earlier and regret not following up. Pasting that mail conversation here in case it helps.

On Wed, Jul 22, 2015 at 11:49:56AM +0530, SATHEESARAN wrote:
> Hi Niels,
>
> I have observed warning messages in glusterd related to TCP_USER_TIMEOUT :
>
> <snip>
> [2015-07-22 11:38:41.979511] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2015-07-22 11:38:41.987058] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
> [2015-07-22 11:38:41.987099] E [socket.c:3018:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
> </snip>
>
> Do you know what these messages mean?


That means that __socket_keepalive() is called with a timeout (last
parameter) of -1. This value normally comes from a socket_private_t
structure and is set through "transport.tcp-user-timeout".

Most functions expect the timeout to be an unsigned int; how it gets to a
value of -1 is unclear to me.

Does this happen always, or only in some configuration/environment?

Thanks,
Niels

Comment 2 Gaurav Yadav 2017-01-24 06:08:34 UTC
Raghavendra,

Could you please provide the steps/commands to reproduce the issue,
so that I can debug it and find the root cause?

Comment 3 Gaurav Yadav 2017-02-07 10:49:45 UTC
Raghavendra,

Could you please provide the steps/commands to reproduce the issue,
so that I can debug it and find the root cause?

Comment 4 Zhou Zhengping 2017-05-10 01:40:59 UTC
Maybe it's the same problem as the following:

https://review.gluster.org/#/c/14785/

Comment 5 Atin Mukherjee 2017-08-10 04:55:21 UTC
(In reply to Zhou Zhengping from comment #4)
> Maybe it's the same problem as the following:
> 
> https://review.gluster.org/#/c/14785/

Yes, that seems to be the case. Given that the patch is now merged, I am moving this bug to MODIFIED.

