Bug 1426034 - Add logs to identify whether disconnects are voluntary or due to network problems
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rpc
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Milind Changire
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: 1426125
Blocks: 1417147
Reported: 2017-02-23 05:30 UTC by Raghavendra G
Modified: 2017-09-21 04:57 UTC

Fixed In Version: glusterfs-3.8.4-19
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1426125 (view as bug list)
Environment:
Last Closed: 2017-09-21 04:33:25 UTC
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2017:2774
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: glusterfs bug fix and enhancement update
Last Updated: 2017-09-21 08:16:29 UTC

Description Raghavendra G 2017-02-23 05:30:18 UTC
Description of problem:

One of the common problems we encounter is frequent connects/disconnects. A disconnect can be either:

1. voluntary, where the process calls shutdown(2)/close(2) on an otherwise healthy socket connection.
2. involuntary, where we get a POLLERR event from the network.

While debugging this class of issues, it would help if we could identify which of the two categories above a particular disconnect falls into. We need to add enough log messages to help us classify them.
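
A minimal sketch of the classification described above, written in Python for brevity rather than in the C of the actual transport code. The names `classify_disconnect`, `log_disconnect`, and `local_close_requested` are hypothetical and are not GlusterFS APIs; the sketch assumes the caller already knows whether it initiated the shutdown(2)/close(2) itself, and it relies on `select.POLLERR`, which is only available on Unix-like platforms.

```python
import logging
import select

log = logging.getLogger("transport")


def classify_disconnect(events: int, local_close_requested: bool) -> str:
    """Return 'voluntary', 'involuntary', or 'unknown' for one disconnect.

    events: poll event mask reported for the socket.
    local_close_requested: True if this process called shutdown/close itself
    (hypothetical flag; the caller is assumed to track it).
    """
    if local_close_requested:
        # The process tore down an otherwise healthy connection.
        return "voluntary"
    if events & select.POLLERR:
        # The network layer reported an error on the socket.
        return "involuntary"
    return "unknown"


def log_disconnect(peer: str, events: int, local_close_requested: bool) -> str:
    """Log the category of a disconnect and return it."""
    kind = classify_disconnect(events, local_close_requested)
    log.debug("%s: %s disconnect (poll events=0x%x)", peer, kind, events)
    return kind
```

In this sketch a locally requested close takes precedence over POLLERR, on the assumption that an error seen after the process has already decided to close is not the cause of the disconnect.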


Comment 2 Milind Changire 2017-02-23 11:37:43 UTC
mainline patch: https://review.gluster.org/16732
moving to POST

Comment 4 Atin Mukherjee 2017-03-24 10:06:53 UTC
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/101324/

Comment 8 Nag Pavan Chilakam 2017-07-28 10:51:22 UTC
on_qa validation:
Moving to verified after discussing with the Assignee to confirm the expected behavior.
Test version: 3.8.4-35

Killed a brick with trace enabled on both the client and the brick. I see the EPOLLERR message below, which the Assignee confirmed.

Checked for both replicate and EC volumes.
fuse log:

[2017-07-28 10:44:46.925685] D [socket.c:564:__socket_rwv] 0-rep-client-0: EOF on socket
[2017-07-28 10:44:46.926240] W [socket.c:595:__socket_rwv] 0-rep-client-0: readv on 10.70.35.14:49157 failed (No data available)
[2017-07-28 10:44:46.926257] D [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-07-28 10:44:46.926287] D [MSGID: 0] [client.c:2264:client_rpc_notify] 0-rep-client-0: got RPC_CLNT_DISCONNECT
[2017-07-28 10:44:46.926371] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep-client-0: disconnected from rep-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2017-07-28 10:44:46.928289] D [rpc-clnt-ping.c:93:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fccb73781e2] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fccb7142ccb] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fccb713f0ff] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fccb713fbe0] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccb713b9e3] ))))) 0-: 10.70.35.14:49157: ping timer event already removed
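
When triaging a log like the one above, the EPOLLERR marker can be picked out mechanically. The following is a rough Python sketch, not a GlusterFS tool: the regular expression is inferred only from the log lines quoted in this comment, and `LOG_LINE` and `find_epollerr_disconnects` are hypothetical names. Lines that do not fit the inferred layout (for example the `_gf_log_callingfn` backtrace line) are skipped.

```python
import re

# Layout inferred from the quoted lines (an assumption, not a documented format):
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


def find_epollerr_disconnects(lines):
    """Return (timestamp, domain) for each EPOLLERR-triggered disconnect."""
    hits = []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if not m:
            continue  # not in the inferred layout; ignore
        if m.group("func") == "socket_event_handler" and "EPOLLERR" in m.group("msg"):
            hits.append((m.group("ts"), m.group("domain")))
    return hits


sample = [
    "[2017-07-28 10:44:46.925685] D [socket.c:564:__socket_rwv] 0-rep-client-0: EOF on socket",
    "[2017-07-28 10:44:46.926257] D [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now",
    "[2017-07-28 10:44:46.926287] D [MSGID: 0] [client.c:2264:client_rpc_notify] 0-rep-client-0: got RPC_CLNT_DISCONNECT",
]
print(find_epollerr_disconnects(sample))
```

The sketch only flags disconnects that carry the EPOLLERR message; it makes no claim about how a voluntary disconnect is logged, since no such line appears in this report.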

Comment 10 errata-xmlrpc 2017-09-21 04:33:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

