Bug 1426034

Summary: Add logs to identify whether disconnects are voluntary or due to network problems
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Raghavendra G <rgowdapp>
Component: rpc
Assignee: Milind Changire <mchangir>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, asrivast, mchangir, rhs-bugs
Target Milestone: ---
Target Release: RHGS 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-19
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1426125 (view as bug list)
Environment:
Last Closed: 2017-09-21 04:33:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1426125
Bug Blocks: 1417147

Description Raghavendra G 2017-02-23 05:30:18 UTC
Description of problem:

One of the common problems we encounter is frequent connects/disconnects. A frequent disconnect can be either:

1. voluntary, where the process calls shutdown(2)/close(2) on an otherwise healthy socket connection, or
2. involuntary, where we get a POLLERR event from the network.

While debugging this class of issues, it would help if we could identify which of these two categories a particular disconnect falls into. We need to add enough log messages to let us classify each disconnect.
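
The classification idea can be sketched in a few lines of C. This is not the actual glusterfs patch (comment 2 below links the real change); every identifier here (conn_t, conn_disconnect, conn_handle_event) is hypothetical and only illustrates one possible approach: the voluntary path sets a flag before calling shutdown(2)/close(2), so the poll-event handler can log whether a later POLLERR was self-inflicted or a genuine network failure.

/* Minimal sketch, NOT glusterfs code: remember whether we initiated the
 * shutdown ourselves so the event handler can classify the disconnect. */
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

typedef struct {
    int  fd;                  /* socket file descriptor         */
    bool shutdown_requested;  /* set when we close it ourselves */
} conn_t;

/* Voluntary teardown: mark the connection before shutting it down and
 * log that the disconnect was requested by this process.             */
static void conn_disconnect(conn_t *c)
{
    c->shutdown_requested = true;
    fprintf(stderr, "INFO: fd=%d disconnecting: voluntary shutdown(2)/close(2) by this process\n",
            c->fd);
    shutdown(c->fd, SHUT_RDWR);
    close(c->fd);
}

/* Poll-event handler: on POLLERR/POLLHUP the flag tells us whether this
 * is the tail end of our own shutdown or a network-side failure.      */
static void conn_handle_event(conn_t *c, short revents)
{
    if (revents & (POLLERR | POLLHUP)) {
        if (c->shutdown_requested)
            fprintf(stderr, "DEBUG: fd=%d POLLERR after local shutdown (voluntary)\n", c->fd);
        else
            fprintf(stderr, "WARN: fd=%d POLLERR from network (involuntary disconnect)\n", c->fd);
    }
}

int main(void)
{
    int sv[2];

    /* Throwaway socket pair just to exercise both log paths. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 1;

    conn_t c = { .fd = sv[0], .shutdown_requested = false };

    conn_handle_event(&c, POLLERR);  /* logged as involuntary */
    conn_disconnect(&c);             /* voluntary shutdown    */
    conn_handle_event(&c, POLLERR);  /* logged as voluntary   */

    close(sv[1]);
    return 0;
}

In glusterfs, the equivalent decision would have to live in the socket transport's event handling (socket.c), which is where the EPOLLERR message quoted in comment 8 is logged.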

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Milind Changire 2017-02-23 11:37:43 UTC
mainline patch: https://review.gluster.org/16732
moving to POST

Comment 4 Atin Mukherjee 2017-03-24 10:06:53 UTC
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/101324/

Comment 8 Nag Pavan Chilakam 2017-07-28 10:51:22 UTC
on_qa validation:
Moving to verified after discussion with the Assignee to confirm the expected behavior.
Test version: 3.8.4-35
Talked with the Assignee and confirmed the below:

Killed the brick with trace logging enabled on both client and brick; I see the EPOLLERR message below, which the Assignee confirmed.

Checked for both replicate and EC volumes.
Fuse log:

[2017-07-28 10:44:46.925685] D [socket.c:564:__socket_rwv] 0-rep-client-0: EOF on socket
[2017-07-28 10:44:46.926240] W [socket.c:595:__socket_rwv] 0-rep-client-0: readv on 10.70.35.14:49157 failed (No data available)
[2017-07-28 10:44:46.926257] D [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-07-28 10:44:46.926287] D [MSGID: 0] [client.c:2264:client_rpc_notify] 0-rep-client-0: got RPC_CLNT_DISCONNECT
[2017-07-28 10:44:46.926371] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep-client-0: disconnected from rep-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2017-07-28 10:44:46.928289] D [rpc-clnt-ping.c:93:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fccb73781e2] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fccb7142ccb] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fccb713f0ff] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fccb713fbe0] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccb713b9e3] ))))) 0-: 10.70.35.14:49157: ping timer event already removed
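
For context, the "EOF on socket" / "readv ... failed (No data available)" / "EPOLLERR - disconnecting now" sequence above is what the involuntary case looks like when the peer (the killed brick) goes away. A standalone sketch of that mechanism follows; it is not glusterfs code, and read_or_report plus its log strings are hypothetical, but it shows how a dead peer surfaces as EOF or an error on the reading side, which then gets reported as a network-side disconnect.

/* Hypothetical sketch, NOT glusterfs code: a killed peer shows up as EOF
 * (read() == 0) or a read error, which we then log as an involuntary,
 * network-side disconnect. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 on a clean read, -1 when the connection must be treated as gone. */
static int read_or_report(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n == 0) {
        fprintf(stderr, "DEBUG: fd=%d EOF on socket\n", fd);   /* peer closed/killed */
        return -1;
    }
    if (n < 0) {
        fprintf(stderr, "WARN: fd=%d read failed (%s)\n", fd, strerror(errno));
        return -1;
    }
    return 0;
}

int main(void)
{
    int  sv[2];
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 1;

    /* Simulate the brick being killed: close the peer end, then read. */
    close(sv[1]);
    if (read_or_report(sv[0], buf, sizeof(buf)) < 0)
        fprintf(stderr, "WARN: fd=%d involuntary disconnect (network side)\n", sv[0]);

    close(sv[0]);
    return 0;
}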

Comment 10 errata-xmlrpc 2017-09-21 04:33:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
