Bug 1460639
Summary: | [Stress] : IO errored out with ENOTCONN. | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
Component: | rpc | Assignee: | Milind Changire <mchangir> |
Status: | CLOSED ERRATA | QA Contact: | Rajesh Madaka <rmadaka> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | rhgs-3.3 | CC: | amukherj, bturner, mchangir, msaini, nbalacha, nchilaka, rgowdapp, rhinduja, rhs-bugs, sheggodu, storage-qa-internal, vdas |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | RHGS 3.4.0 | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | rebase | | |
Fixed In Version: | glusterfs-3.12.2-1 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
Clones: | 1461092 | Environment: | |
Last Closed: | 2018-09-04 06:32:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1461092, 1503134 | | |
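On Linux, the ENOTCONN in the summary is the errno whose strerror() text is "Transport endpoint is not connected". That is the same string reported by fuse_writev_cbk and fuse_err_cbk in the mount log in the description. The minimal C sketch below shows that mapping. Both helper names are hypothetical and not part of GlusterFS. The message wording comes from glibc, so other C libraries may phrase it differently.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns nonzero when 'msg' is the C library's
 * strerror() text for ENOTCONN. On glibc that text is
 * "Transport endpoint is not connected". */
int is_enotconn_message(const char *msg)
{
    return strcmp(msg, strerror(ENOTCONN)) == 0;
}

/* Hypothetical helper: reports a failed write() the way an application
 * (for example dd) running on the FUSE mount would see it, naming
 * ENOTCONN explicitly when that is the errno. */
void report_write_error(int err)
{
    if (err == ENOTCONN)
        fprintf(stderr, "write failed: ENOTCONN (%d): %s\n", err, strerror(err));
    else
        fprintf(stderr, "write failed: errno %d: %s\n", err, strerror(err));
}
```

A caller would pass errno to report_write_error() right after a write() that returned -1.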
Description
Ambarish
2017-06-12 09:49:46 UTC
I see this in the mount logs:
<snip>
[2017-06-12 08:04:35.948610] W [rpc-clnt.c:1694:rpc_clnt_submit] 0-butcher-client-6: failed to submit rpc-request (XID: 0xfedb7 Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (butcher-client-6)
[2017-06-12 08:04:35.950750] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-butcher-client-6: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-06-12 08:04:35.951130] W [fuse-bridge.c:2312:fuse_writev_cbk] 0-glusterfs-fuse: 6491583: WRITE => -1 gfid=f6e34aa2-4050-4cd4-896e-712660b2ef42 fd=0x7fd7e92e3630 (Transport endpoint is not connected)
[2017-06-12 08:04:36.026522] W [fuse-bridge.c:1291:fuse_err_cbk] 0-glusterfs-fuse: 6491584: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2017-06-12 08:04:36.026944] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-butcher-client-6: Connected to butcher-client-6, attached to remote volume '/bricks4/A1'.
</snip>
And these errors (from rpc?) appear in the brick logs at the same time:
<snip>
[2017-06-12 08:04:35.950650] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:35.950658] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:35.950697] E [server-helpers.c:395:server_alloc_frame] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7efdf6f418c5] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x29c84) [0x7efde26ecc84] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0xe084) [0x7efde26d1084] ) 0-server: invalid argument: client [Invalid argument]
[2017-06-12 08:04:35.950719] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:35.950878] E [server-helpers.c:395:server_alloc_frame] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7efdf6f418c5] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x29c84) [0x7efde26ecc84] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0xe084) [0x7efde26d1084] ) 0-server: invalid argument: client [Invalid argument]
[2017-06-12 08:04:35.950909] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:35.951028] E [server-helpers.c:395:server_alloc_frame] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7efdf6f418c5] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x29c84) [0x7efde26ecc84] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0xe084) [0x7efde26d1084] ) 0-server: invalid argument: client [Invalid argument]
[2017-06-12 08:04:35.951054] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:35.951185] E [server-helpers.c:395:server_alloc_frame] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7efdf6f418c5] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x29c84) [0x7efde26ecc84] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0xe084) [0x7efde26d1084] ) 0-server: invalid argument: client [Invalid argument]
[2017-06-12 08:04:35.951233] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:35.951354] E [server-helpers.c:395:server_alloc_frame] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7efdf6f418c5] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x29c84) [0x7efde26ecc84] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0xe084) [0x7efde26d1084] ) 0-server: invalid argument: client [Invalid argument]
[2017-06-12 08:04:35.951382] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:35.951569] I [addr.c:182:gf_auth] 0-/bricks4/A1: allowed = "*", received addr = "192.168.97.182"
[2017-06-12 08:04:35.951611] I [MSGID: 115029] [server-handshake.c:712:server_setvolume] 0-butcher-server: accepted client from gqac016.sbu.lab.eng.bos.redhat.com-6077-2017/06/12-07:16:31:520108-butcher-client-6-0-1 (version: 3.8.4)
[2017-06-12 08:04:35.989077] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:35.989320] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:36.014154] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:36.014455] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:36.014605] W [socket.c:595:__socket_rwv] 0-tcp.butcher-server: writev on 192.168.97.182:1016 failed (Broken pipe)
[2017-06-12 08:04:36.015270] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-06-12 08:04:36.015701] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-06-12 08:04:36.015772] I [MSGID: 115036] [server.c:561:server_rpc_notify] 0-butcher-server: disconnecting connection from gqac016.sbu.lab.eng.bos.redhat.com-6077-2017/06/12-07:16:31:520108-butcher-client-6-0-0
[2017-06-12 08:04:36.015807] I [socket.c:3662:socket_submit_reply] 0-tcp.butcher-server: not connected (priv->connected = -1)
[2017-06-12 08:04:36.015836] E [rpcsvc.c:1333:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xfedb0, Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (tcp.butcher-server)
</snip>

There is nothing to indicate that this is a DHT issue. From the logs, it appears to be an RPC/transport issue. Moving this to the rpc component for further analysis.

Setting back the needinfo on Ambarish based on comment 8.

A patch merged upstream [1] is expected to reduce ping-timeout expiries during high load on bricks. We are planning to backport this patch to rhgs-3.4.0. With [1], the client disconnects should be fixed, but it is not clear how the hung XFS will affect I/O.

@Milind, can you work with Ambarish to move this bug to the relevant component once #17105 is taken in?

[1] https://review.gluster.org/17105

Followed the steps mentioned in the description above. Created a plain distributed volume, mounted it (FUSE mount) on 4 different clients, then started running I/O on all 4 clients using the tools below:
-> dd I/O tool on one client
-> untar on one client
-> small files on two clients
I didn't find any dd I/O failing. Verified the bug with the following version: glusterfs-3.12.2-8

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
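The triage in this report rests on matching mount (client) log entries against brick log entries with nearly identical timestamps. For example, the client's fuse_writev_cbk failure at 08:04:35.951130 lines up with the brick's rpcsvc errors starting at 08:04:35.950650. The C sketch below shows that correlation step. It assumes the "[YYYY-MM-DD HH:MM:SS.uuuuuu] L" prefix seen in these logs and that the client and server clocks are in sync. Both function names are hypothetical and not part of GlusterFS.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses the "[YYYY-MM-DD HH:MM:SS.uuuuuu] L" prefix
 * of a glusterfs log line. On success returns 0 and fills 'date'
 * ("YYYY-MM-DD"), '*usec_of_day' (time of day in microseconds) and
 * '*level' (the severity letter: E, W, I, ...). Returns -1 otherwise. */
int parse_gf_log_prefix(const char *line, char date[11],
                        long long *usec_of_day, char *level)
{
    int hh, mm, ss;
    long usec;

    /* %d (not %i) so that "08" is read as decimal 8, not invalid octal. */
    if (sscanf(line, "[%10[^ ] %d:%d:%d.%ld] %c",
               date, &hh, &mm, &ss, &usec, level) != 6)
        return -1;
    *usec_of_day = ((hh * 60LL + mm) * 60LL + ss) * 1000000LL + usec;
    return 0;
}

/* Hypothetical helper: returns 1 when both lines parse, carry the same
 * date, and are at most 'window_usec' microseconds apart; 0 otherwise.
 * This sketch treats entries on different dates (e.g. across midnight)
 * as unrelated. */
int gf_log_entries_correlate(const char *a, const char *b,
                             long long window_usec)
{
    char date_a[11], date_b[11], level_a, level_b;
    long long t_a, t_b;

    if (parse_gf_log_prefix(a, date_a, &t_a, &level_a) != 0 ||
        parse_gf_log_prefix(b, date_b, &t_b, &level_b) != 0)
        return 0;
    if (strcmp(date_a, date_b) != 0)
        return 0;
    return llabs(t_a - t_b) <= window_usec;
}
```

With a one-second window, the client's WRITE failure and the first brick-side "rpc actor failed" entry above correlate, since they are 480 microseconds apart. The fixed-date comparison keeps the sketch short; a fuller tool would convert the whole timestamp to an absolute time.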