Bug 765486 (GLUSTER-3754)
Summary: | [glusterfs-3.2.5qa2] transport endpoint not connected error while running dbench | ||||||
---|---|---|---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | M S Vishwanath Bhat <vbhat> | ||||
Component: | rdma | Assignee: | Amar Tumballi <amarts> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | M S Vishwanath Bhat <vbhat> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | pre-release | CC: | gluster-bugs, mzywusko, rfortier, vijay, vraman | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2013-07-24 17:55:41 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 817967 | ||||||
Attachments: |
|
Description
M S Vishwanath Bhat
2011-10-23 08:15:33 UTC
There are a few xdr decoding errors. Can you please take a look in? I see that all the xdr failures are in 'symlink_cbk'. Want to understand if it is the same behavior with master branch. Do you still have access to the setup? Cornered the error to just one step: mkdir -p <254lenpath>/<254lenpath>/.../ #repeat the 254 to cross 2k limit The logs are: on server: [2011-10-27 21:43:04.138722] E [rpcsvc.c:476:rpcsvc_handle_rpc_call] 0-rpc-service: No vectored handler present [2011-10-27 21:43:04.138803] W [rpcsvc.c:1066:rpcsvc_error_reply] (-->/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x130) [0x7f78721c06ac] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x181) [0x7f78721bac33] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x35f) [0x7f78721ba8ab]))) 0-: sending a RPC error reply [2011-10-27 21:43:04.139385] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211) [0x7fd6d6759eb5] (-->/usr/local/lib/glusterfs/3.2.5qa1/xlator/protocol/client.so(client3_1_lookup_cbk+0x13b) [0x7fd6d37fc92d] (-->/usr/local/lib/libgfxdr.so.0(xdr_to_lookup_rsp+0x3c) [0x7fd6d653cd02]))) 0-xdr: XDR decoding failed [2011-10-27 21:43:04.139411] E [client3_1-fops.c:2174:client3_1_lookup_cbk] 0-hosdu-client-1: error [2011-10-27 21:43:04.139428] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-hosdu-client-1: remote operation failed: Invalid argument [2011-10-27 21:43:04.139449] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 685: LOOKUP() /1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111/1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111/1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111/1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111/1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111/1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 => -1 (Invalid argument) I suspect the issue is arising because of 2k inline buffer limit in RDMA xport. I guess we should increase it to 8k buffers to fix the issue. Also, saw this in dmesg of server. Client side glusterfs log says ping timeout of (42secs). Which caused the transport timeout. INFO: task iozone:5571 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. iozone D 0000000000000003 0 5571 23319 0x00000080 ffff880009d73cc8 0000000000000086 0000000000000000 ffffffffa0622c99 ffff880009d73c38 ffffffff81012979 ffff880009d73c78 ffffffff81098d09 ffff8800375c10b8 ffff880009d73fd8 000000000000f598 ffff8800375c10b8 Call Trace: [<ffffffffa0622c99>] ? rpc_run_task+0xd9/0x130 [sunrpc] [<ffffffff81012979>] ? read_tsc+0x9/0x20 [<ffffffff81098d09>] ? ktime_get_ts+0xa9/0xe0 [<ffffffff81098d09>] ? ktime_get_ts+0xa9/0xe0 [<ffffffff8110d3d0>] ? sync_page+0x0/0x50 [<ffffffff814dbad3>] io_schedule+0x73/0xc0 [<ffffffff8110d40d>] sync_page+0x3d/0x50 [<ffffffff814dc33f>] __wait_on_bit+0x5f/0x90 [<ffffffff8110d5c3>] wait_on_page_bit+0x73/0x80 [<ffffffff8108e1a0>] ? wake_bit_function+0x0/0x50 [<ffffffff81123375>] ? pagevec_lookup_tag+0x25/0x40 [<ffffffff8110d9db>] wait_on_page_writeback_range+0xfb/0x190 [<ffffffff8110dba8>] filemap_write_and_wait_range+0x78/0x90 [<ffffffff811a098e>] vfs_fsync_range+0x7e/0xe0 [<ffffffff811a0a5d>] vfs_fsync+0x1d/0x20 [<ffffffff811a0a9e>] do_fsync+0x3e/0x60 [<ffffffff811a0af0>] sys_fsync+0x10/0x20 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b nfs: server 10.1.10.24 not responding, timed out by reducing the dbench threads from 50 to 20, the issues went away. CHANGE: http://review.gluster.com/697 (Change-Id: I1b0b91eb7216f32a5748325d6dda37f3998302c7) merged in master by Vijay Bellur (vijay) dbench completed fine with tcp,rdma volume mounted via tcp. Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 126548 33.138 14992.630 Close 92136 3.588 13367.404 Rename 5275 88.705 15013.519 Unlink 26106 17.507 3048.435 Qpathinfo 114030 15.512 14954.688 Qfileinfo 19571 10.957 2362.818 Qfsinfo 20852 5.931 13366.600 Sfileinfo 10163 15.627 1841.571 Find 43740 55.253 14997.428 WriteX 61575 2.237 13355.511 ReadX 194303 1.804 14931.607 LockX 400 4.980 15.509 UnlockX 400 4.710 17.717 Flush 8790 156.435 13384.918 |