Created attachment 1387736: brick volume file

Description of problem:
Bricks using the RDMA transport crash.

Version-Release number of selected component (if applicable):
glusterfs-client-xlators-3.12.4-1.el7.x86_64
glusterfs-api-3.12.4-1.el7.x86_64
glusterfs-rdma-3.12.4-1.el7.x86_64
glusterfs-libs-3.12.4-1.el7.x86_64
glusterfs-cli-3.12.4-1.el7.x86_64
glusterfs-server-3.12.4-1.el7.x86_64
glusterfs-fuse-3.12.4-1.el7.x86_64
glusterfs-3.12.4-1.el7.x86_64

How reproducible:
We are experiencing crashes of glusterfsd on the bricks of a replicated RDMA volume. It may be relevant that both replicated bricks failed with the same error within several minutes of each other; both are running again after a restart. This happens approximately every 5 or 6 days. /usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(+0x467f) is always at the top of the stack.

Backtrace:
pending frames:
frame : type(0) op(12)
frame : type(0) op(12)
frame : type(0) op(29)
frame : type(0) op(37)
frame : type(0) op(29)
frame : type(0) op(29)
frame : type(0) op(29)
frame : type(0) op(29)
frame : type(0) op(29)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-01-29 11:58:11
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f3a13cc3500]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f3a13ccd434]
/lib64/libc.so.6(+0x35270)[0x7f3a1232c270]
/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(+0x467f)[0x7f39fe8e867f]
/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(+0x48af)[0x7f39fe8e88af]
/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(__gf_rdma_do_gf_rdma_write+0x7d)[0x7f39fe8ec84d]
/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(__gf_rdma_send_reply_type_msg+0x184)[0x7f39fe8ecee4]
/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(__gf_rdma_ioq_churn_reply+0x128)[0x7f39fe8ed418]
/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(__gf_rdma_ioq_churn_entry+0x85)[0x7f39fe8ed665]
/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(+0xa2e0)[0x7f39fe8ee2e0]
/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(gf_rdma_submit_reply+0x96)[0x7f39fe8ee9b6]
/lib64/libgfrpc.so.0(rpcsvc_transport_submit+0x82)[0x7f3a13a832b2]
/lib64/libgfrpc.so.0(rpcsvc_submit_generic+0x180)[0x7f3a13a84f00]
/usr/lib64/glusterfs/3.12.4/xlator/protocol/server.so(+0x91cc)[0x7f39fef0b1cc]
/usr/lib64/glusterfs/3.12.4/xlator/protocol/server.so(+0x206f4)[0x7f39fef226f4]
/usr/lib64/glusterfs/3.12.4/xlator/debug/io-stats.so(+0x153a3)[0x7f39ff37a3a3]
/lib64/libglusterfs.so.0(default_readv_cbk+0x17b)[0x7f3a13d43deb]
/usr/lib64/glusterfs/3.12.4/xlator/features/upcall.so(+0x6cd1)[0x7f3a043e2cd1]
/usr/lib64/glusterfs/3.12.4/xlator/features/leases.so(+0x2a6b)[0x7f3a045f9a6b]
/usr/lib64/glusterfs/3.12.4/xlator/features/locks.so(+0x1025e)[0x7f3a04c3325e]
/usr/lib64/glusterfs/3.12.4/xlator/features/changetimerecorder.so(+0xdc14)[0x7f3a05b8bc14]
/usr/lib64/glusterfs/3.12.4/xlator/storage/posix.so(+0xf3d4)[0x7f3a065d93d4]
/lib64/libglusterfs.so.0(default_readv+0xe1)[0x7f3a13d40101]
/usr/lib64/glusterfs/3.12.4/xlator/features/changetimerecorder.so(+0x8daf)[0x7f3a05b86daf]
/lib64/libglusterfs.so.0(default_readv+0xe1)[0x7f3a13d40101]
/usr/lib64/glusterfs/3.12.4/xlator/features/bitrot-stub.so(+0xd9b1)[0x7f3a050749b1]
/lib64/libglusterfs.so.0(default_readv+0xe1)[0x7f3a13d40101]
/usr/lib64/glusterfs/3.12.4/xlator/features/locks.so(+0x190ec)[0x7f3a04c3c0ec]
/lib64/libglusterfs.so.0(default_readv+0xe1)[0x7f3a13d40101]
/lib64/libglusterfs.so.0(default_readv+0xe1)[0x7f3a13d40101]
/usr/lib64/glusterfs/3.12.4/xlator/features/leases.so(+0x6531)[0x7f3a045fd531]
/usr/lib64/glusterfs/3.12.4/xlator/features/upcall.so(+0x101ba)[0x7f3a043ec1ba]
/lib64/libglusterfs.so.0(default_readv_resume+0x1f3)[0x7f3a13d5a9c3]
/lib64/libglusterfs.so.0(call_resume_wind+0x2da)[0x7f3a13ce78ca]
/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f3a13ce7df5]
/usr/lib64/glusterfs/3.12.4/xlator/performance/io-threads.so(+0x4de4)[0x7f3a041d5de4]
/lib64/libpthread.so.0(+0x7e25)[0x7f3a12b22e25]
/lib64/libc.so.6(clone+0x6d)[0x7f3a123ef34d]
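Each frame in the backtrace follows a `module(symbol+offset)[address]` layout, and the interesting frames (such as `rdma.so(+0x467f)`) have no symbol name. The following is a minimal Python sketch, assuming that layout holds for every frame, that splits frames into fields so the symbol-less ones can be resolved offline. `Frame`, `parse_backtrace`, and `addr2line_command` are hypothetical helper names, not part of GlusterFS. The assumption that a bare `+offset` is relative to the module's load address, and can therefore be passed to `addr2line` (or `gdb`) with matching debuginfo installed, should be verified against the exact build.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical pattern for frames such as:
#   /usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(+0x467f)[0x7f39fe8e867f]
#   /lib64/libgfrpc.so.0(rpcsvc_transport_submit+0x82)[0x7f3a13a832b2]
FRAME_RE = re.compile(
    r"(?P<module>/[^\s(]+)"              # absolute path of the shared object
    r"\((?P<symbol>[^\s()+]*)"           # symbol name, empty when unknown
    r"\+(?P<offset>0x[0-9a-fA-F]+)\)"    # offset from symbol (or module base)
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"   # absolute runtime address
)


class Frame(NamedTuple):
    module: str
    symbol: Optional[str]  # None when the frame printed only "+offset"
    offset: int
    address: int


def parse_backtrace(text: str) -> List[Frame]:
    """Extract every frame, whether frames are one per line or run together."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append(Frame(
            module=m.group("module"),
            symbol=m.group("symbol") or None,
            offset=int(m.group("offset"), 16),
            address=int(m.group("address"), 16),
        ))
    return frames


def addr2line_command(frame: Frame) -> Optional[str]:
    """Suggest an addr2line call for symbol-less frames (assumes a
    module-relative offset and installed debuginfo); else return None."""
    if frame.symbol is not None:
        return None
    return f"addr2line -f -e {frame.module} {hex(frame.offset)}"


if __name__ == "__main__":
    sample = (
        "/lib64/libc.so.6(+0x35270)[0x7f3a1232c270]\n"
        "/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so(+0x467f)[0x7f39fe8e867f]\n"
        "/usr/lib64/glusterfs/3.12.4/rpc-transport/rdma.so"
        "(__gf_rdma_do_gf_rdma_write+0x7d)[0x7f39fe8ec84d]\n"
    )
    for f in parse_backtrace(sample):
        print(f.module, f.symbol, hex(f.offset), addr2line_command(f))
```

Matching with `finditer` rather than splitting on newlines keeps the parser working on backtraces that were pasted as a single run-on line.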
Related? https://bugzilla.redhat.com/show_bug.cgi?id=1525850
Release 3.12 has reached end of life (EOL), and this bug was still in the NEW state, so the version is being moved to mainline so that the bug can be triaged and appropriate action taken.
Jiri,

Apologies for the delay. Thanks for the report, but we are not able to actively look into the RDMA transport code, and are seriously considering dropping it from active support.

More on this @ https://lists.gluster.org/pipermail/gluster-devel/2018-July/054990.html

> ‘RDMA’ transport support:
>
> Gluster started supporting RDMA while ib-verbs was still new, and very high-end infra around that time were using Infiniband. Engineers did work
> with Mellanox, and got the technology into GlusterFS for better data migration, data copy. While current day kernels support very good speed with
> IPoIB module itself, and there are no more bandwidth for experts in these area to maintain the feature, we recommend migrating over to TCP (IP
> based) network for your volume.
>
> If you are successfully using RDMA transport, do get in touch with us to prioritize the migration plan for your volume. Plan is to work on this
> after the release, so by version 6.0, we will have a cleaner transport code, which just needs to support one type.