Description of problem:
glusterfs crashes and the volume must be remounted

Version-Release number of selected component (if applicable):
glusterfs-fuse-3.7.4-2.el7.x86_64

How reproducible:
It happens randomly but quite frequently under medium load.

Steps to Reproduce:
1. Create a two-server replicated volume with 3 bricks on each server
2. Mount the volume with FUSE
3. Set performance.quick-read on (a command-line sketch of these steps follows the volume info below)

Actual results:
The FUSE mount process crashes on the server under load; the mount on the other server keeps working.

Expected results:
glusterfs should not crash

Additional info:
(gdb) bt
#0  0x00007f44586825f6 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x00007f4447563bc4 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  qr_content_extract (xdata=xdata@entry=0x7f445a163774) at quick-read.c:278
#3  0x00007f4447563f94 in qr_lookup_cbk (frame=0x7f44579942c4, cookie=<optimized out>, this=0x7f4448016320, op_ret=0, op_errno=117, inode_ret=0x7f4444afd434, buf=0x7f444c0628f0, xdata=0x7f445a163774, postparent=0x7f444c062b20) at quick-read.c:422
#4  0x00007f444777095c in ioc_lookup_cbk (frame=0x7f44579a1dcc, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=0x7f4444afd434, stbuf=0x7f444c0628f0, xdata=0x7f445a163774, postparent=0x7f444c062b20) at io-cache.c:260
#5  0x00007f4447dc4f7f in dht_discover_complete (this=this@entry=0x7f4448011220, discover_frame=discover_frame@entry=0x7f44579906f8) at dht-common.c:304
#6  0x00007f4447dc563a in dht_discover_cbk (frame=0x7f44579906f8, cookie=0x7f4457990fb4, this=0x7f4448011220, op_ret=<optimized out>, op_errno=117, inode=0x7f4444afd434, stbuf=0x7f4439b0c198, xattr=0x7f445a163774, postparent=0x7f4439b0c208) at dht-common.c:439
#7  0x00007f444c1a2bb7 in afr_discover_done (this=<optimized out>, frame=0x7f4457990fb4) at afr-common.c:2114
#8  afr_discover_cbk (frame=0x7f4457990fb4, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7f444ce08930, xdata=0x7f445a162e28, postparent=0x7f444ce089a0) at afr-common.c:2149
#9  0x00007f444c3f1437 in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f4457993e10) at client-rpc-fops.c:2978
#10 0x00007f4459c4eb10 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f44480fd310, pollin=pollin@entry=0x7f4448a51fd0) at rpc-clnt.c:766
#11 0x00007f4459c4edcf in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f44480fd340, event=<optimized out>, data=0x7f4448a51fd0) at rpc-clnt.c:907
#12 0x00007f4459c4a903 in rpc_transport_notify (this=this@entry=0x7f444810d010, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f4448a51fd0) at rpc-transport.c:544
#13 0x00007f444e8eb506 in socket_event_poll_in (this=this@entry=0x7f444810d010) at socket.c:2236
#14 0x00007f444e8ee3f4 in socket_event_handler (fd=fd@entry=17, idx=idx@entry=6, data=0x7f444810d010, poll_in=1, poll_out=0, poll_err=0) at socket.c:2349
#15 0x00007f4459ee17ba in event_dispatch_epoll_handler (event=0x7f444ce08e80, event_pool=0x7f445abf2330) at event-epoll.c:575
#16 event_dispatch_epoll_worker (data=0x7f445ac3aeb0) at event-epoll.c:678
#17 0x00007f4458ce8df5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f445862f1ad in clone () from /lib64/libc.so.6

----

Volume Name: home_gfs
Type: Distributed-Replicate
Volume ID: fa5aa52a-8105-47f1-b1d6-f10db8a11330
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: castore:/glusterfs/home_gfs/brick1
Brick2: polluce:/glusterfs/home_gfs/brick1
Brick3: castore:/glusterfs/home_gfs/brick2
Brick4: polluce:/glusterfs/home_gfs/brick2
Brick5: castore:/glusterfs/home_gfs/brick3
Brick6: polluce:/glusterfs/home_gfs/brick3
Options Reconfigured:
performance.quick-read: on
nfs.ports-insecure: on
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.self-heal-daemon: enable
nfs.disable: on
server.allow-insecure: on
client.bind-insecure: on
network.ping-timeout: 5

It is mounted this way:
castore:/home_gfs on /export/home/public type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
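For reference, a minimal command-line sketch of the reproduction steps above, using the hostnames and brick paths from the volume info and assuming the replica pairs are formed in the order the bricks are listed (a plausible layout, not taken from the original setup commands):

  # create the 3 x 2 distributed-replicate volume across the two servers
  gluster volume create home_gfs replica 2 \
      castore:/glusterfs/home_gfs/brick1 polluce:/glusterfs/home_gfs/brick1 \
      castore:/glusterfs/home_gfs/brick2 polluce:/glusterfs/home_gfs/brick2 \
      castore:/glusterfs/home_gfs/brick3 polluce:/glusterfs/home_gfs/brick3
  gluster volume start home_gfs

  # enable the translator that appears in the crash backtrace
  gluster volume set home_gfs performance.quick-read on

  # FUSE mount on the client, matching the mount shown above
  mount -t glusterfs castore:/home_gfs /export/home/public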
Could you provide more information on the type of load when the crash happens, basically which fops were running and how many files/directories were created? Could you also upload the core file?
The server is running our postfix+dovecot mail service, so it is under load but not extremely heavy, just many reads and writes of small files. I will attach a few minutes of volume profile and the core dump.
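For reference, a volume profile like the attached one is typically captured with the gluster profiling commands; a sketch, assuming the volume name above:

  gluster volume profile home_gfs start
  # let the mail load run for a few minutes, then dump the per-brick fop statistics
  gluster volume profile home_gfs info > home_gfs-profile.txt
  gluster volume profile home_gfs stop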
Created attachment 1073289 [details] volume profile
Unfortunately the core dump is 300MB and the attachment limit seems to be 20MB. Is there another way I can attach it?
You could compress it and attach it. Is it a VM? Can I access it to get the core file? Also, what is the signal with which the process crashed?
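A possible way to get the core under the attachment limit and to read the fatal signal from it; the core file name core.12345 is a placeholder and the client binary path may differ on your install:

  # compress the core; xz usually shrinks glusterfs cores considerably
  xz -9 core.12345
  # if it is still above the 20MB limit, split it into chunks that can be attached separately
  split -b 19M core.12345.xz core.12345.xz.part-

  # gdb prints the terminating signal when it loads the core
  gdb /usr/sbin/glusterfs core.12345
  (gdb) bt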
Please upload the logs from the server where the crash happened.
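For reference, the FUSE client log normally lives under /var/log/glusterfs/ with the mount point encoded in the file name; a sketch, assuming the mount point shown earlier:

  # client log for the /export/home/public mount (slashes in the path become dashes)
  ls -l /var/log/glusterfs/export-home-public.log
  # brick logs on each server, in case those are wanted as well
  ls -l /var/log/glusterfs/bricks/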
This refers to one of the crashes

pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(READDIRP)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-09-10 14:17:24
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fb8f13dbf82]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7fb8f13f840d]
/lib64/libc.so.6(+0x35650)[0x7fb8efaca650]
/lib64/libc.so.6(+0x1495f6)[0x7fb8efbde5f6]
/usr/lib64/glusterfs/3.7.4/xlator/performance/quick-read.so(qr_content_extract+0x44)[0x7fb8e2c1fbc4]
/usr/lib64/glusterfs/3.7.4/xlator/performance/quick-read.so(qr_lookup_cbk+0x154)[0x7fb8e2c1ff94]
/usr/lib64/glusterfs/3.7.4/xlator/performance/io-cache.so(ioc_lookup_cbk+0x36c)[0x7fb8e2e2c95c]
/usr/lib64/glusterfs/3.7.4/xlator/cluster/distribute.so(dht_discover_complete+0x17f)[0x7fb8e3480f7f]
/usr/lib64/glusterfs/3.7.4/xlator/cluster/distribute.so(dht_discover_cbk+0x29a)[0x7fb8e348163a]
/usr/lib64/glusterfs/3.7.4/xlator/cluster/replicate.so(afr_discover_cbk+0x3a7)[0x7fb8e36febb7]
/usr/lib64/glusterfs/3.7.4/xlator/protocol/client.so(client3_3_lookup_cbk+0x707)[0x7fb8e394d437]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fb8f11aab10]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7fb8f11aadcf]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb8f11a6903]
/usr/lib64/glusterfs/3.7.4/rpc-transport/socket.so(+0x9506)[0x7fb8e5e47506]
/usr/lib64/glusterfs/3.7.4/rpc-transport/socket.so(+0xc3f4)[0x7fb8e5e4a3f4]
/lib64/libglusterfs.so.0(+0x877ba)[0x7fb8f143d7ba]
/lib64/libpthread.so.0(+0x7df5)[0x7fb8f0244df5]
/lib64/libc.so.6(clone+0x6d)[0x7fb8efb8b1ad]
---------
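In case it helps with triage, the faulting memcpy() in qr_content_extract() can be inspected from the core with standard gdb commands; a sketch, with the core file name as a placeholder:

  gdb /usr/sbin/glusterfs core.12345
  (gdb) thread apply all bt      # backtraces of every thread, not just the crashing one
  (gdb) frame 2                  # qr_content_extract (quick-read.c:278), per the gdb backtrace at the top of this report
  (gdb) info locals              # the sizes and pointers fed into the memcpy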
This bug is being closed because GlusterFS 3.7 has reached its end-of-life. Note: this bug is being closed using a script; no verification has been performed to check whether it still exists on newer releases of GlusterFS. If this bug still exists in a newer GlusterFS release, please reopen it against that release.