Description of problem:

The GlusterFS FUSE client crashed with the backtrace below.

~~~~~~~~~
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-03-22 22:31:32
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.9
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f45f3c3e1c2]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f45f3c6396d]
/lib64/libc.so.6(+0x35670)[0x7f45f232a670]
/usr/lib64/glusterfs/3.7.9/xlator/performance/io-cache.so(__ioc_page_wakeup+0x44)[0x7f45e525e5b4]
/usr/lib64/glusterfs/3.7.9/xlator/performance/io-cache.so(ioc_inode_wakeup+0x164)[0x7f45e525ffa4]
/usr/lib64/glusterfs/3.7.9/xlator/performance/io-cache.so(ioc_cache_validate_cbk+0x31b)[0x7f45e5257b2b]
/usr/lib64/glusterfs/3.7.9/xlator/performance/read-ahead.so(ra_attr_cbk+0x11a)[0x7f45e566edfa]
/lib64/libglusterfs.so.0(default_fstat_cbk+0x11a)[0x7f45f3c47ada]
/usr/lib64/glusterfs/3.7.9/xlator/cluster/distribute.so(dht_file_attr_cbk+0x1c5)[0x7f45e5aea505]
/usr/lib64/glusterfs/3.7.9/xlator/cluster/replicate.so(afr_fstat_cbk+0x131)[0x7f45e5d27de1]
/usr/lib64/glusterfs/3.7.9/xlator/protocol/client.so(client3_3_fstat_cbk+0x44e)[0x7f45e5fa7f8e]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f45f3a0c990]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7f45f3a0cc4f]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f45f3a08793]
/usr/lib64/glusterfs/3.7.9/rpc-transport/socket.so(+0x69b4)[0x7f45e86a19b4]
/usr/lib64/glusterfs/3.7.9/rpc-transport/socket.so(+0x95f4)[0x7f45e86a45f4]
/lib64/libglusterfs.so.0(+0x94c0a)[0x7f45f3cacc0a]
/lib64/libpthread.so.0(+0x7dc5)[0x7f45f2aa6dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f45f23ebced]
~~~~~~~~~

* The client log contained a large number of the following messages just before the crash.

~~~~~~~~~~
[2017-03-23 08:41:29.936098] W [MSGID: 108027] [afr-common.c:2250:afr_discover_done] 4-vCDN-replicate-2: no read subvols for /
The message "W [MSGID: 108027] [afr-common.c:2250:afr_discover_done] 4-vCDN-replicate-2: no read subvols for /" repeated 90 times between [2017-03-23 08:41:29.936098] and [2017-03-23 08:43:28.210919]
~~~~~~~~~~

* These messages were caused by a metadata split-brain on some directories, including the volume root "/".

Version-Release number of selected component (if applicable):
RHGS 3.1.3
glusterfs-3.7.9-12.el7.x86_64

How reproducible:
Seen a couple of times in the customer environment.

Actual results:
The glusterfs-fuse client crashed, and the mount point started returning "Transport endpoint is not connected" errors.

Expected results:
The glusterfs-fuse client should not crash.

Additional info:
* An application coredump was collected.
It gave the below backtrace:

~~~~~~~~~
(gdb) bt
#0  0x00007f45e525e5b4 in __ioc_page_wakeup (page=0x7f43246e1500, page@entry=0x7f45f17d0d64, op_errno=0) at page.c:960
#1  0x00007f45e525ffa4 in ioc_inode_wakeup (frame=0x7f45e00396c8, frame@entry=0x7f45f17d0d64, ioc_inode=ioc_inode@entry=0x7f45e0e62160, stbuf=stbuf@entry=0x7f45e69cca10) at ioc-inode.c:119
#2  0x00007f45e5257b2b in ioc_cache_validate_cbk (frame=0x7f45f17d0d64, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=<optimized out>, stbuf=<optimized out>, xdata=0x0) at io-cache.c:402
#3  0x00007f45e566edfa in ra_attr_cbk (frame=0x7f45f17e22e0, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, buf=0x7f45e69cca10, xdata=0x0) at read-ahead.c:721
#4  0x00007f45f3c47ada in default_fstat_cbk (frame=0x7f45f17b7188, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, buf=0x7f45e69cca10, xdata=0x0) at defaults.c:1053
#5  0x00007f45e5aea505 in dht_file_attr_cbk (frame=0x7f45f17ba090, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, stbuf=<optimized out>, xdata=0x0) at dht-inode-read.c:214
#6  0x00007f45e5d27de1 in afr_fstat_cbk (frame=0x7f45f17562d8, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, buf=0x7f45e69cca10, xdata=0x0) at afr-inode-read.c:291
#7  0x00007f45e5fa7f8e in client3_3_fstat_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f45f17e1c28) at client-rpc-fops.c:1574
#8  0x00007f45f3a0c990 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f45e03547c0, pollin=pollin@entry=0x7f45e1033480) at rpc-clnt.c:764
#9  0x00007f45f3a0cc4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f45e03547f0, event=<optimized out>, data=0x7f45e1033480) at rpc-clnt.c:905
#10 0x00007f45f3a08793 in rpc_transport_notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at rpc-transport.c:546
#11 0x00007f45e86a19b4 in socket_event_poll_in (this=0x7f45e0364440) at socket.c:2355
#12 0x00007f45e86a45f4 in socket_event_handler (fd=<optimized out>, idx=8, data=0x7f45e0364440, poll_in=1, poll_out=0, poll_err=0) at socket.c:2469
#13 0x00007f45f3cacc0a in event_dispatch_epoll_handler (event=0x7f45e69cce80, event_pool=0x7f45f507c350) at event-epoll.c:570
#14 event_dispatch_epoll_worker (data=0x7f45f50d2ff0) at event-epoll.c:678
#15 0x00007f45f2aa6dc5 in start_thread (arg=0x7f45e69cd700) at pthread_create.c:308
#16 0x00007f45f23ebced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
~~~~~~~~~

* The glusterfs fuse client appears to have crashed in the function below, at page.c:960 (frame #0 above):

~~~~~~~~~
__ioc_page_wakeup (ioc_page_t *page, int32_t op_errno)
948 {
949         ioc_waitq_t  *waitq = NULL, *trav = NULL;
950         call_frame_t *frame = NULL;
951         int32_t       ret   = -1;
952
953         GF_VALIDATE_OR_GOTO ("io-cache", page, out);
954
955         waitq = page->waitq;
956         page->waitq = NULL;
957
958         page->ready = 1;
959
960         gf_msg_trace (page->inode->table->xl->name, 0,
961                       "page is %p && waitq = %p", page, waitq);
962
963         for (trav = waitq; trav; trav = trav->next) {
964                 frame = trav->data;
965                 ret = __ioc_frame_fill (page, frame, trav->pending_offset,
966                                         trav->pending_size, op_errno);
967                 if (ret == -1) {
968                         break;
969                 }
970         }
~~~~~~~~~

* The faulting line, page.c:960, dereferences page->inode->table->xl->name. Note that GF_VALIDATE_OR_GOTO at line 953 only guards against a NULL page; a non-NULL but stale pointer passes the check. In frame #0 the page argument (0x7f43246e1500) differs from its recorded value on entry (page@entry=0x7f45f17d0d64), which is consistent with the page having been destroyed or its pointer corrupted while the wakeup path was still running.
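* The actual fix is in the upstream patch linked in the next comment. As a rough illustration only (not the io-cache code or the upstream fix; all names such as toy_page_t and toy_inode_t are hypothetical), the sketch below shows the general mitigation for this class of bug: re-validate that a page is still linked to its owner, under the same lock that serializes destruction, before dereferencing it.

~~~~~~~~~
/*
 * Minimal, self-contained sketch of a wakeup path that tolerates a
 * concurrently destroyed page. Build with: cc -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct toy_page {
        struct toy_page *next;   /* linked into the owning inode's page list */
        int              ready;  /* stands in for page->ready */
        const char      *name;   /* stands in for page->inode->table->xl->name */
} toy_page_t;

typedef struct {
        pthread_mutex_t  lock;   /* serializes all page list access */
        toy_page_t      *pages;  /* head of this inode's page list */
} toy_inode_t;

/* Unlink and free a page; caller must hold inode->lock. */
static void
toy_page_destroy (toy_inode_t *inode, toy_page_t *page)
{
        toy_page_t **pp = &inode->pages;

        while (*pp && *pp != page)
                pp = &(*pp)->next;
        if (*pp) {
                *pp = page->next;
                free (page);
        }
}

/*
 * Wake a page only if it is still on the inode's list. Looking the
 * pointer up again under the lock, instead of trusting a value cached
 * before the lock was taken, avoids the stale dereference seen at
 * page.c:960 in the backtrace above.
 */
static void
toy_page_wakeup (toy_inode_t *inode, toy_page_t *page)
{
        toy_page_t *p = NULL;

        pthread_mutex_lock (&inode->lock);
        for (p = inode->pages; p && p != page; p = p->next)
                ;
        if (p) {                 /* page still alive: safe to touch */
                p->ready = 1;
                printf ("woke page %p (%s)\n", (void *)p, p->name);
        }                        /* else: destroyed concurrently, do nothing */
        pthread_mutex_unlock (&inode->lock);
}

int
main (void)
{
        toy_inode_t inode = { PTHREAD_MUTEX_INITIALIZER, NULL };
        toy_page_t *page  = calloc (1, sizeof (*page));

        page->name  = "demo";
        inode.pages = page;

        toy_page_wakeup (&inode, page);   /* safe: page still linked */

        pthread_mutex_lock (&inode.lock);
        toy_page_destroy (&inode, page);  /* page freed under the lock */
        pthread_mutex_unlock (&inode.lock);

        toy_page_wakeup (&inode, page);   /* safe: lookup fails, freed
                                             memory is never dereferenced */
        return 0;
}
~~~~~~~~~

The key point of the sketch is that lookup and dereference happen atomically under the lock that also serializes destruction; any path that carries a raw page pointer across a lock drop can hit a crash like the one above.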
Upstream patch: https://review.gluster.org/17410
Doc text looks fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774