Description of problem:
-----------------------
When client quorum is not met, the FUSE mount crashes.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8dev-0.825.git2e40a95.el7rhgs.x86_64

How reproducible:
-----------------
Tried only once

Steps to Reproduce:
-------------------
1. Set up 3 RHEL 7.1 nodes (hypervisors) with 1 brick per node, and create a 1x3 volume
2. Optimize the volume for virt-store
3. Enable sharding on the volume (leaving shard-block-size at its default of 4MB)
4. Create an application VM that runs on node1
5. Block all incoming/outgoing traffic between node1 and node2/node3 using iptables rules

Actual results:
---------------
The FUSE mount process crashed on all 3 nodes

Expected results:
-----------------
The FUSE mount should not crash
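The steps above can be sketched as shell commands. This is a minimal sketch: the hostnames (node1..node3), volume name (vmstore), and brick paths are illustrative assumptions, not taken from the report.

```shell
# Assumed names for illustration only: node1..node3, volume "vmstore",
# bricks under /rhgs/brick1. Run on one node unless noted otherwise.

# 1. Create a 1x3 replicated volume, one brick per node, and start it
gluster volume create vmstore replica 3 \
    node1:/rhgs/brick1/vmstore \
    node2:/rhgs/brick1/vmstore \
    node3:/rhgs/brick1/vmstore
gluster volume start vmstore

# 2. Optimize for virt-store by applying the "virt" option group
gluster volume set vmstore group virt

# 3. Enable sharding; shard-block-size stays at its 4MB default
gluster volume set vmstore features.shard on

# 4. Create an application VM backed by the volume and run it on node1
#    (via virsh/virt-manager; details depend on the environment)

# 5. On node1, cut all traffic to/from node2 and node3 to break quorum
for peer in node2 node3; do
    iptables -A INPUT  -s "$peer" -j DROP
    iptables -A OUTPUT -d "$peer" -j DROP
done
```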
Backtrace from one of the nodes (node1):

(gdb) bt
#0  fuse_writev_cbk (frame=0x7f28943fdbf4, cookie=<optimized out>, this=0x7f289861cad0, op_ret=0, op_errno=30, stbuf=<optimized out>, postbuf=0x0, xdata=0x0) at fuse-bridge.c:2271
#1  0x00007f288e24bd64 in io_stats_writev_cbk (frame=0x7f289442b59c, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at io-stats.c:1400
#2  0x00007f28968e6bd6 in default_writev_cbk (frame=0x7f289441a224, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at defaults.c:1016
#3  0x00007f288e86ebdd in wb_writev_cbk (frame=0x7f289442c9c4, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at write-behind.c:1255
#4  0x00007f288ea80843 in shard_writev_do_cbk (frame=frame@entry=0x7f28943ee204, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=30, prebuf=prebuf@entry=0x7f288d887da4, postbuf=postbuf@entry=0x7f288d887e14, xdata=xdata@entry=0x0) at shard.c:2958
#5  0x00007f288ecd4923 in dht_writev_cbk (frame=0x7f28943f67b8, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=30, prebuf=0x7f288d887da4, postbuf=0x7f288d887e14, xdata=0x0) at dht-inode-write.c:90
#6  0x00007f288ef18e13 in afr_writev_unwind (frame=0x7f28943f4c2c, this=<optimized out>) at afr-inode-write.c:197
#7  0x00007f288ef18e7d in afr_transaction_writev_unwind (frame=0x7f28943ef224, this=0x7f288800a620) at afr-inode-write.c:214
#8  0x00007f288ef21fcb in __afr_txn_write_done (frame=0x7f28943ef224, this=<optimized out>) at afr-transaction.c:81
#9  0x00007f288ef2529e in afr_unlock_common_cbk (frame=frame@entry=0x7f28943ef224, this=this@entry=0x7f288800a620, xdata=<optimized out>, op_errno=0, op_ret=<optimized out>, cookie=<optimized out>) at afr-lk-common.c:633
#10 0x00007f288ef25337 in afr_unlock_inodelk_cbk (frame=0x7f28943ef224, cookie=<optimized out>, this=0x7f288800a620, op_ret=<optimized out>, op_errno=0, xdata=<optimized out>) at afr-lk-common.c:674
#11 0x00007f288f173b1d in client3_3_finodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f28943f77d8) at client-rpc-fops.c:1673
#12 0x00007f28966aea10 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f28880a6840, pollin=pollin@entry=0x7f28800292f0) at rpc-clnt.c:759
#13 0x00007f28966aeccf in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f28880a6870, event=<optimized out>, data=0x7f28800292f0) at rpc-clnt.c:900
#14 0x00007f28966aa813 in rpc_transport_notify (this=this@entry=0x7f28880b6530, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f28800292f0) at rpc-transport.c:539
#15 0x00007f289186f646 in socket_event_poll_in (this=this@entry=0x7f28880b6530) at socket.c:2231
#16 0x00007f28918722a4 in socket_event_handler (fd=fd@entry=14, idx=idx@entry=3, data=0x7f28880b6530, poll_in=1, poll_out=0, poll_err=0) at socket.c:2344
#17 0x00007f28969418ca in event_dispatch_epoll_handler (event=0x7f288d798e80, event_pool=0x7f289860ed10) at event-epoll.c:570
#18 event_dispatch_epoll_worker (data=0x7f28880445e0) at event-epoll.c:673
#19 0x00007f2895748df5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f289508f1ad in clone () from /lib64/libc.so.6
Created attachment 1071648 [details] coredump from Hypervisor1
Created attachment 1071649 [details] coredump from Hypervisor2
Created attachment 1071650 [details] coredump from Hypervisor3
Created attachment 1071653 [details] sosreport from Hypervisor1
Created attachment 1071654 [details] sosreport from Hypervisor2
Created attachment 1071656 [details] sosreport from Hypervisor3
Nice catch, sas! Just checked the core. It turns out sharding is (wrongly) returning a non-negative status even when the fop fails, causing the FUSE bridge to assume the write succeeded, dereference the post-op iatt (which is NULL on failure), and crash.
REVIEW: http://review.gluster.org/12140 (features/shard: Do not return non-negative status on failure in writev) posted (#1) for review on master by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/12140 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit b1f851709c30505cac2b63bc49234ae818559d2d
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Sep 9 17:25:14 2015 +0530

    features/shard: Do not return non-negative status on failure in writev

    Change-Id: I5f65c49484e44a05bb7df53c73869f89ad3392e0
    BUG: 1261399
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/12140
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user