+++ This bug was initially created as a clone of Bug #1261399 +++

Description of problem:
-----------------------
When client quorum is not met, the fuse mount crashes.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8dev-0.825.git2e40a95.el7rhgs.x86_64

How reproducible:
-----------------
Tried only once

Steps to Reproduce:
-------------------
1. Set up 3 RHEL 7.1 nodes (hypervisors) with 1 brick per node, and create a 1x3 volume
2. Optimize the volume for virt-store
3. Enable sharding on the volume (leaving shard-block-size at its default of 4MB)
4. Create an application VM that runs on node1
5. Block all incoming/outgoing traffic between node1 and node2/node3 using iptables rules

Actual results:
---------------
The fuse mount process crashed on all 3 nodes

Expected results:
-----------------
The fuse mount should not crash

--- Additional comment from SATHEESARAN on 2015-09-09 05:01:35 EDT ---

Backtrace from one of the nodes (node1):

(gdb) bt
#0 fuse_writev_cbk (frame=0x7f28943fdbf4, cookie=<optimized out>, this=0x7f289861cad0, op_ret=0, op_errno=30, stbuf=<optimized out>, postbuf=0x0, xdata=0x0) at fuse-bridge.c:2271
#1 0x00007f288e24bd64 in io_stats_writev_cbk (frame=0x7f289442b59c, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at io-stats.c:1400
#2 0x00007f28968e6bd6 in default_writev_cbk (frame=0x7f289441a224, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at defaults.c:1016
#3 0x00007f288e86ebdd in wb_writev_cbk (frame=0x7f289442c9c4, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at write-behind.c:1255
#4 0x00007f288ea80843 in shard_writev_do_cbk (frame=frame@entry=0x7f28943ee204, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=30, prebuf=prebuf@entry=0x7f288d887da4, postbuf=postbuf@entry=0x7f288d887e14, xdata=xdata@entry=0x0) at shard.c:2958
#5 0x00007f288ecd4923 in dht_writev_cbk (frame=0x7f28943f67b8, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=30, prebuf=0x7f288d887da4, postbuf=0x7f288d887e14, xdata=0x0) at dht-inode-write.c:90
#6 0x00007f288ef18e13 in afr_writev_unwind (frame=0x7f28943f4c2c, this=<optimized out>) at afr-inode-write.c:197
#7 0x00007f288ef18e7d in afr_transaction_writev_unwind (frame=0x7f28943ef224, this=0x7f288800a620) at afr-inode-write.c:214
#8 0x00007f288ef21fcb in __afr_txn_write_done (frame=0x7f28943ef224, this=<optimized out>) at afr-transaction.c:81
#9 0x00007f288ef2529e in afr_unlock_common_cbk (frame=frame@entry=0x7f28943ef224, this=this@entry=0x7f288800a620, xdata=<optimized out>, op_errno=0, op_ret=<optimized out>, cookie=<optimized out>) at afr-lk-common.c:633
#10 0x00007f288ef25337 in afr_unlock_inodelk_cbk (frame=0x7f28943ef224, cookie=<optimized out>, this=0x7f288800a620, op_ret=<optimized out>, op_errno=0, xdata=<optimized out>) at afr-lk-common.c:674
#11 0x00007f288f173b1d in client3_3_finodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f28943f77d8) at client-rpc-fops.c:1673
#12 0x00007f28966aea10 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f28880a6840, pollin=pollin@entry=0x7f28800292f0) at rpc-clnt.c:759
#13 0x00007f28966aeccf in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f28880a6870, event=<optimized out>, data=0x7f28800292f0) at rpc-clnt.c:900
#14 0x00007f28966aa813 in rpc_transport_notify (this=this@entry=0x7f28880b6530, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f28800292f0) at rpc-transport.c:539
#15 0x00007f289186f646 in socket_event_poll_in (this=this@entry=0x7f28880b6530) at socket.c:2231
#16 0x00007f28918722a4 in socket_event_handler (fd=fd@entry=14, idx=idx@entry=3, data=0x7f28880b6530, poll_in=1, poll_out=0, poll_err=0) at socket.c:2344
#17 0x00007f28969418ca in event_dispatch_epoll_handler (event=0x7f288d798e80, event_pool=0x7f289860ed10) at event-epoll.c:570
#18 event_dispatch_epoll_worker (data=0x7f28880445e0) at event-epoll.c:673
#19 0x00007f2895748df5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f289508f1ad in clone () from /lib64/libc.so.6

--- Additional comment from SATHEESARAN on 2015-09-09 05:06:07 EDT ---

1. Installation info
--------------------
The test was done on a custom build (glusterfs-3.8dev-0.825.git2e40a95.el7rhgs.x86_64) from mainline (3.8dev)

2. Setup info:
--------------
Hypervisor1 - rhs-client10.lab.eng.blr.redhat.com (this was the SPM)
Hypervisor2 - rhs-client15.lab.eng.blr.redhat.com
Hypervisor3 - rhs-client21.lab.eng.blr.redhat.com

3. Gluster volume info:
-----------------------
[root@rhs-client10 ~]# gluster volume info

Volume Name: vmstore
Type: Replicate
Volume ID: 96695606-ff65-4f20-921a-b94d16a62c3a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-client10.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick2: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick3: rhs-client21.lab.eng.blr.redhat.com:/rhs/brick1/b1
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: off
user.cifs: enable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on

4. Client side mount
--------------------
rhs-client10.lab.eng.blr.redhat.com:vmstore fuse.glusterfs 1.9T 3.3G 1.8T 1% /rhev/data-center/mnt/glusterSD/rhs-client10.lab.eng.blr.redhat.com:vmstore

--- Additional comment from Krutika Dhananjay on 2015-09-09 07:52:28 EDT ---

Nice catch, sas! Just checked the core. It turns out sharding is (wrongly) returning a non-negative return status even when there is a failure, causing the FUSE bridge to assume the fop succeeded, dereference the iatt (which is NULL), and crash.
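To make that failure mode concrete, here is a minimal standalone sketch of the contract the FUSE bridge relies on: a non-negative op_ret means "success, the post-op iatt is valid", so an xlator that unwinds a failed write with op_ret >= 0 and a NULL iatt produces exactly the NULL dereference seen in frame #0 (op_errno=30 there is EROFS, which is consistent with client quorum being lost). This is an assumed, simplified model for illustration only; the demo_* names are hypothetical, not the actual fuse-bridge or shard code.

/*
 * Minimal standalone sketch (not GlusterFS code; all demo_* names are
 * hypothetical). It models the contract seen in the backtrace above:
 * the write callback trusts the post-op iatt whenever op_ret is
 * non-negative, so unwinding a failed write with op_ret >= 0 and a
 * NULL iatt leads straight to a NULL-pointer dereference.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct demo_iatt {
    long long ia_size;   /* stands in for the post-op file size */
};

/* Models the fuse-side callback: op_ret >= 0 means "postbuf is valid". */
static void demo_fuse_writev_cbk(int op_ret, int op_errno,
                                 struct demo_iatt *postbuf)
{
    if (op_ret >= 0) {
        /* With op_ret=0 and postbuf=NULL (as in frame #0), this
         * dereference is the crash. */
        printf("write ok, post-op size = %lld\n", postbuf->ia_size);
    } else {
        printf("write failed: %s\n", strerror(op_errno));
    }
}

int main(void)
{
    struct demo_iatt post = { .ia_size = 4194304 };

    /* Correct unwinds: either a real success or a clearly negative status. */
    demo_fuse_writev_cbk(4096, 0, &post);
    demo_fuse_writev_cbk(-1, EROFS, NULL);   /* EROFS == 30, as in the trace */

    /* The buggy combination reported here: non-negative status, NULL iatt.
     * Uncommenting this call reproduces the NULL dereference. */
    /* demo_fuse_writev_cbk(0, EROFS, NULL); */

    return 0;
}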
REVIEW: http://review.gluster.org/12144 (features/shard: Do not return non-negative status on failure in writev) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/12144 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)
------
commit 46674c5d5caaa183f8ee99efb64ef268eded91ab
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Sep 9 17:25:14 2015 +0530

    features/shard: Do not return non-negative status on failure in writev

    Backport of: http://review.gluster.org/#/c/12140/

    Change-Id: I7c49a083894cead528901ebc0a88fcfa17e53da3
    BUG: 1261715
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/12144
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
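In essence, the commit summary ("do not return non-negative status on failure") amounts to guarding the final unwind so a failure can never escape with a non-negative status. The sketch below illustrates that idea under assumed, simplified types; it is not the actual shard.c change, and all demo_* names are hypothetical.

/*
 * Illustration only -- not the actual shard.c patch. It sketches the guard
 * described by the commit summary: when the aggregated shard write has
 * failed, unwind with a negative status and no iatt, never with the byte
 * count written so far. All demo_* names are hypothetical.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct demo_iatt { long long ia_size; };

struct demo_local {
    int op_ret;                 /* aggregated status across the shard writes */
    int op_errno;
    int written;                /* bytes successfully written so far */
    struct demo_iatt postbuf;   /* meaningful only on success */
};

/* Stand-in for the unwind back to the parent xlator. */
static void demo_unwind_writev(int op_ret, int op_errno,
                               struct demo_iatt *postbuf)
{
    if (op_ret >= 0)
        printf("unwound success: %d bytes, post-op size %lld\n",
               op_ret, postbuf->ia_size);
    else
        printf("unwound failure: %s\n", strerror(op_errno));
}

/* The guard: never let a failure escape with a non-negative op_ret. */
static void demo_shard_writev_done(struct demo_local *local)
{
    if (local->op_ret < 0)
        demo_unwind_writev(-1, local->op_errno, NULL);
    else
        demo_unwind_writev(local->written, 0, &local->postbuf);
}

int main(void)
{
    struct demo_local ok   = { .op_ret = 0, .written = 4096,
                               .postbuf = { .ia_size = 4194304 } };
    struct demo_local fail = { .op_ret = -1, .op_errno = EROFS };

    demo_shard_writev_done(&ok);
    demo_shard_writev_done(&fail);
    return 0;
}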
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.5, please open a new bug report. glusterfs-3.7.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] http://www.gluster.org/pipermail/gluster-users/2015-October/023968.html [2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user