Description of problem:
-----------------------
When client quorum is not met, the FUSE mount crashes.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8dev-0.825.git2e40a95.el7rhgs.x86_64

How reproducible:
-----------------
Tried only once

Steps to Reproduce:
-------------------
1. Set up 3 RHEL 7.1 nodes (hypervisors) with 1 brick per node, and create a 1x3 volume
2. Optimize the volume for virt-store
3. Enable sharding on the volume (leaving shard-block-size at its default of 4MB)
4. Create an application VM that runs on node1
5. Block all incoming/outgoing traffic between node1 and node2/node3 using iptables rules

Actual results:
---------------
The FUSE mount process crashed on all 3 nodes

Expected results:
-----------------
The FUSE mount should not crash
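The steps above can be sketched as shell commands. This is a minimal sketch: the hostnames (node1..node3), volume name (vmstore), and brick paths are illustrative assumptions, not taken from the report.

```shell
# Assumed names for illustration only: node1..node3, volume "vmstore",
# bricks under /rhgs/brick1. Run on one node unless noted otherwise.

# 1. Create a 1x3 replicated volume, one brick per node, and start it
gluster volume create vmstore replica 3 \
    node1:/rhgs/brick1/vmstore \
    node2:/rhgs/brick1/vmstore \
    node3:/rhgs/brick1/vmstore
gluster volume start vmstore

# 2. Optimize for virt-store by applying the "virt" option group
gluster volume set vmstore group virt

# 3. Enable sharding; shard-block-size stays at its 4MB default
gluster volume set vmstore features.shard on

# 4. Create an application VM backed by the volume and run it on node1
#    (via virsh/virt-manager; details depend on the environment)

# 5. On node1, cut all traffic to/from node2 and node3 to break quorum
for peer in node2 node3; do
    iptables -A INPUT  -s "$peer" -j DROP
    iptables -A OUTPUT -d "$peer" -j DROP
done
```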
Backtrace from one of the nodes (node1):

(gdb) bt
#0  fuse_writev_cbk (frame=0x7f28943fdbf4, cookie=<optimized out>, this=0x7f289861cad0, op_ret=0, op_errno=30, stbuf=<optimized out>, postbuf=0x0, xdata=0x0) at fuse-bridge.c:2271
#1  0x00007f288e24bd64 in io_stats_writev_cbk (frame=0x7f289442b59c, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at io-stats.c:1400
#2  0x00007f28968e6bd6 in default_writev_cbk (frame=0x7f289441a224, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at defaults.c:1016
#3  0x00007f288e86ebdd in wb_writev_cbk (frame=0x7f289442c9c4, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=30, prebuf=0x0, postbuf=0x0, xdata=0x0) at write-behind.c:1255
#4  0x00007f288ea80843 in shard_writev_do_cbk (frame=frame@entry=0x7f28943ee204, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=30, prebuf=prebuf@entry=0x7f288d887da4, postbuf=postbuf@entry=0x7f288d887e14, xdata=xdata@entry=0x0) at shard.c:2958
#5  0x00007f288ecd4923 in dht_writev_cbk (frame=0x7f28943f67b8, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=30, prebuf=0x7f288d887da4, postbuf=0x7f288d887e14, xdata=0x0) at dht-inode-write.c:90
#6  0x00007f288ef18e13 in afr_writev_unwind (frame=0x7f28943f4c2c, this=<optimized out>) at afr-inode-write.c:197
#7  0x00007f288ef18e7d in afr_transaction_writev_unwind (frame=0x7f28943ef224, this=0x7f288800a620) at afr-inode-write.c:214
#8  0x00007f288ef21fcb in __afr_txn_write_done (frame=0x7f28943ef224, this=<optimized out>) at afr-transaction.c:81
#9  0x00007f288ef2529e in afr_unlock_common_cbk (frame=frame@entry=0x7f28943ef224, this=this@entry=0x7f288800a620, xdata=<optimized out>, op_errno=0, op_ret=<optimized out>, cookie=<optimized out>) at afr-lk-common.c:633
#10 0x00007f288ef25337 in afr_unlock_inodelk_cbk (frame=0x7f28943ef224, cookie=<optimized out>, this=0x7f288800a620, op_ret=<optimized out>, op_errno=0, xdata=<optimized out>) at afr-lk-common.c:674
#11 0x00007f288f173b1d in client3_3_finodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f28943f77d8) at client-rpc-fops.c:1673
#12 0x00007f28966aea10 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f28880a6840, pollin=pollin@entry=0x7f28800292f0) at rpc-clnt.c:759
#13 0x00007f28966aeccf in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f28880a6870, event=<optimized out>, data=0x7f28800292f0) at rpc-clnt.c:900
#14 0x00007f28966aa813 in rpc_transport_notify (this=this@entry=0x7f28880b6530, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f28800292f0) at rpc-transport.c:539
#15 0x00007f289186f646 in socket_event_poll_in (this=this@entry=0x7f28880b6530) at socket.c:2231
#16 0x00007f28918722a4 in socket_event_handler (fd=fd@entry=14, idx=idx@entry=3, data=0x7f28880b6530, poll_in=1, poll_out=0, poll_err=0) at socket.c:2344
#17 0x00007f28969418ca in event_dispatch_epoll_handler (event=0x7f288d798e80, event_pool=0x7f289860ed10) at event-epoll.c:570
#18 event_dispatch_epoll_worker (data=0x7f28880445e0) at event-epoll.c:673
#19 0x00007f2895748df5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f289508f1ad in clone () from /lib64/libc.so.6
Created attachment 1071648 [details] coredump from Hypervisor1
Created attachment 1071649 [details] coredump from Hypervisor2
Created attachment 1071650 [details] coredump from Hypervisor3
Created attachment 1071653 [details] sosreport from Hypervisor1
Created attachment 1071654 [details] sosreport from Hypervisor2
Created attachment 1071656 [details] sosreport from Hypervisor3
Nice catch, sas! Just checked the core. It turns out sharding is (wrongly) returning a non-negative status even when the fop fails, causing the FUSE bridge to assume the write succeeded, dereference the post-op iatt (which is NULL on failure), and crash.
REVIEW: http://review.gluster.org/12140 (features/shard: Do not return non-negative status on failure in writev) posted (#1) for review on master by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/12140 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit b1f851709c30505cac2b63bc49234ae818559d2d
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Sep 9 17:25:14 2015 +0530

    features/shard: Do not return non-negative status on failure in writev

    Change-Id: I5f65c49484e44a05bb7df53c73869f89ad3392e0
    BUG: 1261399
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/12140
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user