Description of problem:
------------------------
4-node Ganesha cluster. A 2x2 volume mounted on 4 clients via NFSv3 and NFSv4.

Workload: iozone reads from 4 clients, dd from 2 clients, and a Linux kernel untar from 2 clients, in 2 different sub-directories.

About half an hour into the workload, Ganesha crashed on one of the nodes and dumped core:

(gdb) bt
#0  0x00007fbfa6ef1e60 in MDCACHE ()
#1  0x00007fbfa1b46708 in _gf_ref_put (ref=ref@entry=0x7fbe700396e8) at refcount.c:47
#2  0x00007fbf8f0b2132 in dht_inode_ctx_get_mig_info (this=this@entry=0x7fbf8800ea20, inode=0x7fbf7f2f3bac, src_subvol=src_subvol@entry=0x0, dst_subvol=dst_subvol@entry=0x7fbf7fffe090) at dht-helper.c:243
#3  0x00007fbf8f10be9e in dht_flush_cbk (frame=0x7fbf9c8a5970, cookie=<optimized out>, this=0x7fbf8800ea20, op_ret=0, op_errno=117, xdata=0x0) at dht-inode-read.c:715
#4  0x00007fbf8f380225 in afr_flush_cbk (frame=0x7fbf9c8486d0, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, xdata=<optimized out>) at afr-common.c:2961
#5  0x00007fbf8f5bfb26 in client3_3_flush_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fbf9c883464) at client-rpc-fops.c:921
#6  0x00007fbfa18a2680 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fbf8809b5b0, pollin=pollin@entry=0x7fbf7a68ce30) at rpc-clnt.c:791
#7  0x00007fbfa18a295f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fbf8809b5e0, event=<optimized out>, data=0x7fbf7a68ce30) at rpc-clnt.c:962
#8  0x00007fbfa189e883 in rpc_transport_notify (this=this@entry=0x7fbf880ab2e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fbf7a68ce30) at rpc-transport.c:537
#9  0x00007fbf94421eb4 in socket_event_poll_in (this=this@entry=0x7fbf880ab2e0) at socket.c:2267
#10 0x00007fbf94424365 in socket_event_handler (fd=<optimized out>, idx=5, data=0x7fbf880ab2e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#11 0x00007fbfa1b323d0 in event_dispatch_epoll_handler (event=0x7fbf7fffe540, event_pool=0x7fbfa8dbb030) at event-epoll.c:571
#12 event_dispatch_epoll_worker (data=0x7fbf8805db10) at event-epoll.c:674
#13 0x00007fbfa5139dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fbfa480873d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

How reproducible:
-----------------
Reporting the first occurrence.

Steps to Reproduce:
-------------------
1. Mount a 2x2 volume via v3 and v4 on different clients.
2. Run iozone reads alongside a mixed write workload (dd, iozone, untar, etc.).

Actual results:
---------------
Ganesha crashes and dumps core.

Expected results:
-----------------
No crashes.

Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: aeab0f8a-1e34-4681-bdf4-5b1416e46f27
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
Putting needinfo on Susant & Du as well.
From the core:

(gdb) p *ref
$28 = {cnt = 0, release = 0x7fbfa6ef1e60 <MDCACHE>, data = 0x7fbfa6c7ca20 <mdcache_get_ref>}

The ref count for the miginfo object is already zero, so this looks like a double-unref. Will debug further from the code to figure out the RCA.
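For context, a minimal sketch of why a double unref would end in the crash seen in frame #0. The three-field layout mirrors what gdb printed for *ref above, but this is a simplified illustration and not the actual refcount.c (which, among other things, manipulates the count atomically); the miginfo_release helper and its wiring are hypothetical.

/* Simplified illustration of the gf_ref_t put path; NOT the verbatim
 * glusterfs source. Compile with: gcc -o refdemo refdemo.c */
#include <stdio.h>
#include <stdlib.h>

typedef void (*gf_ref_release_t)(void *data);

typedef struct gf_ref {
    int cnt;                  /* current reference count           */
    gf_ref_release_t release; /* destructor, runs when cnt hits 0  */
    void *data;               /* refcounted payload (miginfo here) */
} gf_ref_t;

/* Drop one reference; invoke the destructor when the count reaches 0. */
static void ref_put(gf_ref_t *ref)
{
    if (--ref->cnt == 0)
        ref->release(ref->data); /* typically frees data (and the ref with it) */
}

/* Hypothetical destructor standing in for the real miginfo release. */
static void miginfo_release(void *data)
{
    printf("releasing miginfo %p\n", data);
    free(data); /* the embedded gf_ref_t goes away with the payload */
}

int main(void)
{
    /* A miginfo-like object with its ref embedded and a count of 1. */
    gf_ref_t *ref = calloc(1, sizeof(*ref));
    ref->cnt = 1;
    ref->release = miginfo_release;
    ref->data = ref;

    ref_put(ref); /* legitimate unref: cnt 1 -> 0, object is freed */

    /* Double unref: ref now points at freed memory. The allocator may
     * already have reused it, so ref->release can hold an arbitrary
     * value, and the call through it jumps to a garbage address. */
    ref_put(ref); /* undefined behavior; crashes in practice */
    return 0;
}

That matches what the core shows: by the time the second put runs, the freed slot apparently holds Ganesha MDCACHE data (release points at MDCACHE, data at mdcache_get_ref), so the indirect call through release lands inside MDCACHE, which is exactly frame #0 of the backtrace.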