Bug 1466704

Summary: [Ganesha] : Ganesha crashed while running dbench
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED WONTFIX
QA Contact: Ambarish <asoman>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: asoman, bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-10 07:08:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ambarish 2017-06-30 09:46:13 UTC
Description of problem:
-----------------------


Setup: a 2-node Ganesha HA cluster. 4 clients mounted a gluster volume via NFSv4 and ran dbench in a loop.

Ganesha crashed on one of my nodes.
The BT looks different from the one in the bug I opened at https://bugzilla.redhat.com/show_bug.cgi?id=1466700, though the use case is the same.

<BT>

(gdb) bt
#0  0x00007ff8553211f7 in raise () from /lib64/libc.so.6
#1  0x00007ff8553228e8 in abort () from /lib64/libc.so.6
#2  0x00007ff855360f47 in __libc_message () from /lib64/libc.so.6
#3  0x00007ff855366b54 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ff855369df7 in _int_malloc () from /lib64/libc.so.6
#5  0x00007ff85536c10c in malloc () from /lib64/libc.so.6
#6  0x0000563be257b10d in gsh_malloc__ (
    file=0x563be262ad60 "/builddir/build/BUILD/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_readdir.c", line=306, 
    function=<synthetic pointer>, n=12) at /usr/src/debug/nfs-ganesha-2.4.4/src/include/abstract_mem.h:78
#7  nfs4_readdir_callback (opaque=0x7ff7ca0e9b90, obj=0x7ff54806d928, attr=0x7ff7ca0e9d40, 
    mounted_on_fileid=12979408687758067389, cookie=<optimized out>, cb_state=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_readdir.c:306
#8  0x0000563be253f829 in populate_dirent (name=<optimized out>, obj=0x7ff54806d928, 
    attrs=attrs@entry=0x7ff7ca0e9d40, dir_state=dir_state@entry=0x7ff7ca0e9e90, cookie=914646794536317759)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_helper.c:1321
#9  0x0000563be2607fef in mdcache_readdir (dir_hdl=0x7ff5a807f8a8, whence=<optimized out>, 
    dir_state=0x7ff7ca0e9e90, cb=0x563be253f7d0 <populate_dirent>, attrmask=122830, eod_met=0x7ff7ca0e9f5b)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:707
#10 0x0000563be25415dd in fsal_readdir (directory=directory@entry=0x7ff5a807f8a8, cookie=cookie@entry=0, 
    nbfound=nbfound@entry=0x7ff7ca0e9f5c, eod_met=eod_met@entry=0x7ff7ca0e9f5b, attrmask=122830, 
    cb=cb@entry=0x563be257af40 <nfs4_readdir_callback>, opaque=opaque@entry=0x7ff7ca0e9f60)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_helper.c:1505
#11 0x0000563be257bf0b in nfs4_op_readdir (op=0x7ff65c019690, data=0x7ff7ca0ea180, resp=0x7ff4d80181e0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_readdir.c:631
#12 0x0000563be256897d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7ff4d80aae20)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_Compound.c:734
#13 0x0000563be2559b1c in nfs_rpc_execute (reqdata=reqdata@entry=0x7ff65c075880)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1281
#14 0x0000563be255b18a in worker_run (ctx=0x563be3e9dfc0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1548
#15 0x0000563be25e4889 in fridgethr_start_routine (arg=0x563be3e9dfc0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#16 0x00007ff855d16e25 in start_thread () from /lib64/libpthread.so.0
#17 0x00007ff8553e434d in clone () from /lib64/libc.so.6
(gdb) 



</BT>


Version-Release number of selected component (if applicable):
------------------------------------------------------------

nfs-ganesha-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64

How reproducible:
------------------

Fairly reproducible


Additional info:
----------------

[root@gqas014 tmp]# gluster v info
 
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 22c652d8-0754-438a-8131-373bad7c12ab
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (4 + 2) = 24
Transport-type: tcp
Bricks:
Brick1: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick4: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick5: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick7: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick8: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick9: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick10: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick11: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick13: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick14: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick15: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick16: gqas007.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick17: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick18: gqas007.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick19: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick20: gqas007.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick21: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick22: gqas007.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick23: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick24: gqas007.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Options Reconfigured:
ganesha.enable: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Comment 3 Daniel Gryniewicz 2017-06-30 15:18:02 UTC
This one is memory corruption.  Reproducing with ASAN or valgrind would be extremely helpful.
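
For context on why this points to corruption: frames #2 to #5 show glibc aborting from malloc_printerr() inside _int_malloc() during a 12-byte allocation, which means the heap metadata was already damaged before nfs4_readdir_callback() ran. The bad write itself happened somewhere earlier, so the backtrace alone does not identify it. The sketch below is NOT taken from nfs-ganesha and is not a claim about the actual defect; dup_name(), dup_name_buggy(), the SHOW_BUG macro and the file name demo.c are all hypothetical. It only illustrates the class of bug (an off-by-one heap write) that ASAN or valgrind report at the faulting write rather than at a later malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not nfs-ganesha code): copy `len` bytes of `name`
 * into a new NUL-terminated buffer. Allocates len + 1 bytes, so the
 * terminator stays inside the allocation. Caller frees the result. */
char *dup_name(const char *name, size_t len)
{
    assert(name != NULL);

    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;

    memcpy(buf, name, len);
    buf[len] = '\0';
    return buf;
}

#ifdef SHOW_BUG
/* Hypothetical buggy variant, compiled only with -DSHOW_BUG.
 * The buffer is one byte too small, so writing the terminator is a
 * heap buffer overflow. In a normal build this may go unnoticed or
 * surface much later; a checker flags the write itself. */
char *dup_name_buggy(const char *name, size_t len)
{
    char *buf = malloc(len);      /* BUG: should be len + 1 */
    if (buf == NULL)
        return NULL;

    memcpy(buf, name, len);
    buf[len] = '\0';              /* writes one byte past the allocation */
    return buf;
}

int main(void)
{
    char *p = dup_name_buggy("dbench", 6);
    free(p);
    return 0;
}
#endif
```

Assuming the file is saved as demo.c, building with `gcc -g -fsanitize=address -fno-omit-frame-pointer -DSHOW_BUG demo.c -o demo` and running it should make ASAN report a heap-buffer-overflow at the `buf[len] = '\0'` line. A plain `gcc -g -DSHOW_BUG demo.c -o demo` build run under `valgrind ./demo` should report an invalid write of size 1 at the same line. Either report gives the stack of the offending write, which is the information missing from the backtrace above.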