Description of problem:
-----------------------
2-node Ganesha HA cluster. 4 clients mounted a gluster volume via NFSv4 and ran dbench in a loop.
Ganesha crashed on one of my nodes.
The BT looks different from the one in the bug I opened at https://bugzilla.redhat.com/show_bug.cgi?id=1466700, though the use case is the same.
<BT>
(gdb) bt
#0 0x00007ff8553211f7 in raise () from /lib64/libc.so.6
#1 0x00007ff8553228e8 in abort () from /lib64/libc.so.6
#2 0x00007ff855360f47 in __libc_message () from /lib64/libc.so.6
#3 0x00007ff855366b54 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ff855369df7 in _int_malloc () from /lib64/libc.so.6
#5 0x00007ff85536c10c in malloc () from /lib64/libc.so.6
#6 0x0000563be257b10d in gsh_malloc__ (
file=0x563be262ad60 "/builddir/build/BUILD/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_readdir.c", line=306,
function=<synthetic pointer>, n=12) at /usr/src/debug/nfs-ganesha-2.4.4/src/include/abstract_mem.h:78
#7 nfs4_readdir_callback (opaque=0x7ff7ca0e9b90, obj=0x7ff54806d928, attr=0x7ff7ca0e9d40,
mounted_on_fileid=12979408687758067389, cookie=<optimized out>, cb_state=<optimized out>)
at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_readdir.c:306
#8 0x0000563be253f829 in populate_dirent (name=<optimized out>, obj=0x7ff54806d928,
attrs=attrs@entry=0x7ff7ca0e9d40, dir_state=dir_state@entry=0x7ff7ca0e9e90, cookie=914646794536317759)
at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_helper.c:1321
#9 0x0000563be2607fef in mdcache_readdir (dir_hdl=0x7ff5a807f8a8, whence=<optimized out>,
dir_state=0x7ff7ca0e9e90, cb=0x563be253f7d0 <populate_dirent>, attrmask=122830, eod_met=0x7ff7ca0e9f5b)
at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:707
#10 0x0000563be25415dd in fsal_readdir (directory=directory@entry=0x7ff5a807f8a8, cookie=cookie@entry=0,
nbfound=nbfound@entry=0x7ff7ca0e9f5c, eod_met=eod_met@entry=0x7ff7ca0e9f5b, attrmask=122830,
cb=cb@entry=0x563be257af40 <nfs4_readdir_callback>, opaque=opaque@entry=0x7ff7ca0e9f60)
at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_helper.c:1505
#11 0x0000563be257bf0b in nfs4_op_readdir (op=0x7ff65c019690, data=0x7ff7ca0ea180, resp=0x7ff4d80181e0)
at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_readdir.c:631
#12 0x0000563be256897d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7ff4d80aae20)
at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_Compound.c:734
#13 0x0000563be2559b1c in nfs_rpc_execute (reqdata=reqdata@entry=0x7ff65c075880)
at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1281
#14 0x0000563be255b18a in worker_run (ctx=0x563be3e9dfc0)
at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1548
#15 0x0000563be25e4889 in fridgethr_start_routine (arg=0x563be3e9dfc0)
at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#16 0x00007ff855d16e25 in start_thread () from /lib64/libpthread.so.0
#17 0x00007ff8553e434d in clone () from /lib64/libc.so.6
(gdb)
</BT>
Version-Release number of selected component (if applicable):
------------------------------------------------------------
nfs-ganesha-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
How reproducible:
------------------
Fairly reproducible
Additional info:
----------------
[root@gqas014 tmp]# gluster v info
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 22c652d8-0754-438a-8131-373bad7c12ab
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (4 + 2) = 24
Transport-type: tcp
Bricks:
Brick1: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick4: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick5: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick7: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick8: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick9: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick10: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick11: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick13: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick14: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick15: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick16: gqas007.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick17: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick18: gqas007.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick19: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick20: gqas007.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick21: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick22: gqas007.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick23: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick24: gqas007.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Options Reconfigured:
ganesha.enable: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
Comment 3 Daniel Gryniewicz
2017-06-30 15:18:02 UTC
This one is memory corruption. Reproducing with ASAN or valgrind would be extremely helpful.
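For anyone attempting the valgrind reproduction, a minimal sketch of the invocation follows. The abort in the BT happens inside malloc(), which usually means the heap was damaged earlier by some other code path; memcheck should report the invalid write where it occurs instead. The helper name and all three paths are assumptions, not taken from this setup, so adjust them for the actual install. An ASAN run would instead need a rebuild of nfs-ganesha with -fsanitize=address, which is not shown here.

```shell
# Hypothetical helper: prints a valgrind command line that runs ganesha.nfsd
# in the foreground. All paths are assumptions; override via the environment.
ganesha_valgrind_cmd() {
    bin="${GANESHA_BIN:-/usr/bin/ganesha.nfsd}"         # assumed binary location
    conf="${GANESHA_CONF:-/etc/ganesha/ganesha.conf}"   # assumed config path
    log="${GANESHA_LOG:-/var/log/ganesha/ganesha.log}"  # assumed log path
    # memcheck flags invalid heap writes at the point they happen, rather
    # than at a later, unrelated malloc() call.
    echo valgrind --tool=memcheck --track-origins=yes \
        --num-callers=30 \
        --log-file=/tmp/ganesha-valgrind.%p.log \
        "$bin" -F -f "$conf" -L "$log"
}

# Print the command for review; run it by hand after stopping the regular
# nfs-ganesha service on that node.
ganesha_valgrind_cmd
```

Expect the dbench run to be much slower under valgrind, so the crash may take longer to hit than in a normal run.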