Bug 1467019

Summary: [Ganesha]: Ganesha crashed during mem_get, possible memory corruption.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED WONTFIX
QA Contact: Ambarish <asoman>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: asoman, bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-10 07:09:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ambarish 2017-07-01 19:02:50 UTC
Description of problem:
-----------------------


2-node cluster, 4 clients, each writing to its own subdirectory (using Bonnie, dbench, and kernel untar).

Ganesha crashed on one of my nodes with the following backtrace:

<BT>

#0  mem_get_from_pool (pt_pool=0x7fcce0000e88) at mem-pool.c:644
#1  0x00007fcf3c0c426b in mem_get (mem_pool=mem_pool@entry=0x7fcf3c365940 <pools>) at mem-pool.c:684
#2  0x00007fcf3c0c4323 in mem_get0 (mem_pool=0x7fcf3c365940 <pools>) at mem-pool.c:581
#3  0x00007fcf3c09491c in get_new_data () at dict.c:40
#4  0x00007fcf3c0963d7 in bin_to_data (value=<optimized out>, len=<optimized out>) at dict.c:935
#5  0x00007fcf3c096476 in dict_set_bin_common (this=<optimized out>, key=<optimized out>, ptr=<optimized out>, size=<optimized out>, is_static=<optimized out>) at dict.c:2300
#6  0x00007fcf3c097efb in dict_set_static_bin (this=<optimized out>, key=<optimized out>, ptr=<optimized out>, size=<optimized out>) at dict.c:2329
#7  0x00007fcf3c38191e in glfs_resolve_component (fs=fs@entry=0x562567e93060, subvol=subvol@entry=0x7fcf2404b3e0, parent=parent@entry=0x7fcdd800ed10, component=component@entry=0x7fcce003ad00 "LWPSAV0.TMP", 
    iatt=iatt@entry=0x7fcedb79bfb0, force_lookup=<optimized out>) at glfs-resolve.c:379
#8  0x00007fcf3c381dfb in priv_glfs_resolve_at (fs=fs@entry=0x562567e93060, subvol=subvol@entry=0x7fcf2404b3e0, at=at@entry=0x7fcdd800ed10, origpath=origpath@entry=0x7fcce001e340 "LWPSAV0.TMP", 
    loc=loc@entry=0x7fcedb79c0b0, iatt=iatt@entry=0x7fcedb79c0f0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:501
#9  0x00007fcf3c3837b4 in pub_glfs_h_lookupat (fs=0x562567e93060, parent=<optimized out>, path=path@entry=0x7fcce001e340 "LWPSAV0.TMP", stat=stat@entry=0x7fcedb79c1d0, follow=follow@entry=0)
    at glfs-handleops.c:102
#10 0x00007fcf3c383898 in pub_glfs_h_lookupat34 (fs=<optimized out>, parent=<optimized out>, path=path@entry=0x7fcce001e340 "LWPSAV0.TMP", stat=stat@entry=0x7fcedb79c1d0) at glfs-handleops.c:133
#11 0x00007fcf3c7a039f in lookup (parent=0x7fcdd8014348, path=0x7fcce001e340 "LWPSAV0.TMP", handle=0x7fcedb79c310, attrs_out=0x7fcedb79c320)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:117
#12 0x0000562566fd2f8f in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7fcdd8005770, name=name@entry=0x7fcce001e340 "LWPSAV0.TMP", new_entry=new_entry@entry=0x7fcedb79c4c0, attrs_out=attrs_out@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1046
#13 0x0000562566fd5c5b in mdc_lookup (mdc_parent=0x7fcdd8005770, name=0x7fcce001e340 "LWPSAV0.TMP", uncached=uncached@entry=true, new_entry=new_entry@entry=0x7fcedb79c4c0, attrs_out=0x0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1004
#14 0x0000562566fcab7b in mdcache_lookup (parent=<optimized out>, name=<optimized out>, handle=0x7fcedb79c578, attrs_out=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:167
#15 0x0000562566f03ac7 in fsal_lookup (parent=parent@entry=0x7fcdd80057a8, name=0x7fcce001e340 "LWPSAV0.TMP", obj=obj@entry=0x7fcedb79c578, attrs_out=attrs_out@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_helper.c:712
#16 0x0000562566ef0ce6 in open4_ex (arg=arg@entry=0x7fcd3000b2f8, data=data@entry=0x7fcedb79d180, res_OPEN4=res_OPEN4@entry=0x7fcce0014198, clientid=<optimized out>, owner=0x7fcdd0001210, 
    file_state=file_state@entry=0x7fcedb79cfa0, new_state=new_state@entry=0x7fcedb79cf8f) at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_open.c:1257
#17 0x0000562566f39469 in nfs4_op_open (op=0x7fcd3000b2f0, data=0x7fcedb79d180, resp=0x7fcce0014190) at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_open.c:1845
#18 0x0000562566f2b97d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7fcce0020be0) at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_Compound.c:734
#19 0x0000562566f1cb1c in nfs_rpc_execute (reqdata=reqdata@entry=0x7fcd30012a60) at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1281
#20 0x0000562566f1e18a in worker_run (ctx=0x56256af215e0) at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1548
#21 0x0000562566fa7889 in fridgethr_start_routine (arg=0x56256af215e0) at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#22 0x00007fcf3f704e25 in start_thread (arg=0x7fcedb79e700) at pthread_create.c:308
#23 0x00007fcf3edd234d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

</BT>
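
For triage, a backtrace like the one above can be reduced to a list of (frame, function, source location) entries. The following is a minimal sketch, not part of the original report: the `parse_backtrace` helper and its regular expressions are illustrative assumptions written for the gdb output format shown here, and the sample text is three frames copied from the backtrace.

```python
import re

# Illustrative patterns (assumptions, written for the gdb output format above).
# A frame starts with "#N", optionally followed by "0xADDR in ", then the function name.
FRAME_RE = re.compile(r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>[A-Za-z_]\w*) \(")
# The source location is the trailing "at FILE:LINE" of the (joined) frame text.
LOCATION_RE = re.compile(r"\bat (?P<file>\S+):(?P<line>\d+)\s*$")

# Three frames copied from the backtrace in this report.
SAMPLE = """
#0  mem_get_from_pool (pt_pool=0x7fcce0000e88) at mem-pool.c:644
#1  0x00007fcf3c0c426b in mem_get (mem_pool=mem_pool@entry=0x7fcf3c365940 <pools>) at mem-pool.c:684
#11 0x00007fcf3c7a039f in lookup (parent=0x7fcdd8014348, path=0x7fcce001e340 "LWPSAV0.TMP", handle=0x7fcedb79c310, attrs_out=0x7fcedb79c320)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:117
(gdb)
"""


def parse_backtrace(text):
    """Return one dict per frame: num, func, file, line."""
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("(gdb)"):
            continue  # skip blank lines and the gdb prompt
        if line.startswith("#"):
            frames.append(line)  # start of a new frame
        elif frames:
            frames[-1] += " " + line  # wrapped continuation of the previous frame

    parsed = []
    for frame in frames:
        head = FRAME_RE.match(frame)
        if head is None:
            continue  # not a frame line this sketch understands
        loc = LOCATION_RE.search(frame)
        parsed.append({
            "num": int(head.group("num")),
            "func": head.group("func"),
            "file": loc.group("file") if loc else None,
            "line": int(loc.group("line")) if loc else None,
        })
    return parsed


if __name__ == "__main__":
    for f in parse_backtrace(SAMPLE):
        print(f"#{f['num']:<3} {f['func']}  {f['file']}:{f['line']}")
```

Joining wrapped lines before matching matters here because gdb places the `at FILE:LINE` part of long frames on a continuation line.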


Version-Release number of selected component (if applicable):
-------------------------------------------------------------

nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64


How reproducible:
-----------------

Reporting the first occurrence of the crash.


Additional info:
----------------

[root@gqas007 tmp]# gluster v info
 
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 22c652d8-0754-438a-8131-373bad7c12ab
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (4 + 2) = 24
Transport-type: tcp
Bricks:
Brick1: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick4: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick5: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick7: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick8: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick9: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick10: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick11: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick13: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick14: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick15: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick16: gqas007.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick17: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick18: gqas007.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick19: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick20: gqas007.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick21: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick22: gqas007.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick23: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick24: gqas007.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Options Reconfigured:
ganesha.enable: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
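
The brick layout reported above, `4 x (4 + 2) = 24`, can be sanity-checked with a short script. This is a minimal sketch rather than anything from the original report: the helper names are hypothetical, the brick list is regenerated from the pattern visible in the output (two hosts alternating across `/bricks1` to `/bricks12`), and grouping consecutive bricks into one disperse subvolume is an assumption about how Gluster orders bricks.

```python
import re
from collections import Counter

# Host names as they appear in the brick list of this report.
HOSTS = [
    "gqas014.sbu.lab.eng.bos.redhat.com",
    "gqas007.sbu.lab.eng.bos.redhat.com",
]


def parse_brick_count(line):
    """Parse 'Number of Bricks: 4 x (4 + 2) = 24' into (subvols, data, redundancy, total)."""
    m = re.search(r"(\d+)\s*x\s*\((\d+)\s*\+\s*(\d+)\)\s*=\s*(\d+)", line)
    if m is None:
        raise ValueError(f"unrecognised brick count line: {line!r}")
    return tuple(int(g) for g in m.groups())


def brick_list():
    """Regenerate Brick1..Brick24 in the order shown: hosts alternate per brick directory."""
    return [f"{host}:/bricks{n}/A1" for n in range(1, 13) for host in HOSTS]


def group_subvolumes(bricks, width):
    """Assumption: consecutive bricks form one disperse subvolume of `width` bricks."""
    return [bricks[i:i + width] for i in range(0, len(bricks), width)]


if __name__ == "__main__":
    subvols, data, redundancy, total = parse_brick_count(
        "Number of Bricks: 4 x (4 + 2) = 24"
    )
    bricks = brick_list()
    # The arithmetic in the report should be self-consistent.
    assert subvols * (data + redundancy) == total == len(bricks)

    for idx, group in enumerate(group_subvolumes(bricks, data + redundancy), start=1):
        per_host = Counter(brick.split(":", 1)[0] for brick in group)
        print(f"subvolume {idx}: {dict(per_host)}")
```

Under the stated grouping assumption, the script prints how many bricks of each disperse subvolume live on each of the two hosts.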