Bug 1471687

Summary: [Ganesha]: Ganesha crashed within seconds post failover/failback in gsh_free(), possible memory corruption.
Product: [Retired] nfs-ganesha Reporter: Ambarish <asoman>
Component: NFS Assignee: Frank Filz <ffilz>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.4 CC: bturner, dang, jthottan, kkeithle, mbenjamin, pasik, rhinduja, skoduri
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-22 15:47:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ambarish 2017-07-17 09:15:58 UTC
4-node cluster; 4 clients mounted the volume via NFSv4 and were running kernel untars in separate directories.

I was simulating failovers/failbacks by killing and restarting the nfs-ganesha service on random nodes.
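For anyone trying to reproduce this, the kill/restart cycle can be sketched roughly as below. This is only an illustration, not the script used for this report: the node names, the use of ssh, the `systemctl kill`/`systemctl start` pair, and the `simulate` and `failover_commands` helpers are all assumptions. By default it is a dry run that only prints the commands.

```python
import random
import shlex
import time

SERVICE = "nfs-ganesha"  # service name as given in the report


def failover_commands(node, service=SERVICE):
    """Return the (kill, start) command lines for one failover/failback cycle.

    Reaching the node over ssh and using systemctl are assumptions.
    """
    kill = ["ssh", node, "systemctl", "kill", "--signal=SIGKILL", service]
    start = ["ssh", node, "systemctl", "start", service]
    return kill, start


def simulate(nodes, cycles, downtime=0, rng=None, run=None):
    """Kill and restart the service on a randomly chosen node, `cycles` times.

    `run` defaults to a dry run that prints each command; pass
    subprocess.check_call to execute the commands for real.
    Returns the list of nodes that were chosen, in order.
    """
    rng = rng or random.Random()
    run = run or (lambda cmd: print(shlex.join(cmd)))
    chosen = []
    for _ in range(cycles):
        node = rng.choice(nodes)
        kill, start = failover_commands(node)
        run(kill)             # service dies, failover should follow
        time.sleep(downtime)  # leave the node down for a while
        run(start)            # service returns, failback should follow
        chosen.append(node)
    return chosen


if __name__ == "__main__":
    # Hypothetical node names; dry run only.
    simulate(["node1", "node2", "node3", "node4"], cycles=2)
```

The client I/O (kernel untar over the NFSv4 mounts) has to be running separately while the loop executes; the sketch does not start it.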

When I/O resumed after a failover, I saw that Ganesha had crashed on one of my nodes with the following backtrace:

(gdb) bt
#0  __GI___libc_free (mem=0x6000000000000) at malloc.c:2933
#1  0x00007f79e7c79d76 in gsh_free (p=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/include/abstract_mem.h:271
#2  glusterfs_close_my_fd (my_fd=0x7f7698090002)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1088
#3  0x00007f79e7c7b1ba in glusterfs_open2 (obj_hdl=0x7f76980d8ec0, state=0x7f7698090ee0, openflags=<optimized out>, 
    createmode=FSAL_EXCLUSIVE, name=<optimized out>, attrib_set=<optimized out>, 
    verifier=0x7f796df616c0 "TK)\001\070o", new_obj=0x7f796df61340, attrs_out=0x7f796df61350, 
    caller_perm_check=0x7f796df614bf) at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1443
#4  0x00005640a643a1ef in mdcache_open2 (obj_hdl=0x7f7710139728, state=0x7f7698090ee0, openflags=<optimized out>, 
    createmode=FSAL_EXCLUSIVE, name=0x0, attrs_in=0x7f796df615e0, verifier=0x7f796df616c0 "TK)\001\070o", 
    new_obj=0x7f796df61580, attrs_out=0x0, caller_perm_check=0x7f796df614bf)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:657
#5  0x00005640a636fcbb in fsal_open2 (in_obj=0x7f7710139728, state=0x7f7698090ee0, openflags=openflags@entry=2, 
    createmode=createmode@entry=FSAL_EXCLUSIVE, name=<optimized out>, attr=attr@entry=0x7f796df615e0, 
    verifier=verifier@entry=0x7f796df616c0 "TK)\001\070o", obj=obj@entry=0x7f796df61580, 
    attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/fsal_helper.c:1846
#6  0x00005640a635b350 in open4_ex (arg=arg@entry=0x7f7924182008, data=data@entry=0x7f796df62180, 
    res_OPEN4=res_OPEN4@entry=0x7f76980f5e38, clientid=<optimized out>, owner=0x7f770007a440, 
    file_state=file_state@entry=0x7f796df61fa0, new_state=new_state@entry=0x7f796df61f8f)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_open.c:1441
#7  0x00005640a63a3469 in nfs4_op_open (op=0x7f7924182000, data=0x7f796df62180, resp=0x7f76980f5e30)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_op_open.c:1845
#8  0x00005640a639597d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f7698035350)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/Protocols/NFS/nfs4_Compound.c:734
#9  0x00005640a6386b1c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f79240008c0)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1281
#10 0x00005640a638818a in worker_run (ctx=0x5640a6e38e70)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/MainNFSD/nfs_worker_thread.c:1548
#11 0x00005640a6411889 in fridgethr_start_routine (arg=0x5640a6e38e70)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#12 0x00007f79eabdde25 in start_thread (arg=0x7f796df63700) at pthread_create.c:308
#13 0x00007f79ea2ab34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)
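The pointer values in frames #0 and #2 can be sanity-checked mechanically. The following is a minimal sketch, assuming 48-bit x86_64 virtual addressing (user space ending at 0x7fffffffffff) and 8-byte natural alignment for structures that hold pointers. Both thresholds and the `classify_pointer` helper are assumptions for illustration and are not taken from the Ganesha sources. The addresses are copied from the backtrace as gdb printed them; in an optimized build such values can be stale.

```python
USER_SPACE_MAX = (1 << 47) - 1  # assumed: 48-bit virtual addressing on x86_64
POINTER_ALIGN = 8               # assumed: natural alignment of pointer-holding structs


def classify_pointer(addr):
    """Return a list of reasons why `addr` looks implausible (empty if none)."""
    problems = []
    if addr > USER_SPACE_MAX:
        problems.append("outside user address space")
    if addr % POINTER_ALIGN != 0:
        problems.append("misaligned")
    return problems


# Values copied from the backtrace above.
SAMPLES = [
    ("mem   (frame #0, free)", 0x6000000000000),
    ("my_fd (frame #2, glusterfs_close_my_fd)", 0x7F7698090002),
    ("state (frame #3, glusterfs_open2)", 0x7F7698090EE0),
]

if __name__ == "__main__":
    for name, addr in SAMPLES:
        problems = classify_pointer(addr)
        verdict = ", ".join(problems) if problems else "plausible"
        print(f"{name}: {addr:#x} -> {verdict}")
```

Under these assumptions the value passed to free() lies outside the user address space and my_fd is not 8-byte aligned, while state looks like an ordinary heap address. That is consistent with the memory corruption suspected in the summary, but it does not by itself identify where the corruption happened.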

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-ganesha-3.8.4-33.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-14.el7rhgs.x86_64


How reproducible:
-----------------

This was the first occurrence.