Bug 1372891

Summary: Ganesha crashes on multithreaded writes post volume restart
Product: [Retired] nfs-ganesha
Reporter: Ambarish <asoman>
Component: Cache Inode
Assignee: Soumya Koduri <skoduri>
Status: CLOSED CURRENTRELEASE
QA Contact: Ambarish <asoman>
Severity: high
Docs Contact:
Priority: unspecified
Version: 1.5
CC: asoman, bugs, ffilz, jthottan, kkeithle, rhinduja, skoduri, storage-qa-internal
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: 2.4-rc5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1375242 (view as bug list)
Environment:
Last Closed: 2016-09-19 09:47:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1375242

Description Ambarish 2016-09-03 10:34:20 UTC
Description of problem:
-----------------------

Had a 4-node Ganesha cluster configured with 256 worker threads.
Restarted the volume in an attempt to fetch server profiles for each operation.
Ganesha crashed on 2 of the 4 servers during multi-threaded writes using iozone.

Mount version: 4

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
nfs-ganesha-2.4-0.dev.26.el7.x86_64

How reproducible:
----------------

3/3

Steps to Reproduce:
-------------------

1. Configure Ganesha to run with 256 worker threads. Restart the Ganesha service.

2. Restart the Gluster volume.

3. Run sequential writes using iozone:

iozone -+m <config file> -+h <hostname> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
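
The three steps above can be strung together roughly as follows. This is only a sketch under assumptions, not the exact procedure used for this report: the volume name, client list path and client hostname are hypothetical placeholders, the service is assumed to be managed as the `nfs-ganesha` systemd unit, and the worker-thread change (assumed to be the `Nb_Worker` setting in the `NFS_Core_Param` block of ganesha.conf) is not shown. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the reproduction steps. VOLNAME, CLIENT_LIST and CLIENT_HOST are
# hypothetical placeholders; the report does not give the real values.
VOLNAME="${VOLNAME:-testvol}"
CLIENT_LIST="${CLIENT_LIST:-/root/iozone-clients.cfg}"
CLIENT_HOST="${CLIENT_HOST:-client1.example.com}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands without running them

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

repro() {
    # Step 1: restart Ganesha (assumes the worker thread count was already
    # raised to 256 in ganesha.conf, and that the unit is named nfs-ganesha).
    run systemctl restart nfs-ganesha
    # Step 2: restart the Gluster volume (script mode skips the stop prompt).
    run gluster --mode=script volume stop "$VOLNAME"
    run gluster volume start "$VOLNAME"
    # Step 3: sequential writes with the options from the report.
    run iozone -+m "$CLIENT_LIST" -+h "$CLIENT_HOST" -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
}

repro
```

The file passed to `-+m` lists one iozone client per line: client hostname, working directory on the client (here, a directory on the NFS mount), and the path to the iozone binary on that client.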


Actual results:
---------------

Ganesha crashes as soon as writes begin.

Expected results:
----------------

No crashes.

Additional info:
----------------

Volume type: 2*2
Client and server OS: RHEL 7.2

Comment 1 Ambarish 2016-09-03 10:37:21 UTC
***********************************
Backtrace from the crashed process:
***********************************

(gdb) bt
#0  0x00007fd61b80f210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007fd6189a425d in inode_ctx_get0 () from /lib64/libglusterfs.so.0
#2  0x00007fd6189a42e5 in inode_needs_lookup () from /lib64/libglusterfs.so.0
#3  0x00007fd618c75556 in __glfs_resolve_inode () from /lib64/libgfapi.so.0
#4  0x00007fd618c7565b in glfs_resolve_inode () from /lib64/libgfapi.so.0
#5  0x00007fd618c75c79 in glfs_h_stat () from /lib64/libgfapi.so.0
#6  0x00007fd6190917f6 in getattrs (obj_hdl=0x132cf58, attrs=0x7fd5e77f4980)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/FSAL/FSAL_GLUSTER/handle.c:769
#7  0x0000000000529ab9 in mdcache_refresh_attrs (entry=0x132d2e0, need_acl=true)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:829
#8  0x0000000000529f9d in mdcache_getattrs (obj_hdl=0x132d318, attrs_out=0x7fd5e77f4b40)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:919
#9  0x000000000042a10a in fsal_test_access (obj_hdl=0x132d318, access_type=3338666087, allowed=0x7fd5e77f4c9c, 
    denied=0x7fd5e77f4c98, owner_skip=false)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/FSAL/access_check.c:827
#10 0x00000000004f0b82 in nfs_access_op (obj=0x132d318, requested_access=31, granted_access=0x7fd51c000bb0, 
    supported_access=0x7fd51c000bac) at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/support/nfs_creds.c:725
#11 0x000000000045d4ae in nfs4_op_access (op=0x7fd554001870, data=0x7fd5e77f4d90, resp=0x7fd51c000ba0)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/Protocols/NFS/nfs4_op_access.c:91
#12 0x000000000045c8ba in nfs4_Compound (arg=0x7fd554000aa8, req=0x7fd5540008e8, res=0x7fd51c0009f0)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:734
#13 0x000000000044a79c in nfs_rpc_execute (reqdata=0x7fd5540008c0)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1306
#14 0x000000000044b0de in worker_run (ctx=0x135b310)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1570
#15 0x00000000004fdc37 in fridgethr_start_routine (arg=0x135b310)
    at /usr/src/debug/nfs-ganesha-2.4-dev-26-0.1.1-Source/support/fridgethr.c:550
#16 0x00007fd61b80adc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007fd61aeca1cd in clone () from /lib64/libc.so.6
(gdb)

Comment 5 Niels de Vos 2016-09-12 05:40:04 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 6 Jiffin 2016-09-16 13:02:29 UTC
*** Bug 1375242 has been marked as a duplicate of this bug. ***