Bug 1465536

Summary: [GANESHA] Ganesha crashed on 2 nodes while running posix compliance suite on v3 mount
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Manisha Saini <msaini>
Component: nfs-ganesha
Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: unspecified
Priority: urgent
Version: rhgs-3.3
CC: amukherj, dang, ffilz, jthottan, mbenjamin, msaini, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Release: RHGS 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: 3.3.0-devel-freeze-exception
Fixed In Version: nfs-ganesha-2.4.4-14
Last Closed: 2017-09-21 04:47:57 UTC
Type: Bug
Bug Blocks: 1417151

Description Manisha Saini 2017-06-27 15:02:24 UTC
Description of problem:
Ganesha crashed on 2 nodes while running the posix compliance suite on a v3 mount

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64


How reproducible:
2/2

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create 3 volumes and enable ganesha on them.
3. Mount one of the volumes via v3 on a client and run the posix compliance suite.

Actual results:
Ganesha crashed on 2 out of 4 nodes

Expected results:
Ganesha should not crash

Additional info:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7efd1b97e700 (LWP 12348)]
0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
1046			SET_GLUSTER_CREDS(glfs_export, &op_ctx->creds->caller_uid,
(gdb) bt
#0  0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
#1  0x00007efdfbe10b0c in file_close (obj_hdl=0x7efda45779f8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1097
#2  0x00005564b0319e81 in mdcache_close (obj_hdl=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:440
#3  0x00005564b0313f9a in fsal_close (obj_hdl=0x7efda4557908)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/include/fsal.h:416
#4  mdc_up_invalidate (export=<optimized out>, handle=<optimized out>, flags=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:72
#5  0x00005564b0253fe9 in queue_invalidate (ctx=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL_UP/fsal_up_async.c:81
#6  0x00005564b02f1889 in fridgethr_start_routine (arg=0x7efda8036140)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#7  0x00007efdfef78e25 in start_thread (arg=0x7efd1b97e700) at pthread_create.c:308
#8  0x00007efdfe64634d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) q


Will be attaching sosreports shortly

Comment 5 Daniel Gryniewicz 2017-06-27 16:39:27 UTC
So, the UP calls replace op_ctx.  From mdc_up_invalidate():

	req_ctx.fsal_export = &myself->export;
	save_ctx = op_ctx;
	op_ctx = &req_ctx;


The reason for this is that up-calls don't come from clients, so they don't have an op_ctx set up by the protocol layer.  Worse, the invalidate_close() UP call is asynchronous and runs in a separate thread, so any op_ctx set up by the FSAL wouldn't even apply.  However, MDCACHE needs an op_ctx with a valid fsal_export, so it forces this.  None of the other fields in op_ctx will be valid.

Comment 6 Soumya Koduri 2017-06-27 17:28:16 UTC
Thanks for the clarification, Dan. I posted the fix for this crash upstream for review: https://review.gerrithub.io/#/c/367266/

Comment 10 Manisha Saini 2017-07-14 12:21:53 UTC
Verified this fix on 

# rpm -qa | grep ganesha
nfs-ganesha-2.4.4-15.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-15.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-33.el7rhgs.x86_64


While running the posix compliance suite without the non-root-user patch, I am unable to hit the crash. Hence, moving this bug to the verified state.

Comment 12 errata-xmlrpc 2017-09-21 04:47:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2779