Bug 1465536 - [GANESHA] Ganesha crashed on 2 nodes while running posix compliance suite on v3 mount
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Hardware: Unspecified Unspecified
Priority: urgent Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assigned To: Soumya Koduri
Manisha Saini
Depends On:
Blocks: 1417151
Reported: 2017-06-27 11:02 EDT by Manisha Saini
Modified: 2017-07-14 08:21 EDT

See Also:
Fixed In Version: nfs-ganesha-2.4.4-14
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Manisha Saini 2017-06-27 11:02:24 EDT
Description of problem:
Ganesha crashed on 2 nodes while running the POSIX compliance suite on a v3 mount.

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha

How reproducible:

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create 3 volumes. Enable ganesha on them.
3. Mount one of the volumes via v3 on a client and run the POSIX compliance suite.

Actual results:
Ganesha crashed on 2 out of 4 nodes.

Expected results:
Ganesha should not crash

Additional info:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7efd1b97e700 (LWP 12348)]
0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
1046			SET_GLUSTER_CREDS(glfs_export, &op_ctx->creds->caller_uid,
(gdb) bt
#0  0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
#1  0x00007efdfbe10b0c in file_close (obj_hdl=0x7efda45779f8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1097
#2  0x00005564b0319e81 in mdcache_close (obj_hdl=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:440
#3  0x00005564b0313f9a in fsal_close (obj_hdl=0x7efda4557908)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/include/fsal.h:416
#4  mdc_up_invalidate (export=<optimized out>, handle=<optimized out>, flags=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:72
#5  0x00005564b0253fe9 in queue_invalidate (ctx=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL_UP/fsal_up_async.c:81
#6  0x00005564b02f1889 in fridgethr_start_routine (arg=0x7efda8036140)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#7  0x00007efdfef78e25 in start_thread (arg=0x7efd1b97e700) at pthread_create.c:308
#8  0x00007efdfe64634d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) q

Will be attaching sosreports shortly
Comment 5 Daniel Gryniewicz 2017-06-27 12:39:27 EDT
So, the UP calls replace op_ctx.  From mdc_up_invalidate():

	req_ctx.fsal_export = &myself->export;
	save_ctx = op_ctx;
	op_ctx = &req_ctx;

The reason for this is that UP calls don't come from clients, so they don't have an op_ctx set up by the protocol layer.  Worse, the invalidate_close() UP call is asynchronous and runs in a separate thread, so any op_ctx set up by the FSAL wouldn't apply either.  However, MDCACHE needs an op_ctx with a valid fsal_export, so it forces one.  None of the other fields in op_ctx will be valid.
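
To make the failure mode concrete, here is a minimal, self-contained sketch of the context swap described above, together with one possible defensive check. It is not the upstream fix and does not use the real nfs-ganesha definitions: the struct layouts are simplified stand-ins, and get_caller_uid() and close_from_upcall() are hypothetical helpers invented for illustration. It assumes, as the backtrace suggests, that the close path reads op_ctx->creds while running under a context in which only fsal_export was filled in.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the ganesha types (assumption: the real
 * structures have many more fields; only the ones discussed are kept). */
struct user_cred {
	uint32_t caller_uid;
	uint32_t caller_gid;
};

struct fsal_export {
	int id;
};

struct req_op_context {
	struct user_cred *creds;         /* NULL in a context built by an UP call */
	struct fsal_export *fsal_export; /* the only field the UP call fills in */
};

/* Per-thread operation context, normally set up by the protocol layer. */
static _Thread_local struct req_op_context *op_ctx;

/* Hypothetical guard: fetch the caller uid only if credentials exist.
 * Returns 0 on success, -1 if the current context carries no credentials. */
static int get_caller_uid(uint32_t *uid_out)
{
	if (op_ctx == NULL || op_ctx->creds == NULL)
		return -1;

	*uid_out = op_ctx->creds->caller_uid;
	return 0;
}

/* Hypothetical model of the swap in mdc_up_invalidate(): install a
 * temporary context with only fsal_export set, run the close path,
 * then restore whatever context the thread had before. */
static int close_from_upcall(struct fsal_export *export, uint32_t *uid_out)
{
	struct req_op_context req_ctx = { 0 };
	struct req_op_context *save_ctx;
	int rc;

	req_ctx.fsal_export = export;
	save_ctx = op_ctx;
	op_ctx = &req_ctx;

	/* Without the NULL check, reading creds here is what would fault. */
	rc = get_caller_uid(uid_out);

	op_ctx = save_ctx;
	return rc;
}
```

With this shape, a close triggered by an UP call reports that no credentials are available instead of dereferencing a NULL creds pointer, while a close issued under a client request still sees the caller's uid. What the close path should do when credentials are missing is a policy decision that this sketch deliberately leaves open.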
Comment 6 Soumya Koduri 2017-06-27 13:28:16 EDT
Thanks for the clarification, Dan. Posted the fix for this crash upstream for review: https://review.gerrithub.io/#/c/367266/
Comment 10 Manisha Saini 2017-07-14 08:21:53 EDT
Verified this fix on 

# rpm -qa | grep ganesha

While running the POSIX compliance suite without the non-root user patch, I am unable to hit the crash. Hence moving this bug to the verified state.
