Bug 1465536 - [GANESHA] Ganesha crashed on 2 nodes while running posix compliance suite on v3 mount
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assigned To: Soumya Koduri
QA Contact: Manisha Saini
Depends On:
Blocks: 1417151
Reported: 2017-06-27 11:02 EDT by Manisha Saini
Modified: 2017-09-21 00:47 EDT (History)
CC: 11 users

See Also:
Fixed In Version: nfs-ganesha-2.4.4-14
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-09-21 00:47:57 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Manisha Saini 2017-06-27 11:02:24 EDT
Description of problem:
Ganesha crashed on 2 nodes while running the posix compliance suite on a v3 mount.

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha

How reproducible:

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create 3 volumes. Enable ganesha on them.
3. Mount one of the volumes via v3 on a client and run the posix compliance suite (see the command sketch below).
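
A rough sketch of the reproduction, assuming the standard RHGS 3.3 CLI; hostnames, brick paths, volume names, the VIP, and the test-suite location are illustrative placeholders, not taken from this report:

	# on a cluster node, once ganesha-ha.conf is in place
	gluster nfs-ganesha enable
	gluster volume create vol1 replica 2 node1:/bricks/b1 node2:/bricks/b1
	gluster volume start vol1
	gluster volume set vol1 ganesha.enable on   # repeat for vol2 and vol3

	# on the client: NFSv3 mount, then drive the posix compliance suite
	mount -t nfs -o vers=3 <VIP>:/vol1 /mnt/vol1
	cd /mnt/vol1 && prove -r /opt/qa/posix-testsuite/tests   # suite path assumed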

Actual results:
Ganesha crashed on 2 out of 4 nodes

Expected results:
Ganesha should not crash

Additional info:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7efd1b97e700 (LWP 12348)]
0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
1046			SET_GLUSTER_CREDS(glfs_export, &op_ctx->creds->caller_uid,
(gdb) bt
#0  0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
#1  0x00007efdfbe10b0c in file_close (obj_hdl=0x7efda45779f8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1097
#2  0x00005564b0319e81 in mdcache_close (obj_hdl=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:440
#3  0x00005564b0313f9a in fsal_close (obj_hdl=0x7efda4557908)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/include/fsal.h:416
#4  mdc_up_invalidate (export=<optimized out>, handle=<optimized out>, flags=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:72
#5  0x00005564b0253fe9 in queue_invalidate (ctx=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL_UP/fsal_up_async.c:81
#6  0x00005564b02f1889 in fridgethr_start_routine (arg=0x7efda8036140)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#7  0x00007efdfef78e25 in start_thread (arg=0x7efd1b97e700) at pthread_create.c:308
#8  0x00007efdfe64634d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) q

Will be attaching sosreports shortly.
Comment 5 Daniel Gryniewicz 2017-06-27 12:39:27 EDT
So, the UP calls replace op_ctx.  From mdc_up_invalidate():

	req_ctx.fsal_export = &myself->export;
	save_ctx = op_ctx;
	op_ctx = &req_ctx;

The reason for this is that up-calls don't come from clients, so they don't have an op_ctx set up by the protocol layer.  Worse, the invalidate_close() UP call is asynchronous, so it runs in a separate thread, where any op_ctx set up by the FSAL wouldn't even apply.  However, MDCACHE needs an op_ctx, and needs it to have a valid fsal_export, so it forces this one in.  None of the other fields in op_ctx will be valid.
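
Tying that to the backtrace above: the substitute req_ctx populates only fsal_export, so op_ctx->creds is NULL in the async up-call thread, and handle.c:1046 faults on op_ctx->creds->caller_uid. Below is a minimal sketch of the kind of guard such a fix takes; it is illustrative only (the actual patch is the review linked in the next comment), with the argument list reconstructed from the gdb frame and the NULL fallback an assumption about SET_GLUSTER_CREDS:

	/* Sketch of a guard in glusterfs_close_my_fd(): an async up-call
	 * thread carries a minimal op_ctx whose creds pointer is NULL, so
	 * only switch to caller credentials when they actually exist. */
	if (op_ctx && op_ctx->creds) {
		SET_GLUSTER_CREDS(glfs_export, &op_ctx->creds->caller_uid,
				  &op_ctx->creds->caller_gid,
				  op_ctx->creds->caller_glen,
				  op_ctx->creds->caller_garray);
	} else {
		/* No caller identity (internal close): pass NULLs so the
		 * daemon's own credentials stay in effect (assumed). */
		SET_GLUSTER_CREDS(glfs_export, NULL, NULL, 0, NULL);
	}

Either branch leaves the close itself unchanged; the guard only decides whose credentials the glfs call runs under.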
Comment 6 Soumya Koduri 2017-06-27 13:28:16 EDT
Thanks for the clarification, Dan. Posted the fix for this crash upstream for review: https://review.gerrithub.io/#/c/367266/
Comment 10 Manisha Saini 2017-07-14 08:21:53 EDT
Verified this fix on 

# rpm -qa | grep ganesha

While running the posix compliance suite without the non-root user patch, I am unable to hit the crash. Hence, moving this bug to the verified state.
Comment 12 errata-xmlrpc 2017-09-21 00:47:57 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

