Description of problem:
Ganesha crashed on 2 nodes while running the POSIX compliance suite on a v3 mount.

Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create 3 volumes and enable ganesha on them.
3. Mount one of the volumes via v3 on a client and run the POSIX compliance suite.

Actual results:
Ganesha crashed on 2 out of 4 nodes.

Expected results:
Ganesha should not crash.

Additional info:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7efd1b97e700 (LWP 12348)]
0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
1046        SET_GLUSTER_CREDS(glfs_export, &op_ctx->creds->caller_uid,
(gdb) bt
#0  0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
#1  0x00007efdfbe10b0c in file_close (obj_hdl=0x7efda45779f8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1097
#2  0x00005564b0319e81 in mdcache_close (obj_hdl=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:440
#3  0x00005564b0313f9a in fsal_close (obj_hdl=0x7efda4557908)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/include/fsal.h:416
#4  mdc_up_invalidate (export=<optimized out>, handle=<optimized out>, flags=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:72
#5  0x00005564b0253fe9 in queue_invalidate (ctx=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL_UP/fsal_up_async.c:81
#6  0x00005564b02f1889 in fridgethr_start_routine (arg=0x7efda8036140)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#7  0x00007efdfef78e25 in start_thread (arg=0x7efd1b97e700) at pthread_create.c:308
#8  0x00007efdfe64634d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) q

Will be attaching sosreports shortly.
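To show concretely what the faulting line trips over, here is a minimal standalone model of the dereference at handle.c:1046. The names mimic the backtrace; the bodies are illustrative stand-ins, not the shipped sources:

    #include <stdio.h>
    #include <sys/types.h>

    struct user_cred {
            uid_t caller_uid;
            gid_t caller_gid;
    };

    /* Per-request context; in nfs-ganesha op_ctx is a thread-local pointer. */
    struct req_op_context {
            struct user_cred *creds;   /* NULL unless the protocol layer fills it in */
            void *fsal_export;
    };

    static struct req_op_context *op_ctx;

    static void close_my_fd_model(void)
    {
            /* Mirrors the faulting line: creds is dereferenced unconditionally
             * before switching credentials for the close. */
            printf("closing fd as uid %u\n", op_ctx->creds->caller_uid);
    }

    int main(void)
    {
            struct req_op_context req_ctx = {0};  /* what the async up-call path builds */

            op_ctx = &req_ctx;            /* fsal_export may be set; creds stays NULL */
            close_my_fd_model();          /* SIGSEGV: NULL creds dereference */
            return 0;
    }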
So, the UP calls replace op_ctx. From mdc_up_invalidate():

    req_ctx.fsal_export = &myself->export;
    save_ctx = op_ctx;
    op_ctx = &req_ctx;

The reason for this is that up-calls don't come from clients, so they don't have op_ctx set up by the protocol layer. Worse, the invalidate_close() UP call is an async call, so it's run in a separate thread, and any op_ctx set up by the FSAL wouldn't even apply. However, MDCACHE needs an op_ctx, and needs it to have a valid fsal_export, so it forces this. None of the other fields in op_ctx will be valid.
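To make that sequence concrete, here is a hedged sketch of the swap described above, using stand-in types (the real ones live in ganesha's headers); only the three quoted assignments are taken from the code, the rest is illustrative:

    #include <string.h>

    struct user_cred;
    struct fsal_export;

    struct req_op_context {
            struct user_cred *creds;          /* never populated on this path */
            struct fsal_export *fsal_export;
    };

    __thread struct req_op_context *op_ctx;

    /* Shape of the invalidate-close up-call handler (cf. mdcache_up.c:72). */
    void up_invalidate_close_sketch(struct fsal_export *mdcache_export)
    {
            struct req_op_context req_ctx;
            struct req_op_context *save_ctx = op_ctx;

            memset(&req_ctx, 0, sizeof(req_ctx));
            req_ctx.fsal_export = mdcache_export;  /* the only field MDCACHE needs */
            op_ctx = &req_ctx;

            /* fsal_close() -> mdcache_close() -> file_close() ->
             * glusterfs_close_my_fd(); that last call reads
             * op_ctx->creds->caller_uid, and creds is still NULL here. */

            op_ctx = save_ctx;
    }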
Thanks for the clarification, Dan. Posted the fix for this crash upstream for review: https://review.gerrithub.io/#/c/367266/
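For readers without access to the review, one plausible shape of such a guard in the FSAL's close path is sketched below; this is an assumption about the approach (skip the credential switch when the context carries no caller creds), not a quote of the posted change:

    #include <stddef.h>
    #include <sys/types.h>

    struct user_cred { uid_t caller_uid; gid_t caller_gid; };
    struct req_op_context { struct user_cred *creds; };

    __thread struct req_op_context *op_ctx;

    /* Stand-in for the credential switch done before the gluster close call. */
    static void set_creds(uid_t uid, gid_t gid) { (void)uid; (void)gid; }

    /* Hypothetical guard: only switch credentials when the calling context
     * actually carries caller creds; the async up-call path would then close
     * the fd with the daemon's own credentials instead of crashing. */
    void close_fd_guarded(void)
    {
            if (op_ctx != NULL && op_ctx->creds != NULL)
                    set_creds(op_ctx->creds->caller_uid, op_ctx->creds->caller_gid);

            /* ... the actual close of the fd would follow here ... */
    }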
Verified this fix on:

# rpm -qa | grep ganesha
nfs-ganesha-2.4.4-15.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-15.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-33.el7rhgs.x86_64

While running the POSIX compliance suite without the non-root-user patch, I am unable to hit the crash. Hence, moving this bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2779