Description of problem:
Ganesha crashed on 2 nodes while running the POSIX compliance suite on a v3 mount.

Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create 3 volumes and enable ganesha on them.
3. Mount one of the volumes via v3 on a client and run the POSIX compliance suite.

Actual results:
Ganesha crashed on 2 out of 4 nodes.

Expected results:
Ganesha should not crash.

Additional info:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7efd1b97e700 (LWP 12348)]
0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
1046        SET_GLUSTER_CREDS(glfs_export, &op_ctx->creds->caller_uid,
(gdb) bt
#0  0x00007efdfbe0ece8 in glusterfs_close_my_fd (my_fd=my_fd@entry=0x7efda45779e8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1046
#1  0x00007efdfbe10b0c in file_close (obj_hdl=0x7efda45779f8)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/FSAL_GLUSTER/handle.c:1097
#2  0x00005564b0319e81 in mdcache_close (obj_hdl=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:440
#3  0x00005564b0313f9a in fsal_close (obj_hdl=0x7efda4557908)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/include/fsal.h:416
#4  mdc_up_invalidate (export=<optimized out>, handle=<optimized out>, flags=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:72
#5  0x00005564b0253fe9 in queue_invalidate (ctx=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/FSAL_UP/fsal_up_async.c:81
#6  0x00005564b02f1889 in fridgethr_start_routine (arg=0x7efda8036140)
    at /usr/src/debug/nfs-ganesha-2.4.4/src/support/fridgethr.c:550
#7  0x00007efdfef78e25 in start_thread (arg=0x7efd1b97e700) at pthread_create.c:308
#8  0x00007efdfe64634d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) q

Will be attaching sosreports shortly.
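To show concretely what the faulting line trips over, here is a minimal standalone model of the dereference at handle.c:1046. The names mimic the backtrace; the bodies are illustrative stand-ins, not the shipped sources:

    #include <stdio.h>
    #include <sys/types.h>

    struct user_cred {
            uid_t caller_uid;
            gid_t caller_gid;
    };

    /* Per-request context; in nfs-ganesha op_ctx is a thread-local pointer. */
    struct req_op_context {
            struct user_cred *creds;   /* NULL unless the protocol layer fills it in */
            void *fsal_export;
    };

    static struct req_op_context *op_ctx;

    static void close_my_fd_model(void)
    {
            /* Mirrors the faulting line: creds is dereferenced unconditionally
             * before switching credentials for the close. */
            printf("closing fd as uid %u\n", op_ctx->creds->caller_uid);
    }

    int main(void)
    {
            struct req_op_context req_ctx = {0};  /* what the async up-call path builds */

            op_ctx = &req_ctx;            /* fsal_export may be set; creds stays NULL */
            close_my_fd_model();          /* SIGSEGV: NULL creds dereference */
            return 0;
    }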
So, the UP calls replace op_ctx. From mdc_up_invalidate():

    req_ctx.fsal_export = &myself->export;
    save_ctx = op_ctx;
    op_ctx = &req_ctx;

The reason for this is that up-calls don't come from clients, so they don't have op_ctx set up by the protocol layer. Worse, the invalidate_close() UP call is an async call, so it's run in a separate thread, and any op_ctx set up by the FSAL wouldn't even apply. However, MDCACHE needs an op_ctx, and needs it to have a valid fsal_export, so it forces this. None of the other fields in op_ctx will be valid.
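To make that sequence concrete, here is a hedged sketch of the swap described above, using stand-in types (the real ones live in ganesha's headers); only the three quoted assignments are taken from the code, the rest is illustrative:

    #include <string.h>

    struct user_cred;
    struct fsal_export;

    struct req_op_context {
            struct user_cred *creds;          /* never populated on this path */
            struct fsal_export *fsal_export;
    };

    __thread struct req_op_context *op_ctx;

    /* Shape of the invalidate-close up-call handler (cf. mdcache_up.c:72). */
    void up_invalidate_close_sketch(struct fsal_export *mdcache_export)
    {
            struct req_op_context req_ctx;
            struct req_op_context *save_ctx = op_ctx;

            memset(&req_ctx, 0, sizeof(req_ctx));
            req_ctx.fsal_export = mdcache_export;  /* the only field MDCACHE needs */
            op_ctx = &req_ctx;

            /* fsal_close() -> mdcache_close() -> file_close() ->
             * glusterfs_close_my_fd(); that last call reads
             * op_ctx->creds->caller_uid, and creds is still NULL here. */

            op_ctx = save_ctx;
    }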
Thanks for the clarification, Dan. Posted the fix for this crash upstream for review: https://review.gerrithub.io/#/c/367266/
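For readers without access to the review, one plausible shape of such a guard in the FSAL's close path is sketched below; this is an assumption about the approach (skip the credential switch when the context carries no caller creds), not a quote of the posted change:

    #include <stddef.h>
    #include <sys/types.h>

    struct user_cred { uid_t caller_uid; gid_t caller_gid; };
    struct req_op_context { struct user_cred *creds; };

    __thread struct req_op_context *op_ctx;

    /* Stand-in for the credential switch done before the gluster close call. */
    static void set_creds(uid_t uid, gid_t gid) { (void)uid; (void)gid; }

    /* Hypothetical guard: only switch credentials when the calling context
     * actually carries caller creds; the async up-call path would then close
     * the fd with the daemon's own credentials instead of crashing. */
    void close_fd_guarded(void)
    {
            if (op_ctx != NULL && op_ctx->creds != NULL)
                    set_creds(op_ctx->creds->caller_uid, op_ctx->creds->caller_gid);

            /* ... the actual close of the fd would follow here ... */
    }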
Verified this fix on:

# rpm -qa | grep ganesha
nfs-ganesha-2.4.4-15.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-15.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-33.el7rhgs.x86_64

While running the POSIX compliance suite without the non-root-user patch, I am unable to hit the crash. Hence, moving this bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2779