Bug 1413350
Summary: | [Ganesha] : Subsequent mounts fail and Ganesha crashes (during an attempt to mount) post volume restarts. | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman>
Component: | nfs-ganesha | Assignee: | Daniel Gryniewicz <dang>
Status: | CLOSED ERRATA | QA Contact: | Ambarish <asoman>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.2 | CC: | aloganat, amukherj, asoman, bturner, dang, ffilz, jthottan, mbenjamin, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: | --- | Keywords: | Regression
Target Release: | RHGS 3.2.0 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | nfs-ganesha-2.4.1-6 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-03-23 06:28:44 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1351528 | |
Description
Ambarish
2017-01-15 09:09:27 UTC
While working on a reproducer, Ganesha crashed on all 4 nodes. Setup shared with Dev for further RCA.

This issue is not seen with the previous build:

nfs-ganesha-gluster-2.4.1-4.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-11.el7rhgs.x86_64
nfs-ganesha-2.4.1-4.el7rhgs.x86_64

Marking it as a regression from 3.1.3 -> 3.2. More precisely, this regression was introduced between 2.4.1-4 and 2.4.1-5.

Tried another setup with nfs-ganesha-gluster-2.4.1-4.el7rhgs.x86_64, glusterfs-ganesha-3.8.4-11.el7rhgs.x86_64 and nfs-ganesha-2.4.1-4.el7rhgs.x86_64, and the issue is reproducible there as well.

Thanks, Arthy. Not sure which build may have caused this regression, but it is definitely not 2.4.1-5.

There seems to be a ref leak on the md-cache entry, because of which the entry is not cleaned up during volume unexport. When the volume is then re-exported with the same export ID, the same md-cache entry is re-used, and it refers to the old, freed memory (in this case the glusterfs inode structure).

So this is quite possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1413502

While debugging this issue using gdb on the QE setup, found the following in mdcache_unexport():

```
160         /* Unhash the root object */
161         assert(!cih_remove_checked(root_entry));
162 }
```

Line 161 never gets processed (the cih_remove_checked() call sits inside assert(), and the whole assert() expression is compiled out when the build defines NDEBUG), because of which root_entry doesn't get unref'ed, resulting in the ref leak. Dan confirmed that this could be the reason for the unexport issue.

Proposed fix by Dan: https://review.gerrithub.io/#/c/343263/
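The pitfall is a classic C one: when a build defines NDEBUG (as release builds typically do), the preprocessor removes the entire assert() expression, including any side-effecting call inside it. Below is a minimal, self-contained sketch of the bug pattern and the usual fix of hoisting the call out of the assert; remove_entry() and the bare refcount are hypothetical stand-ins for illustration, not Ganesha code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for cih_remove_checked(): unhashes an entry,
 * dropping the hash table's reference, and returns false on success. */
static bool remove_entry(int *refcount)
{
    --*refcount;        /* the side effect the caller depends on */
    return false;       /* false == removed successfully */
}

int main(void)
{
    int refcount = 1;   /* the root entry's remaining reference */

    /* Buggy pattern: with -DNDEBUG the whole expression vanishes, the
     * call never happens, and the reference leaks:
     *
     *     assert(!remove_entry(&refcount));
     *
     * Safe pattern: always make the call, assert only on the result. */
    bool failed = remove_entry(&refcount);
    assert(!failed);
    (void)failed;       /* keep -DNDEBUG builds warning-free */

    printf("refcount after unexport: %d\n", refcount);  /* 0, as intended */
    return 0;
}
```

Presumably the GerritHub change linked above restructures the call along these lines; the review itself has the actual diff.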
Hitting the same crash while doing a refresh-config on the nfs-ganesha enabled volume:

```
(gdb) bt
#0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x00007f9e5ff88ebd in inode_ctx_get0 () from /lib64/libglusterfs.so.0
#2  0x00007f9e5ff88f45 in inode_needs_lookup () from /lib64/libglusterfs.so.0
#3  0x00007f9e6025bc86 in __glfs_resolve_inode () from /lib64/libgfapi.so.0
#4  0x00007f9e6025bd8b in glfs_resolve_inode () from /lib64/libgfapi.so.0
#5  0x00007f9e6025c3f9 in glfs_h_stat () from /lib64/libgfapi.so.0
#6  0x00007f9e60677df4 in getattrs (obj_hdl=0x7f9d78638558, attrs=0x7f9debfd5d40) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:756
#7  0x00007f9e64f0ca14 in mdcache_refresh_attrs (entry=entry@entry=0x7f9d78e508f0, need_acl=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:939
#8  0x00007f9e64f0d51a in mdcache_getattrs (obj_hdl=0x7f9d78e50928, attrs_out=0x7f9debfd5fd0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1032
#9  0x00007f9e64e91e17 in file_To_Fattr (data=data@entry=0x7f9debfd6180, request_mask=1433550, attr=attr@entry=0x7f9debfd5fd0, Fattr=Fattr@entry=0x7f9d78654760, Bitmap=Bitmap@entry=0x7f9d74182e18) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs_proto_tools.c:3299
#10 0x00007f9e64e6f0c2 in nfs4_op_getattr (op=0x7f9d74182e10, data=0x7f9debfd6180, resp=0x7f9d78654750) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_getattr.c:140
#11 0x00007f9e64e69f8d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f9d78e58b20) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#12 0x00007f9e64e5b13c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f9d740008c0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#13 0x00007f9e64e5c79a in worker_run (ctx=0x7f9e69947f40) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#14 0x00007f9e64ee6409 in fridgethr_start_routine (arg=0x7f9e69947f40) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#15 0x00007f9e633c6dc5 in start_thread (arg=0x7f9debfd7700) at pthread_create.c:308
#16 0x00007f9e62a9573d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

The reported issue was not reproducible on Ganesha 2.4.1-6 / Gluster 3.8.4-12. Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html