Bug 1369074

Summary: Running refresh config on a volume makes ganesha crash with a segfault on one node.
Product: [Retired] nfs-ganesha
Reporter: Shashank Raj <sraj>
Component: Cache Inode
Assignee: Jiffin <jthottan>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent
Version: devel
CC: bugs, ffilz, jthottan, kkeithle, mzywusko, ndevos, skoduri, storage-qa-internal
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: nfs-ganesha-2.4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-11 09:04:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Shashank Raj 2016-08-22 12:42:28 UTC
Description of problem:

Running refresh config on a volume makes ganesha crash on one node.

Version-Release number of selected component (if applicable):

[root@dhcp43-116 ~]# rpm -qa|grep glusterfs
glusterfs-fuse-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
glusterfs-libs-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
glusterfs-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
glusterfs-client-xlators-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
glusterfs-api-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
glusterfs-cli-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
glusterfs-server-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
glusterfs-geo-replication-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64
glusterfs-ganesha-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64

[root@dhcp43-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-next.20160813.2f47e8a-1.el7.centos.x86_64
nfs-ganesha-next.20160813.2f47e8a-1.el7.centos.x86_64
nfs-ganesha-debuginfo-next.20160813.2f47e8a-1.el7.centos.x86_64
glusterfs-ganesha-3.8.2-0.1.gitd33aa0b.el7rhgs.x86_64


How reproducible:

Consistent

Steps to Reproduce:

1. Create a volume, start it, and enable ganesha on the volume.
2. Change a parameter value in the volume's export file on one node.
3. Run refresh config on the volume and observe that it fails:

[root@dhcp43-116 ~]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha myvolume
Error: refresh-config failed on dhcp42-237.

Ganesha then crashes with the following kernel message:

[432713.567038] ganesha.nfsd[4275]: segfault at 70 ip 00007f161320db90 sp 00007f15d6a92140 error 4 in ganesha.nfsd[7f1613102000+171000]
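The fields of the kernel message above can be decoded mechanically. The following is a minimal sketch, assuming the standard x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode) and the usual `segfault at ... ip ... sp ... error ... in obj[base+size]` layout. The function name `decode_segfault` and the returned keys are illustrative, not part of any ganesha or kernel tooling.

```python
import re

# Kernel message copied from the report (timestamp omitted).
LINE = ("ganesha.nfsd[4275]: segfault at 70 ip 00007f161320db90 "
        "sp 00007f15d6a92140 error 4 in ganesha.nfsd[7f1613102000+171000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its numeric fields (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    # All numeric fields in this message are printed in hexadecimal.
    addr, ip, err, base, size = (
        int(m[name], 16) for name in ("addr", "ip", "err", "base", "size")
    )
    return {
        "object": m["obj"],
        "fault_addr": addr,
        # Offset of the faulting instruction inside the mapped object.
        "ip_offset": ip - base,
        "ip_in_mapping": base <= ip < base + size,
        # x86 page-fault error-code bits.
        "page_present": bool(err & 0x1),
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
    }


info = decode_segfault(LINE)
for key, value in info.items():
    print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

Read this way, the message describes a user-mode read of an unmapped page at address 0x70. An address that close to zero is consistent with reading a structure member through a NULL pointer, although the message alone does not say which pointer was NULL.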

Backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2eda8cc700 (LWP 2302)]
0x00007f2f17076b90 in mdc_cur_export ()
    at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:376
376		return mdc_export(op_ctx->fsal_export);
(gdb) bt
#0  0x00007f2f17076b90 in mdc_cur_export ()
    at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:376
#1  mdcache_lru_clean (entry=0x7f2f184e7360)
    at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:421
#2  mdcache_lru_unref (entry=0x7f2f184e7360, flags=<optimized out>)
    at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1498
#3  0x00007f2f17056b36 in release_export_root (
    export=export@entry=0x7f2f18413f58)
    at /usr/src/debug/nfs-ganesha/src/support/exports.c:2119
#4  0x00007f2f17056f20 in unexport (export=export@entry=0x7f2f18413f58)
    at /usr/src/debug/nfs-ganesha/src/support/exports.c:2141
#5  0x00007f2f17067058 in gsh_export_removeexport (args=<optimized out>, 
    reply=<optimized out>, error=0x7f2eda8cb2e0)
    at /usr/src/debug/nfs-ganesha/src/support/export_mgr.c:1092
#6  0x00007f2f17086f39 in dbus_message_entrypoint (conn=0x7f2f184e38b0, 
    msg=0x7f2f184e3b90, user_data=<optimized out>)
    at /usr/src/debug/nfs-ganesha/src/dbus/dbus_server.c:512
#7  0x00007f2f16914c86 in _dbus_object_tree_dispatch_and_unlock ()
   from /lib64/libdbus-1.so.3
#8  0x00007f2f16906e49 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#9  0x00007f2f169070e2 in _dbus_connection_read_write_dispatch ()
   from /lib64/libdbus-1.so.3
#10 0x00007f2f17087faf in gsh_dbus_thread (arg=<optimized out>)
    at /usr/src/debug/nfs-ganesha/src/dbus/dbus_server.c:737
#11 0x00007f2f15530dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f2f14bfe1cd in clone () from /lib64/libc.so.6
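To make the call path easier to read, the frames can be reduced to a caller-to-callee chain. This is a rough sketch that assumes the usual gdb frame layout (`#N  [address in] function (`). `FRAMES` holds only the first line of each ganesha frame from the backtrace above, with the libdbus, libpthread and libc frames left out. The helper name `call_chain` is illustrative.

```python
import re

# First line of each ganesha frame, copied from the backtrace in this report.
# Frames 7-9 (libdbus) and 11-12 (libpthread, libc) are intentionally omitted.
FRAMES = """\
#0  0x00007f2f17076b90 in mdc_cur_export ()
#1  mdcache_lru_clean (entry=0x7f2f184e7360)
#2  mdcache_lru_unref (entry=0x7f2f184e7360, flags=<optimized out>)
#3  0x00007f2f17056b36 in release_export_root (
#4  0x00007f2f17056f20 in unexport (export=export@entry=0x7f2f18413f58)
#5  0x00007f2f17067058 in gsh_export_removeexport (args=<optimized out>,
#6  0x00007f2f17086f39 in dbus_message_entrypoint (conn=0x7f2f184e38b0,
#10 0x00007f2f17087faf in gsh_dbus_thread (arg=<optimized out>)
"""

# The address is optional: gdb omits it for inlined frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(", re.MULTILINE)


def call_chain(text):
    """Return function names ordered from outermost caller to crashing frame."""
    frames = [(int(num), func) for num, func in FRAME_RE.findall(text)]
    frames.sort(reverse=True)  # highest frame number is the outermost caller
    return [func for _, func in frames]


print(" -> ".join(call_chain(FRAMES)))
```

The resulting chain starts in `gsh_dbus_thread` and ends in `mdc_cur_export`, so the crash occurs on the D-Bus thread while it handles an export removal.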


Actual results:

Running refresh config on a volume makes ganesha crash with a segfault on one node.

Expected results:

There should be no crash, and refresh config should succeed.

Additional info:

An sosreport will be attached.

Comment 1 Shashank Raj 2016-08-22 12:52:10 UTC
Sosreports and logs are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1369074

Comment 2 Shashank Raj 2016-08-23 09:30:37 UTC
We hit this issue consistently, not only with refresh config but also when exporting and unexporting a ganesha volume.

Comment 3 Jiffin 2016-09-06 07:01:36 UTC
The patch has been merged upstream: https://review.gerrithub.io/#/c/288708/

Comment 4 Niels de Vos 2016-09-12 05:37:41 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html