Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1361520

Summary:	ganesha crashes with segfault while mounting volume with v3 or v4.
Product:	[Retired] nfs-ganesha	Reporter:	Shashank Raj <sraj>
Component:	Cache Inode	Assignee:	Jiffin <jthottan>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	devel	CC:	bugs, dang, ffilz, jthottan, kkeithle, malahal, mbenjamin, ndevos, pasik, skoduri
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-11-22 15:22:22 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Shashank Raj 2016-07-29 09:30:35 UTC

Description of problem:

ganesha crashes with a segfault when the volume is mounted with NFSv3 or NFSv4.

Version-Release number of selected component (if applicable):

[root@dhcp41-206 exports]# rpm -qa|grep glusterfs
glusterfs-ganesha-3.8.1-1.el7.x86_64
glusterfs-libs-3.8.1-1.el7.x86_64
glusterfs-3.8.1-1.el7.x86_64
glusterfs-client-xlators-3.8.1-1.el7.x86_64
glusterfs-api-3.8.1-1.el7.x86_64
glusterfs-fuse-3.8.1-1.el7.x86_64
glusterfs-server-3.8.1-1.el7.x86_64
glusterfs-geo-replication-3.8.1-1.el7.x86_64
glusterfs-cli-3.8.1-1.el7.x86_64

[root@dhcp41-206 exports]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.1-1.el7.x86_64
nfs-ganesha-gluster-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-2.4.0-0.14dev26.el7.centos.x86_64


How reproducible:

Consistent

Steps to Reproduce:
1. Create an nfs-ganesha cluster with 4 nodes.
2. Create a volume and enable ganesha on the volume.
3. Change a parameter in the export file and perform a refresh config.
4. Try to mount the volume on a client with v3 or v4.

Observe that ganesha crashes with a segfault on the nodes from which we try to mount the volume:


>> ganesha service status on the affected nodes:

● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Fri 2016-07-29 14:51:45 IST; 4min 28s ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 5317 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 29012 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 29010 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 29011 (code=killed, signal=SEGV)

Jul 29 14:32:37 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Ganes...
Jul 29 14:32:37 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganesh...
Jul 29 14:51:45 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servic...
Jul 29 14:51:45 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha.s...
Jul 29 14:51:45 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servic...
Hint: Some lines were ellipsized, use -l to show in full.



● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Fri 2016-07-29 14:51:45 IST; 4min 28s ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 7523 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 30736 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 30734 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 30735 (code=killed, signal=SEGV)

Jul 29 14:31:16 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Gane...
Jul 29 14:31:16 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganes...
Jul 29 14:51:45 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servi...
Jul 29 14:51:45 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha....
Jul 29 14:51:45 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servi...
Hint: Some lines were ellipsized, use -l to show in full.

>> dmesg shows the following messages:

[173833.456765] ganesha.nfsd[9832]: segfault at 7fdf494aa06c ip 00007fdf6a917a40 sp 00007fdf6bffe450 error 4 in libglusterfs.so.0.0.1[7fdf6a8df000+eb000]

[172743.152933] ganesha.nfsd[5733]: segfault at 7f670800008c ip 00007f6797919a40 sp 00007f679d32a450 error 4 in libglusterfs.so.0.0.1[7f67978e1000+eb000]

>> gdb shows the following backtraces:

With a v3 mount:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9acfd08700 (LWP 29049)]
0x00007f9b06dc1210 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f9b06dc1210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f9b03d6a25d in inode_ctx_get0 () from /lib64/libglusterfs.so.0
#2  0x00007f9b03d6a2e5 in inode_needs_lookup () from /lib64/libglusterfs.so.0
#3  0x00007f9b0403b546 in __glfs_resolve_inode () from /lib64/libgfapi.so.0
#4  0x00007f9b0403b64b in glfs_resolve_inode () from /lib64/libgfapi.so.0
#5  0x00007f9b0403bc69 in glfs_h_stat () from /lib64/libgfapi.so.0
#6  0x00007f9b04455324 in getattrs () from /usr/lib64/ganesha/libfsalgluster.so
#7  0x00007f9b08908f4a in mdcache_refresh_attrs ()
#8  0x00007f9b0890992a in mdcache_getattrs ()
#9  0x00007f9b0888a3cf in nfs_SetPostOpAttr ()
#10 0x00007f9b0888b933 in nfs3_fsinfo ()
#11 0x00007f9b08850c3c in nfs_rpc_execute ()
#12 0x00007f9b0885268e in worker_run ()
#13 0x00007f9b088e4649 in fridgethr_start_routine ()
#14 0x00007f9b06dbcdc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f9b0648b6ed in clone () from /lib64/libc.so.6
(gdb) 

With a v4 mount:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8dbf35c700 (LWP 30822)]
0x00007f8df8432210 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f8df8432210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f8df53db25d in inode_ctx_get0 () from /lib64/libglusterfs.so.0
#2  0x00007f8df53db2e5 in inode_needs_lookup () from /lib64/libglusterfs.so.0
#3  0x00007f8df56ac546 in __glfs_resolve_inode () from /lib64/libgfapi.so.0
#4  0x00007f8df56ac64b in glfs_resolve_inode () from /lib64/libgfapi.so.0
#5  0x00007f8df56acc69 in glfs_h_stat () from /lib64/libgfapi.so.0
#6  0x00007f8df5ac6324 in getattrs () from /usr/lib64/ganesha/libfsalgluster.so
#7  0x00007f8df9f79f4a in mdcache_refresh_attrs ()
#8  0x00007f8df9f7a92a in mdcache_getattrs ()
#9  0x00007f8df9efa787 in file_To_Fattr ()
#10 0x00007f8df9ed77e2 in nfs4_op_getattr ()
#11 0x00007f8df9ed267f in nfs4_Compound ()
#12 0x00007f8df9ec1c3c in nfs_rpc_execute ()
#13 0x00007f8df9ec368e in worker_run ()
#14 0x00007f8df9f55649 in fridgethr_start_routine ()
#15 0x00007f8df842ddc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f8df7afc6ed in clone () from /lib64/libc.so.6
(gdb) 
 

Actual results:

ganesha crashes with a segfault when the volume is mounted with NFSv3 or NFSv4.

Expected results:

ganesha should not crash when the volume is mounted.

Additional info:

Comment 1 Jiffin 2016-08-01 06:00:32 UTC

I can reproduce this issue with the following steps:
1.) Create and start a volume.
2.) Start nfs-ganesha.
3.) Export the volume via a dbus command.
4.) Modify the ganesha.conf file.
5.) Remove and add the volume again using dbus.
6.) Mount the volume.

The inode in the call path is invalid. I still need to figure out why the inode becomes invalid in this case.

Comment 2 Jiffin 2016-08-02 11:37:53 UTC

RCA :

The mdcache entry created during the export of the gluster volume is not cleaned up properly during unexport. So when the export is added again, it returns an "entry already exists" error (from fsal_gluster's point of view, the glfs_object corresponding to this entry is invalid), and any further operation on this entry, such as a mount, results in a crash.

On further digging, it seems the refcount of this entry never becomes zero during unexport, so the entry is not removed from the mdcache layer.
  
For version 2.3, these entries were cleaned up properly during …
A similar problem is also present in fsal_vfs, but there it did not result in a crash.

Comment 3 Daniel Gryniewicz 2016-08-02 12:25:30 UTC

Jiffin, could you try with this patch:

https://review.gerrithub.io/286045

Comment 4 Jiffin 2016-08-02 12:58:03 UTC

(In reply to Daniel Gryniewicz from comment #3)
> Jiffin, could you try with this patch:
> 
> https://review.gerrithub.io/286045

Thanks, it worked.

Comment 5 Ambarish 2016-08-04 08:15:39 UTC

*** Bug 1362621 has been marked as a duplicate of this bug. ***