Description of problem:
ganesha crashes with segfault while mounting volume with v3 or v4.

Version-Release number of selected component (if applicable):

[root@dhcp41-206 exports]# rpm -qa|grep glusterfs
glusterfs-ganesha-3.8.1-1.el7.x86_64
glusterfs-libs-3.8.1-1.el7.x86_64
glusterfs-3.8.1-1.el7.x86_64
glusterfs-client-xlators-3.8.1-1.el7.x86_64
glusterfs-api-3.8.1-1.el7.x86_64
glusterfs-fuse-3.8.1-1.el7.x86_64
glusterfs-server-3.8.1-1.el7.x86_64
glusterfs-geo-replication-3.8.1-1.el7.x86_64
glusterfs-cli-3.8.1-1.el7.x86_64

[root@dhcp41-206 exports]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.1-1.el7.x86_64
nfs-ganesha-gluster-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-2.4.0-0.14dev26.el7.centos.x86_64

How reproducible:
Consistent

Steps to Reproduce:
1. Create a nfs-ganesha cluster with 4 nodes.
2. Create a volume and enable ganesha on the volume.
3. Change some parameter in export file and perform refresh config.
4. Try to mount the volume on client with v3 or v4

Observe that ganesha crashes with segfault error on the nodes from which we try to mount the volume:

>> ganesha service status on nodes:

● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Fri 2016-07-29 14:51:45 IST; 4min 28s ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 5317 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 29012 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 29010 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 29011 (code=killed, signal=SEGV)

Jul 29 14:32:37 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Ganes...
Jul 29 14:32:37 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganesh...
Jul 29 14:51:45 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servic...
Jul 29 14:51:45 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha.s...
Jul 29 14:51:45 dhcp43-133.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servic...
Hint: Some lines were ellipsized, use -l to show in full.

● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Fri 2016-07-29 14:51:45 IST; 4min 28s ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 7523 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 30736 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 30734 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 30735 (code=killed, signal=SEGV)

Jul 29 14:31:16 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Gane...
Jul 29 14:31:16 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganes...
Jul 29 14:51:45 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servi...
Jul 29 14:51:45 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha....
Jul 29 14:51:45 dhcp41-206.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.servi...
Hint: Some lines were ellipsized, use -l to show in full.

>> dmesg shows below messages:

[173833.456765] ganesha.nfsd[9832]: segfault at 7fdf494aa06c ip 00007fdf6a917a40 sp 00007fdf6bffe450 error 4 in libglusterfs.so.0.0.1[7fdf6a8df000+eb000]
[172743.152933] ganesha.nfsd[5733]: segfault at 7f670800008c ip 00007f6797919a40 sp 00007f679d32a450 error 4 in libglusterfs.so.0.0.1[7f67978e1000+eb000]

>> following bt is seen with gdb:

with v3 mount:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9acfd08700 (LWP 29049)]
0x00007f9b06dc1210 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f9b06dc1210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f9b03d6a25d in inode_ctx_get0 () from /lib64/libglusterfs.so.0
#2  0x00007f9b03d6a2e5 in inode_needs_lookup () from /lib64/libglusterfs.so.0
#3  0x00007f9b0403b546 in __glfs_resolve_inode () from /lib64/libgfapi.so.0
#4  0x00007f9b0403b64b in glfs_resolve_inode () from /lib64/libgfapi.so.0
#5  0x00007f9b0403bc69 in glfs_h_stat () from /lib64/libgfapi.so.0
#6  0x00007f9b04455324 in getattrs () from /usr/lib64/ganesha/libfsalgluster.so
#7  0x00007f9b08908f4a in mdcache_refresh_attrs ()
#8  0x00007f9b0890992a in mdcache_getattrs ()
#9  0x00007f9b0888a3cf in nfs_SetPostOpAttr ()
#10 0x00007f9b0888b933 in nfs3_fsinfo ()
#11 0x00007f9b08850c3c in nfs_rpc_execute ()
#12 0x00007f9b0885268e in worker_run ()
#13 0x00007f9b088e4649 in fridgethr_start_routine ()
#14 0x00007f9b06dbcdc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f9b0648b6ed in clone () from /lib64/libc.so.6
(gdb)

with v4 mount:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8dbf35c700 (LWP 30822)]
0x00007f8df8432210 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f8df8432210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f8df53db25d in inode_ctx_get0 () from /lib64/libglusterfs.so.0
#2  0x00007f8df53db2e5 in inode_needs_lookup () from /lib64/libglusterfs.so.0
#3  0x00007f8df56ac546 in __glfs_resolve_inode () from /lib64/libgfapi.so.0
#4  0x00007f8df56ac64b in glfs_resolve_inode () from /lib64/libgfapi.so.0
#5  0x00007f8df56acc69 in glfs_h_stat () from /lib64/libgfapi.so.0
#6  0x00007f8df5ac6324 in getattrs () from /usr/lib64/ganesha/libfsalgluster.so
#7  0x00007f8df9f79f4a in mdcache_refresh_attrs ()
#8  0x00007f8df9f7a92a in mdcache_getattrs ()
#9  0x00007f8df9efa787 in file_To_Fattr ()
#10 0x00007f8df9ed77e2 in nfs4_op_getattr ()
#11 0x00007f8df9ed267f in nfs4_Compound ()
#12 0x00007f8df9ec1c3c in nfs_rpc_execute ()
#13 0x00007f8df9ec368e in worker_run ()
#14 0x00007f8df9f55649 in fridgethr_start_routine ()
#15 0x00007f8df842ddc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f8df7afc6ed in clone () from /lib64/libc.so.6
(gdb)

Actual results:
ganesha crashes with segfault while mounting volume with v3 or v4.

Expected results:
ganesha should not crash while mounting the volume

Additional info:
I can reproduce this issue with the following steps:

1.) Create and start a volume
2.) Start nfs-ganesha
3.) Export the volume via dbus command
4.) Modify the ganesha.conf file
5.) Remove and add the volume again using dbus
6.) Mount the volume

The inode in the call path is invalid. I still need to figure out why the inode becomes invalid in this case.
RCA:

The mdcache entry created during the export of the gluster volume is not cleaned up properly during unexport. So when the export is added again, it gives an "entry already exists" error (with respect to fsal_gluster, the glfs_object corresponding to this entry is invalid), and any further operation on this entry (such as mounting) results in a crash.

On further digging, it seems that the refcount of this entry never becomes zero during unexport, so the entry is not removed from the mdcache layer. In version 2.3 these entries were cleaned up properly.

A similar problem is also present in fsal_vfs, but there it did not result in a crash.
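The failure mode described in the RCA can be illustrated with a toy model. This is only a sketch: the classes and functions below (Handle, Entry, Cache, export_then_unexport, reexport) are hypothetical and are not nfs-ganesha or gluster code. It shows how one unreleased reference keeps a cache entry alive across unexport, so that a re-export finds the old entry whose backing handle is no longer valid.

```python
# Toy model only: none of these names are nfs-ganesha or gluster APIs.

class Handle:
    """Stands in for the FSAL-side object backing a cache entry."""
    def __init__(self, export_id):
        self.export_id = export_id
        self.valid = True


class Entry:
    def __init__(self, handle):
        self.handle = handle
        self.refcount = 0


class Cache:
    def __init__(self):
        self.entries = {}

    def get(self, key, handle):
        """Return (entry, existed), creating the entry if it is absent."""
        entry = self.entries.get(key)
        existed = entry is not None
        if not existed:
            entry = Entry(handle)
            self.entries[key] = entry
        entry.refcount += 1
        return entry, existed

    def put(self, key):
        """Drop one reference; evict the entry when the count hits zero."""
        entry = self.entries[key]
        entry.refcount -= 1
        if entry.refcount == 0:
            del self.entries[key]


def export_then_unexport(cache, key, export_id, leak_ref):
    handle = Handle(export_id)
    cache.get(key, handle)      # reference held by the export
    if leak_ref:
        cache.get(key, handle)  # extra reference that is never released
    cache.put(key)              # unexport drops the export's reference
    handle.valid = False        # backing object is torn down with the export


def reexport(cache, key, export_id):
    """Return (entry_already_existed, handle_is_valid) for the new export."""
    entry, existed = cache.get(key, Handle(export_id))
    return existed, entry.handle.valid


leaky = Cache()
export_then_unexport(leaky, "root", 1, leak_ref=True)
print(reexport(leaky, "root", 2))   # (True, False): stale entry, invalid handle

clean = Cache()
export_then_unexport(clean, "root", 1, leak_ref=False)
print(reexport(clean, "root", 2))   # (False, True): fresh entry, valid handle
```

In the leaky case the re-export gets an "already exists" result together with a handle that has been torn down, which corresponds to the state in which the real server goes on to dereference an invalid inode.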
Jiffin, could you try with this patch: https://review.gerrithub.io/286045
(In reply to Daniel Gryniewicz from comment #3)
> Jiffin, could you try with this patch:
>
> https://review.gerrithub.io/286045

Thanks, it worked.
*** Bug 1362621 has been marked as a duplicate of this bug. ***