Description of problem:
Ganesha crashes while removing files from clients.

Version-Release number of selected component (if applicable):
[root@dhcp43-116 ~]# rpm -qa|grep glusterfs
glusterfs-geo-replication-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-api-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-fuse-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-server-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-libs-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-client-xlators-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-cli-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-debuginfo-3.8.3-0.6.git7956718.el7.centos.x86_64
glusterfs-3.8.3-0.6.git7956718.el7.centos.x86_64
[root@dhcp43-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-next.20160827.7641daf-1.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
nfs-ganesha-debuginfo-next.20160827.7641daf-1.el7.centos.x86_64
nfs-ganesha-next.20160827.7641daf-1.el7.centos.x86_64

How reproducible:
Twice

Steps to Reproduce:
1. Create a large number of files on a dist-rep volume via a v4 ganesha mount from 2 different clients.
2. Start removing different sets of files simultaneously from both clients.
3. Observe that while the removal is in progress, ganesha crashes on the mounted node with the below bt:

(gdb) bt
#0  0x00007f937af59c5f in __inode_ctx_free (inode=inode@entry=0x7f9356c9fe24) at inode.c:332
#1  0x00007f937af5ae42 in __inode_destroy (inode=0x7f9356c9fe24) at inode.c:353
#2  inode_table_prune (table=table@entry=0x7f9360103f30) at inode.c:1543
#3  0x00007f937af5b124 in inode_unref (inode=0x7f9356c9fe24) at inode.c:524
#4  0x00007f937b232216 in pub_glfs_h_close (object=0x7f925400f660) at glfs-handleops.c:1365
#5  0x00007f937b64a929 in handle_release (obj_hdl=0x7f92540308f8) at /usr/src/debug/nfs-ganesha/src/FSAL/FSAL_GLUSTER/handle.c:70
#6  0x00007f937fd1caf4 in mdcache_lru_clean (entry=0x7f924cd8f060) at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:421
#7  mdcache_lru_get (entry=entry@entry=0x7f92e94e1a70) at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1229
#8  0x00007f937fd268b6 in mdcache_alloc_handle (fs=0x0, sub_handle=0x7f924802e358, export=0x7f9380e0fab0) at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:117
#9  mdcache_new_entry (export=export@entry=0x7f9380e0fab0, sub_handle=0x7f924802e358, attrs_in=attrs_in@entry=0x7f92e94e1bd0, attrs_out=attrs_out@entry=0x0, new_directory=new_directory@entry=false, entry=entry@entry=0x7f92e94e1b30, state=state@entry=0x0) at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:367
#10 0x00007f937fd208e4 in mdcache_alloc_and_check_handle (export=export@entry=0x7f9380e0fab0, sub_handle=<optimized out>, new_obj=new_obj@entry=0x7f92e94e1bc8, new_directory=new_directory@entry=false, attrs_in=attrs_in@entry=0x7f92e94e1bd0, attrs_out=attrs_out@entry=0x0, tag=tag@entry=0x7f937fd58b10 "lookup ", parent=parent@entry=0x7f9380e783c0, name=name@entry=0x7f924e2d5e4c "def26337", invalidate=invalidate@entry=true, state=state@entry=0x0) at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:93
#11 0x00007f937fd27986 in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7f9380e783c0, name=name@entry=0x7f924e2d5e4c "def26337", new_entry=new_entry@entry=0x7f92e94e1d40, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:981
#12 0x00007f937fd1f53f in mdcache_readdir (dir_hdl=0x7f9380e783f8, whence=<optimized out>, dir_state=0x7f92e94e1dc0, cb=0x7f937fc4bc00 <populate_dirent>, attrmask=<optimized out>, eod_met=0x7f92e94e1e8b) at /usr/src/debug/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:626
#13 0x00007f937fc4d97d in fsal_readdir (directory=directory@entry=0x7f9380e783f8, cookie=cookie@entry=4053976218433744092, nbfound=nbfound@entry=0x7f92e94e1e8c, eod_met=eod_met@entry=0x7f92e94e1e8b, attrmask=122830, cb=cb@entry=0x7f937fc88d50 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f92e94e1e90) at /usr/src/debug/nfs-ganesha/src/FSAL/fsal_helper.c:1457
#14 0x00007f937fc89d1b in nfs4_op_readdir (op=0x7f92a40219d0, data=0x7f92e94e20b0, resp=0x7f9248025080) at /usr/src/debug/nfs-ganesha/src/Protocols/NFS/nfs4_op_readdir.c:631
#15 0x00007f937fc765bf in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f924802cb50) at /usr/src/debug/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:734
#16 0x00007f937fc65c0c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f92a4013dd0) at /usr/src/debug/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1281
#17 0x00007f937fc674bd in worker_run (ctx=0x7f9380ecf9e0) at /usr/src/debug/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1548
#18 0x00007f937fcfb629 in fridgethr_start_routine (arg=0x7f9380ecf9e0) at /usr/src/debug/nfs-ganesha/src/support/fridgethr.c:550
#19 0x00007f937e1d2dc5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f937d8a01cd in clone () from /lib64/libc.so.6

Actual results:
Ganesha crashes on one of the nodes while files are being removed from 2 clients.

Expected results:
There should not be any crash.

Additional info:
There is another bug filed for the ganesha crash seen during removal of files (https://bugzilla.redhat.com/show_bug.cgi?id=1373262), but in this case the bt is different, so a new bug is being filed.
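For anyone trying to reproduce this, the steps above could be scripted roughly as follows. This is only a sketch under assumptions, not the script used for this report: the function names are hypothetical, the "def" file prefix merely mirrors the names visible in the bt, and two threads in one process stand in for the two separate client machines, which is a much weaker load than the original setup. The temporary directory is a placeholder for the directory on the ganesha mount.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def create_files(directory, prefix, count):
    """Create `count` empty files named <prefix><index>; return their names."""
    names = []
    for i in range(count):
        name = f"{prefix}{i}"
        with open(os.path.join(directory, name), "w"):
            pass
        names.append(name)
    return names


def remove_files(directory, names):
    """Remove the given files one by one; return how many were removed."""
    removed = 0
    for name in names:
        os.remove(os.path.join(directory, name))
        removed += 1
    return removed


def run_parallel_removal(directory, names, workers=2):
    """Split names into disjoint slices and remove each slice concurrently."""
    slices = [names[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: remove_files(directory, s), slices))


if __name__ == "__main__":
    # Assumption: replace this temporary directory with a directory on the
    # NFS mount exported by ganesha; it is used here only so the sketch
    # runs anywhere without touching real data.
    with tempfile.TemporaryDirectory() as mount_point:
        names = create_files(mount_point, "def", 1000)
        print(run_parallel_removal(mount_point, names))
```

To get closer to the reported scenario, run the removal part from two different client machines against disjoint sets of files, rather than from two threads on one client.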
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html
With the private build:
[root@dhcp43-116 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.3-0.6.git7956718.el7.centos.x86_64
nfs-ganesha-gluster-2.4-0.rc4.el7.centos.x86_64
nfs-ganesha-debuginfo-2.4-0.rc4.el7.centos.x86_64
nfs-ganesha-2.4-0.rc4.el7.centos.x86_64

ganesha crashes with a segfault with the below bt while removing files from 2 mount points:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe9d6f8d700 (LWP 8977)]
0x00007fea3ed2dc5f in __inode_ctx_free (inode=inode@entry=0x7fea1dbd022c) at inode.c:332
332             xl->cbks->forget (xl, inode);
(gdb) bt
#0  0x00007fea3ed2dc5f in __inode_ctx_free (inode=inode@entry=0x7fea1dbd022c) at inode.c:332
#1  0x00007fea3ed2ee42 in __inode_destroy (inode=0x7fea1dbd022c) at inode.c:353
#2  inode_table_prune (table=table@entry=0x7fea24002890) at inode.c:1543
#3  0x00007fea3ed2f124 in inode_unref (inode=0x7fea1dbd022c) at inode.c:524
#4  0x00007fea3ed1e222 in loc_wipe (loc=loc@entry=0x7fe9d6f8b210) at xlator.c:695
#5  0x00007fea3f001b4b in glfs_resolve_component (fs=fs@entry=0x1d60e90, subvol=subvol@entry=0x7fea2402aa40, parent=parent@entry=0x7fea1d77906c, component=component@entry=0x7fea100028c0 "def70703", iatt=iatt@entry=0x7fe9d6f8b3d0, force_lookup=<optimized out>) at glfs-resolve.c:368
#6  0x00007fea3f002133 in priv_glfs_resolve_at (fs=fs@entry=0x1d60e90, subvol=subvol@entry=0x7fea2402aa40, at=at@entry=0x7fea1d77906c, origpath=origpath@entry=0x7fe91e53bccc "def70703", loc=loc@entry=0x7fe9d6f8b4d0, iatt=iatt@entry=0x7fe9d6f8b510, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:417
#7  0x00007fea3f003a78 in pub_glfs_h_lookupat (fs=0x1d60e90, parent=<optimized out>, path=0x7fe91e53bccc "def70703", stat=0x7fe9d6f8b630, follow=0) at glfs-handleops.c:102
#8  0x00007fea3f41e02c in lookup (parent=0x1e32e28, path=0x7fe91e53bccc "def70703", handle=0x7fe9d6f8b840, attrs_out=0x7fe9d6f8b760) at /usr/src/debug/nfs-ganesha-2.4-rc4-0.1.1-Source/FSAL/FSAL_GLUSTER/handle.c:112
#9  0x0000000000537358 in mdc_lookup_uncached (mdc_parent=0x1e2f340, name=0x7fe91e53bccc "def70703", new_entry=0x7fe9d6f8b8d8, attrs_out=0x0) at /usr/src/debug/nfs-ganesha-2.4-rc4-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:968
#10 0x000000000052dd26 in mdcache_readdir (dir_hdl=0x1e2f378, whence=0x7fe9d6f8b970, dir_state=0x7fe9d6f8b980, cb=0x43184b <populate_dirent>, attrmask=0, eod_met=0x7fe9d6f8be7b) at /usr/src/debug/nfs-ganesha-2.4-rc4-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:626
#11 0x00000000004320b9 in fsal_readdir (directory=0x1e2f378, cookie=52889545390524074, nbfound=0x7fe9d6f8be7c, eod_met=0x7fe9d6f8be7b, attrmask=0, cb=0x48ff13 <nfs3_readdir_callback>, opaque=0x7fe9d6f8be30) at /usr/src/debug/nfs-ganesha-2.4-rc4-0.1.1-Source/FSAL/fsal_helper.c:1457
#12 0x000000000048fcfa in nfs3_readdir (arg=0x7fe978001468, req=0x7fe9780012a8, res=0x7fea10002dd0) at /usr/src/debug/nfs-ganesha-2.4-rc4-0.1.1-Source/Protocols/NFS/nfs3_readdir.c:295
#13 0x000000000044ad6b in nfs_rpc_execute (reqdata=0x7fe978001280) at /usr/src/debug/nfs-ganesha-2.4-rc4-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1281
#14 0x000000000044b625 in worker_run (ctx=0x1e7d560) at /usr/src/debug/nfs-ganesha-2.4-rc4-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1548
#15 0x000000000050079f in fridgethr_start_routine (arg=0x1e7d560) at /usr/src/debug/nfs-ganesha-2.4-rc4-0.1.1-Source/support/fridgethr.c:550
#16 0x00007fea41d9edc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007fea4145e1cd in clone () from /lib64/libc.so.6
(gdb)
This looks somewhat similar (at least the top of the bt) to the issue reported in bug 1353561. The bug seems to be in the gluster sources and not related to nfs-ganesha.
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.