Description of problem:
6-node Ganesha cluster. 3 clients mapping the same volume (2 x (4 + 2) Distributed-disperse volume) with v3/v4 protocol, each through a different VIP.
While running bonnie and Linux untars with parallel lookups from 3 different clients, Ganesha crashed on one of the nodes (the one whose VIP is mapped to the client running lookups).

====================
[Switching to Thread 0x7f40e7fa7700 (LWP 22611)]
0x00007f4148abc207 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f4148abc207 in raise () from /lib64/libc.so.6
#1  0x00007f4148abd8f8 in abort () from /lib64/libc.so.6
#2  0x00005594dda3f8f3 in mdcache_alloc_and_check_handle (export=export@entry=0x5594de1294d0, sub_handle=<optimized out>, new_obj=new_obj@entry=0x7f40e7fa5938, new_directory=new_directory@entry=false, attrs_in=attrs_in@entry=0x7f40e7fa5940, attrs_out=attrs_out@entry=0x0, tag=tag@entry=0x5594dda8d9a1 "lookup ", parent=parent@entry=0x7f4080101220, name=name@entry=0x7f3f6c2d41b4 "", invalidate=invalidate@entry=0x7f40e7fa592f, state=state@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:138
#3  0x00005594dda4b0a1 in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7f4080101220, name=0x7f3f6c2d41b4 "", new_entry=new_entry@entry=0x7f40e7fa5b18, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1420
#4  0x00005594dda4f772 in mdcache_readdir_chunked (directory=directory@entry=0x7f4080101220, whence=0, dir_state=dir_state@entry=0x7f40e7fa5e30, cb=cb@entry=0x5594dd96a1f0 <populate_dirent>, attrmask=attrmask@entry=122830, eod_met=eod_met@entry=0x7f40e7fa5f1b) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3215
#5  0x00005594dda3d924 in mdcache_readdir (dir_hdl=0x7f4080101258, whence=<optimized out>, dir_state=0x7f40e7fa5e30, cb=0x5594dd96a1f0 <populate_dirent>, attrmask=122830, eod_met=0x7f40e7fa5f1b) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:640
#6  0x00005594dd96c0e4 in fsal_readdir (directory=directory@entry=0x7f4080101258, cookie=cookie@entry=0, nbfound=nbfound@entry=0x7f40e7fa5f1c, eod_met=eod_met@entry=0x7f40e7fa5f1b, attrmask=122830, cb=cb@entry=0x5594dd9a87f0 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f40e7fa5f20) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/fsal_helper.c:1500
#7  0x00005594dd9a97bb in nfs4_op_readdir (op=0x7f40880043c0, data=0x7f40e7fa6150, resp=0x7f3f44362eb0) at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_op_readdir.c:627
#8  0x00005594dd99515f in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f3f442f91f0) at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_Compound.c:752
#9  0x00005594dd9853cb in nfs_rpc_execute (reqdata=reqdata@entry=0x7f4088059470) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290
#10 0x00005594dd986a2a in worker_run (ctx=0x5594de23b5e0) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1562
#11 0x00005594dda171a9 in fridgethr_start_routine (arg=0x5594de23b5e0) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:550
#12 0x00007f41494b8dd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f4148b84b3d in clone () from /lib64/libc.so.6
==============

ganesha.log
-----------
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] posix2fsal_type :FSAL :WARN :Unknown object type: 0
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] posix2fsal_type :FSAL :WARN :Unknown object type: 0
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] mdcache_alloc_and_check_handle :RW LOCK :CRIT :Error 35, write locking 0x7f4080101658 (&new_entry->content_lock) at /builddir/build/BUILD/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:138
-----------

Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
nfs-ganesha-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-8.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-13.el7rhgs.x86_64

How reproducible:
2/3

Steps to Reproduce:
1. Create 6 node ganesha cluster
2. Create Distributed-disperse 2 x (4 + 2) volume
3. Mount the volume to 3 different clients with 3 different VIPs
4. Run the following workload
   Client 1 (v3): Run linux untars
   Client 2 (v3): Run dbench, bonnie
   Client 3 (v4): Run ls -laRt in loop

Actual results:
While running the above workload, Ganesha crashed on one of the nodes.

Expected results:
Ganesha should not crash.

Additional info:
So, the lock attempt returned EDEADLK, which means that this thread already has the lock. The content lock of the parent directory is, indeed, held during this operation; however, you shouldn't be able to get an inode that points to a directory when you do a readdir() on that directory. You can't, for example, hard-link to a directory at all.

Does the directory structure still exist? If so, can we get the output of ls -ialR from it? (Note: -i, not -t.)

Alternatively, if the core is still there, we can get the name of the directory that crashed from that, and then just get the "ls -ial" output of that directory; this would be a much smaller output.
I don't know how path became "" yet; name is only set on creation or rename. Tracing through the code, it looks like only NFSv3 could use an empty name, since NFSv4's standard utf8 string handler checks for 0-length strings, whereas NFSv3 just uses what the client provided.

However, returning the parent for that is clearly a bug. Parent should only be returned for "."
(In reply to Daniel Gryniewicz from comment #11)
> I don't know how path became "" yet, name is only set on creation or rename.
> Tracing through the code, it looks like only NFSv3 could use an empty name,
> since NFSv4's standard utf8 string handler checks for 0-length strings,
> whereas NFSv3 just uses what the client provided.
>
> However, returning the parent for that is clearly a bug. Parent should only
> be returned for "."

Hmm, I thought I had done some looking at this one...

An empty name would be valid if AT_EMPTY_PATH was set, but we don't in this case.

Is it remotely possible we got an empty name in readdir?

If this is re-creatable, it would be interesting to enable NFS_READDIR log component to FULL_DEBUG. We can then look for empty names.
(In reply to Frank Filz from comment #12)
> (In reply to Daniel Gryniewicz from comment #11)
> > I don't know how path became "" yet, name is only set on creation or rename.
> > Tracing through the code, it looks like only NFSv3 could use an empty name,
> > since NFSv4's standard utf8 string handler checks for 0-length strings,
> > whereas NFSv3 just uses what the client provided.
> >
> > However, returning the parent for that is clearly a bug. Parent should only
> > be returned for "."
>
> Hmm, I thought I had done some looking at this one...
>
> An empty name would be valid if AT_EMPTY_PATH was set, but we don't in this
> case.
>
> Is it remotely possible we got an empty name in readdir?
>
> If this is re-creatable, it would be interesting to enable NFS_READDIR log
> component to FULL_DEBUG. We can then look for empty names.

I don't know, but isn't Daniel's observation about returning parent (dot? dotdot?) still correct?

Matt
(In reply to Matt Benjamin (redhat) from comment #13)
> I don't know but isn't Daniel's observation about returning parent (dot?
> dotdot?) still correct?

Yes, an empty path resulting in the parent being returned, without AT_EMPTY_PATH passed as a flag, is not good. Without that flag, an empty path should return an error.

But there's an issue where somehow Ganesha is getting a dirent with an empty name... That COULD be because we got one from readdir from the filesystem. It could be because somehow we dropped the name.
We can't have dropped the name. It's only freed and NULL'd. I think it has to have come from either readdir() or rename().
AT_EMPTY_PATH is only valid for *at() calls (fstatat, fchownat, etc.), i.e. things that take a file descriptor and a name. As far as I can tell, an empty name on a dirent is not valid.

Adding a check for creation, link, and rename is easy; dealing with readdir is much harder, as it may make an entire directory unreadable, and therefore unremovable. It would be better for this case if Gluster disallowed creation of such a dirent in the first place.
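To illustrate the AT_EMPTY_PATH point, here is a minimal sketch, assuming Linux: fstatat() with an empty name fails with ENOENT unless AT_EMPTY_PATH is passed, in which case it operates on the descriptor itself. The helper name stat_empty_name is hypothetical, and the fallback value for AT_EMPTY_PATH is the Linux UAPI constant, used only if the headers do not expose it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes AT_EMPTY_PATH in <fcntl.h> on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000 /* Linux UAPI value; assumption for other setups */
#endif

/*
 * Hypothetical helper: stat the empty name relative to dirfd.
 * Returns 0 on success, otherwise the errno value set by fstatat().
 */
int stat_empty_name(int dirfd, int flags, struct stat *st)
{
	if (fstatat(dirfd, "", st, flags) == 0)
		return 0;
	return errno;
}
```

Called on an open directory with flags set to 0 this should return ENOENT; with AT_EMPTY_PATH it should succeed and describe the directory that dirfd refers to, which is the "return the parent" behaviour that is only appropriate when the caller explicitly asks for it.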
Actually, NFSv4 explicitly forbids zero-length dirents:

    If the oldname or newname is of zero length, NFS4ERR_INVAL will be returned.

NFSv3 does not include this requirement, but does allow NFS3ERR_INVAL for invalid names.

I'll add some checking for names from clients. NFSv4 already has this, as part of its standard UTF-8 handling, but I'll add some for other protocols. GFAPI needs to be fixed in addition, so no other client can create such dirents.
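As a rough illustration of the kind of check meant here (this is a sketch, not the code from the actual patch), a name coming from a client could be validated before it reaches create, link, or rename. The function name check_client_name, the max_len parameter, and the choice of errno values are all assumptions; a protocol layer would map the result to NFS3ERR_INVAL, NFS4ERR_INVAL, and so on.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical validator for a single path component supplied by a client.
 * Returns 0 if the name is usable, otherwise an errno-style code:
 *   EINVAL        name is NULL, empty, or contains '/'
 *   ENAMETOOLONG  name is longer than max_len bytes
 */
int check_client_name(const char *name, size_t max_len)
{
	size_t len;

	/* A zero-length name must never be treated as "the parent". */
	if (name == NULL || name[0] == '\0')
		return EINVAL;

	/* Scan at most max_len + 1 bytes so an overlong name is detected. */
	len = strnlen(name, max_len + 1);
	if (len > max_len)
		return ENAMETOOLONG;

	/* A dirent name is one component, so it cannot contain a separator. */
	if (memchr(name, '/', len) != NULL)
		return EINVAL;

	return 0;
}
```

For example, check_client_name("", 255) returns EINVAL while check_client_name("linux-4.17.tar", 255) returns 0. This only covers names arriving from clients; it does nothing for an empty name that readdir returns from the backing filesystem, which is why the GFAPI side still needs its own fix.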
https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/420770
Since a patch is also required from the Gluster layer, moving this BZ to POST.
Verified this with (Readdir disabled in ganesha.conf):

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64

Steps performed for verification:
1. Create 6 node ganesha cluster
2. Create Distributed-disperse 6 x (4 + 2) volume
3. Mount the volume to 3 different clients with 3 different VIPs
4. Run the following workload
   Client 1 (v3): Run linux untars
   Client 2 (v3): Run dbench, bonnie
   Client 3 (v4): Run ls -laRt in loop

No crashes were observed while performing the above steps. Moving this BZ to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607