Bug 1601245 - [Ganesha] Ganesha crashed in mdcache_alloc_and_check_handle while running bonnie and untars with parallel lookups
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Kaleb KEITHLEY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On: 1610236 1618347 1618348
Blocks: 1503137
Reported: 2018-07-15 13:35 UTC by Manisha Saini
Modified: 2018-09-24 11:30 UTC (History)

Fixed In Version: glusterfs-3.12.2-16
Doc Text:
Clone Of:
: 1610236 (view as bug list)
Environment:
Last Closed: 2018-09-04 06:50:24 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 0 None None None 2018-09-04 06:51:58 UTC

Description Manisha Saini 2018-07-15 13:35:44 UTC
Description of problem:

6-node Ganesha cluster. 3 clients mount the same volume (2 x (4 + 2) distributed-disperse volume) over the v3/v4 protocols, each through a different VIP.


While running bonnie and Linux untars with parallel lookups from 3 different clients, Ganesha crashed on one of the nodes (the one whose VIP is mapped to the client running lookups).


====================

[Switching to Thread 0x7f40e7fa7700 (LWP 22611)]
0x00007f4148abc207 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f4148abc207 in raise () from /lib64/libc.so.6
#1  0x00007f4148abd8f8 in abort () from /lib64/libc.so.6
#2  0x00005594dda3f8f3 in mdcache_alloc_and_check_handle (export=export@entry=0x5594de1294d0, sub_handle=<optimized out>, 
    new_obj=new_obj@entry=0x7f40e7fa5938, new_directory=new_directory@entry=false, attrs_in=attrs_in@entry=0x7f40e7fa5940, 
    attrs_out=attrs_out@entry=0x0, tag=tag@entry=0x5594dda8d9a1 "lookup ", parent=parent@entry=0x7f4080101220, name=name@entry=0x7f3f6c2d41b4 "", 
    invalidate=invalidate@entry=0x7f40e7fa592f, state=state@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:138
#3  0x00005594dda4b0a1 in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7f4080101220, name=0x7f3f6c2d41b4 "", 
    new_entry=new_entry@entry=0x7f40e7fa5b18, attrs_out=attrs_out@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1420
#4  0x00005594dda4f772 in mdcache_readdir_chunked (directory=directory@entry=0x7f4080101220, whence=0, dir_state=dir_state@entry=0x7f40e7fa5e30, 
    cb=cb@entry=0x5594dd96a1f0 <populate_dirent>, attrmask=attrmask@entry=122830, eod_met=eod_met@entry=0x7f40e7fa5f1b)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3215
#5  0x00005594dda3d924 in mdcache_readdir (dir_hdl=0x7f4080101258, whence=<optimized out>, dir_state=0x7f40e7fa5e30, 
    cb=0x5594dd96a1f0 <populate_dirent>, attrmask=122830, eod_met=0x7f40e7fa5f1b)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:640
#6  0x00005594dd96c0e4 in fsal_readdir (directory=directory@entry=0x7f4080101258, cookie=cookie@entry=0, nbfound=nbfound@entry=0x7f40e7fa5f1c, 
    eod_met=eod_met@entry=0x7f40e7fa5f1b, attrmask=122830, cb=cb@entry=0x5594dd9a87f0 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f40e7fa5f20)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/fsal_helper.c:1500
#7  0x00005594dd9a97bb in nfs4_op_readdir (op=0x7f40880043c0, data=0x7f40e7fa6150, resp=0x7f3f44362eb0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_op_readdir.c:627
#8  0x00005594dd99515f in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f3f442f91f0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_Compound.c:752
#9  0x00005594dd9853cb in nfs_rpc_execute (reqdata=reqdata@entry=0x7f4088059470)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290
#10 0x00005594dd986a2a in worker_run (ctx=0x5594de23b5e0) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1562
#11 0x00005594dda171a9 in fridgethr_start_routine (arg=0x5594de23b5e0) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:550
#12 0x00007f41494b8dd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f4148b84b3d in clone () from /lib64/libc.so.6

============== 



ganesha.log-

-----------
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] posix2fsal_type :FSAL :WARN :Unknown object type: 0
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] posix2fsal_type :FSAL :WARN :Unknown object type: 0
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] mdcache_alloc_and_check_handle :RW LOCK :CRIT :Error 35, write locking 0x7f4080101658 (&new_entry->content_lock) at /builddir/build/BUILD/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:138
-----------

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-8.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-13.el7rhgs.x86_64


How reproducible:

2/3

Steps to Reproduce:
1. Create a 6-node Ganesha cluster.
2. Create a distributed-disperse 2 x (4 + 2) volume.
3. Mount the volume on 3 different clients with 3 different VIPs.
4. Run the following workload:

Client 1 (v3): Run Linux untars
Client 2 (v3): Run dbench, bonnie
Client 3 (v4): Run ls -laRt in a loop

Actual results:

While running the above workload, Ganesha crashed on one of the nodes.


Expected results:

Ganesha should not crash 

Additional info:

Comment 4 Daniel Gryniewicz 2018-07-16 18:45:59 UTC
So, the lock attempt returned EDEADLK, which means that this thread already has the lock.  The content lock of the parent directory is, indeed, held during this operation; however, you shouldn't be able to get an inode that points to a directory when you do a readdir() on that directory.  You can't, for example, hard-link to a directory at all.

Does the directory structure still exist?  If so, can we get the output of
ls -ialR
from it? (note, -i, not -t)  Alternatively, if the core is still there, we can get the name of the directory that crashed from that, and then just get the "ls -ial" output of that directory; this would be a much smaller output.
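
Error 35 in the ganesha.log excerpt is EDEADLK on Linux. A toy model of the situation described above, assuming a lock that tracks its owner and refuses a second acquisition by the same thread. OwnerCheckedLock and lookup_in_readdir are hypothetical names for illustration only and are not Ganesha code:

```python
import errno
import threading


class OwnerCheckedLock:
    """Toy write lock that reports self-deadlock instead of hanging."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # thread ident of the current holder, if any

    def write_lock(self):
        me = threading.get_ident()
        if self._owner == me:
            # The caller already holds this lock; blocking would never return.
            return errno.EDEADLK
        self._lock.acquire()
        self._owner = me
        return 0

    def unlock(self):
        self._owner = None
        self._lock.release()


def lookup_in_readdir(parent_lock, entry_lock):
    """Mimic readdir: hold the parent's lock, then lock the looked-up entry."""
    rc = parent_lock.write_lock()
    if rc != 0:
        return rc
    try:
        rc = entry_lock.write_lock()
        if rc == 0:
            entry_lock.unlock()
        return rc
    finally:
        parent_lock.unlock()


parent = OwnerCheckedLock()
child = OwnerCheckedLock()
print(lookup_in_readdir(parent, child))   # entry is a distinct object
print(lookup_in_readdir(parent, parent))  # lookup resolved to the parent itself
```

The second call models a lookup that hands back the directory being read: the entry's lock is the parent's lock, so the same thread tries to take it twice.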

Comment 11 Daniel Gryniewicz 2018-07-27 14:32:42 UTC
I don't know how the path became "" yet; name is only set on creation or rename.  Tracing through the code, it looks like only NFSv3 could use an empty name, since NFSv4's standard utf8 string handler checks for 0-length strings, whereas NFSv3 just uses what the client provided.


However, returning the parent for that is clearly a bug.  Parent should only be returned for "."

Comment 12 Frank Filz 2018-07-27 14:56:43 UTC
(In reply to Daniel Gryniewicz from comment #11)
> I don't know how path became "" yet, name is only set on creation or rename.
> Tracing through the code, it looks like only NFSv3 could use an empty name,
> since NFSv4's standard utf8 string handler checks for 0-length strings,
> whereas NFSv3 just uses what the client provided.
> 
> 
> However, returning the parent for that is clearly a bug.  Parent should only
> be returned for "."

Hmm, I thought I had done some looking at this one... 

An empty name would be valid if AT_EMPTY_PATH was set, but we don't in this case.

Is it remotely possible we got an empty name in readdir?

If this is reproducible, it would be interesting to set the NFS_READDIR log component to FULL_DEBUG. We can then look for empty names.

Comment 13 Matt Benjamin (redhat) 2018-07-27 15:00:57 UTC
(In reply to Frank Filz from comment #12)
> (In reply to Daniel Gryniewicz from comment #11)
> > I don't know how path became "" yet, name is only set on creation or rename.
> > Tracing through the code, it looks like only NFSv3 could use an empty name,
> > since NFSv4's standard utf8 string handler checks for 0-length strings,
> > whereas NFSv3 just uses what the client provided.
> > 
> > 
> > However, returning the parent for that is clearly a bug.  Parent should only
> > be returned for "."
> 
> Hmm, I thought I had done some looking at this one... 
> 
> An empty name would be valid if AT_EMPTY_PATH was set, but we don't in this
> case.
> 
> Is it remotely possible we got an empty name in readdir?
> 
> If this is re-createable, it would be interesting to enable NFS_READDIR log
> component to FULL_DEBUG. We can then look for empty names.

I don't know but isn't Daniel's observation about returning parent (dot?  dotdot?) still correct?

Matt

Comment 14 Frank Filz 2018-07-27 15:29:42 UTC
(In reply to Matt Benjamin (redhat) from comment #13)
> I don't know but isn't Daniel's observation about returning parent (dot? 
> dotdot?) still correct?

Yes, an empty path resulting in returning the parent without AT_EMPTY_PATH passed as a flag is not good. Without that flag, an empty path should return an error.

But there's an issue where somehow Ganesha is getting a dirent with an empty name...

That COULD be because we got one from readdir from the filesystem. It could be because somehow we dropped the name.

Comment 15 Daniel Gryniewicz 2018-07-27 16:04:37 UTC
We can't have dropped the name.  It's only freed and NULL'd.  I think it has to have come from either readdir() or rename().

Comment 16 Daniel Gryniewicz 2018-07-30 13:57:13 UTC
AT_EMPTY_PATH is only valid for *at() calls (fstatat, fchownat, etc.), i.e. things that take a file descriptor and a name.  As far as I can tell, an empty name on a dirent is not valid.  Adding a check for creation, link, and rename is easy; dealing with readdir is much harder, as it may make an entire directory unreadable, and therefore unremovable.  It would be better for this case if Gluster disallowed creation of such a dirent in the first place.
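
For comparison, a minimal sketch of how a local Linux filesystem treats the two names when resolved relative to a directory descriptor. It assumes Python's os.stat with dir_fd, which as far as I know maps to an fstatat-style call that does not pass AT_EMPTY_PATH; stat_relative and demo are hypothetical helper names:

```python
import errno
import os
import tempfile


def stat_relative(dir_fd, name):
    """Return (inode, None) on success or (None, errno) on failure."""
    try:
        return os.stat(name, dir_fd=dir_fd).st_ino, None
    except OSError as exc:
        return None, exc.errno


def demo():
    """Stat "." and "" relative to a fresh temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        dir_fd = os.open(tmp, os.O_RDONLY)
        try:
            dir_ino = os.fstat(dir_fd).st_ino
            dot = stat_relative(dir_fd, ".")
            empty = stat_relative(dir_fd, "")
        finally:
            os.close(dir_fd)
    return dir_ino, dot, empty


dir_ino, dot, empty = demo()
print("'.' resolves to the directory itself:", dot[0] == dir_ino)
print("'' fails with ENOENT:", empty[1] == errno.ENOENT)
```

Only "." hands back the directory itself; the empty name is an error unless the caller explicitly opts in with AT_EMPTY_PATH.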

Comment 17 Daniel Gryniewicz 2018-07-30 14:03:00 UTC
Actually, NFSv4 explicitly forbids zero-length dirents:

   If the oldname or newname is of zero length, NFS4ERR_INVAL will be
   returned.

NFSv3 does not include this requirement, but does allow NFS3ERR_INVAL for invalid names.  I'll add some checking for names from clients.  NFSv4 already has this, as part of its standard UTF-8 handling, but I'll add some for other protocols.

GFAPI needs to be fixed in addition, so no other client can create such dirents.
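
A rough sketch of the kind of client-name check described above, assuming it only needs to reject zero-length names. check_name and check_rename are hypothetical helpers and not the actual Ganesha patch; the value 22 is the INVAL status code shared by NFSv3 and NFSv4:

```python
NFS_OK = 0
NFSERR_INVAL = 22  # same numeric value as NFS3ERR_INVAL and NFS4ERR_INVAL


def check_name(name):
    """Reject a zero-length component name (hypothetical helper)."""
    return NFSERR_INVAL if len(name) == 0 else NFS_OK


def check_rename(oldname, newname):
    """Validate both names of a rename before touching the filesystem."""
    for name in (oldname, newname):
        status = check_name(name)
        if status != NFS_OK:
            return status
    return NFS_OK


print(check_name(b"file.txt"))
print(check_rename(b"old", b""))
```

Running the same check in the create, link, and rename paths would keep an empty dirent from being created through Ganesha; names already present on the backend would still need handling on the readdir side.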

Comment 21 Atin Mukherjee 2018-08-01 09:14:41 UTC
Since there's a patch required from gluster layer, moving this BZ to POST.

Comment 25 Manisha Saini 2018-08-22 09:25:02 UTC
Verified this (with readdir disabled in ganesha.conf):

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64


Steps performed for verification-

1. Create a 6-node Ganesha cluster.
2. Create a distributed-disperse 6 x (4 + 2) volume.
3. Mount the volume on 3 different clients with 3 different VIPs.
4. Run the following workload:

Client 1 (v3): Run Linux untars
Client 2 (v3): Run dbench, bonnie
Client 3 (v4): Run ls -laRt in a loop

No crashes were observed while performing the above steps. Moving this BZ to verified state.

Comment 26 errata-xmlrpc 2018-09-04 06:50:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

