Bug 1610236 - [Ganesha] Ganesha crashed in mdcache_alloc_and_check_handle while running bonnie and untars with parallel lookups
Summary: [Ganesha] Ganesha crashed in mdcache_alloc_and_check_handle while running bonnie and untars with parallel lookups
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: libgfapi
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jiffin
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1601245 1618347 1618348
 
Reported: 2018-07-31 09:26 UTC by Jiffin
Modified: 2018-10-23 15:15 UTC

Fixed In Version: glusterfs-5.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1601245
: 1618347 1618348 (view as bug list)
Environment:
Last Closed: 2018-10-23 14:19:05 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jiffin 2018-07-31 09:26:24 UTC
+++ This bug was initially created as a clone of Bug #1601245 +++

Description of problem:

6-node Ganesha cluster. 3 clients mounting the same volume (a 2 x (4 + 2) distributed-disperse volume) over the v3/v4 protocols, each through a different VIP.


While running bonnie and Linux untars with parallel lookups from the 3 different clients, Ganesha crashed on one of the nodes (the one whose VIP is mapped to the client running the lookups).


====================

Switching to Thread 0x7f40e7fa7700 (LWP 22611)]
0x00007f4148abc207 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f4148abc207 in raise () from /lib64/libc.so.6
#1  0x00007f4148abd8f8 in abort () from /lib64/libc.so.6
#2  0x00005594dda3f8f3 in mdcache_alloc_and_check_handle (export=export@entry=0x5594de1294d0, sub_handle=<optimized out>, 
    new_obj=new_obj@entry=0x7f40e7fa5938, new_directory=new_directory@entry=false, attrs_in=attrs_in@entry=0x7f40e7fa5940, 
    attrs_out=attrs_out@entry=0x0, tag=tag@entry=0x5594dda8d9a1 "lookup ", parent=parent@entry=0x7f4080101220, name=name@entry=0x7f3f6c2d41b4 "", 
    invalidate=invalidate@entry=0x7f40e7fa592f, state=state@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:138
#3  0x00005594dda4b0a1 in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7f4080101220, name=0x7f3f6c2d41b4 "", 
    new_entry=new_entry@entry=0x7f40e7fa5b18, attrs_out=attrs_out@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1420
#4  0x00005594dda4f772 in mdcache_readdir_chunked (directory=directory@entry=0x7f4080101220, whence=0, dir_state=dir_state@entry=0x7f40e7fa5e30, 
    cb=cb@entry=0x5594dd96a1f0 <populate_dirent>, attrmask=attrmask@entry=122830, eod_met=eod_met@entry=0x7f40e7fa5f1b)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3215
#5  0x00005594dda3d924 in mdcache_readdir (dir_hdl=0x7f4080101258, whence=<optimized out>, dir_state=0x7f40e7fa5e30, 
    cb=0x5594dd96a1f0 <populate_dirent>, attrmask=122830, eod_met=0x7f40e7fa5f1b)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:640
#6  0x00005594dd96c0e4 in fsal_readdir (directory=directory@entry=0x7f4080101258, cookie=cookie@entry=0, nbfound=nbfound@entry=0x7f40e7fa5f1c, 
    eod_met=eod_met@entry=0x7f40e7fa5f1b, attrmask=122830, cb=cb@entry=0x5594dd9a87f0 <nfs4_readdir_callback>, opaque=opaque@entry=0x7f40e7fa5f20)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/fsal_helper.c:1500
---Type <return> to continue, or q <return> to quit---
#7  0x00005594dd9a97bb in nfs4_op_readdir (op=0x7f40880043c0, data=0x7f40e7fa6150, resp=0x7f3f44362eb0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_op_readdir.c:627
#8  0x00005594dd99515f in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f3f442f91f0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_Compound.c:752
#9  0x00005594dd9853cb in nfs_rpc_execute (reqdata=reqdata@entry=0x7f4088059470)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290
#10 0x00005594dd986a2a in worker_run (ctx=0x5594de23b5e0) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1562
#11 0x00005594dda171a9 in fridgethr_start_routine (arg=0x5594de23b5e0) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:550
#12 0x00007f41494b8dd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f4148b84b3d in clone () from /lib64/libc.so.6

============== 



ganesha.log-

-----------
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] posix2fsal_type :FSAL :WARN :Unknown object type: 0
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] posix2fsal_type :FSAL :WARN :Unknown object type: 0
15/07/2018 18:17:07 : epoch 96c40000 : zod.lab.eng.blr.redhat.com : ganesha.nfsd-22492[work-95] mdcache_alloc_and_check_handle :RW LOCK :CRIT :Error 35, write locking 0x7f4080101658 (&new_entry->content_lock) at /builddir/build/BUILD/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:138
-----------

Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-8.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-13.el7rhgs.x86_64


How reproducible:

2/3

Steps to Reproduce:
1. Create a 6-node Ganesha cluster
2. Create a 2 x (4 + 2) distributed-disperse volume
3. Mount the volume on 3 different clients through 3 different VIPs
4. Run the following workload:

Client 1 (v3): Run Linux untars
Client 2 (v3): Run dbench and bonnie
Client 3 (v4): Run ls -laRt in a loop

Actual results:

While running the above workload, Ganesha crashed on one of the nodes.


Expected results:

Ganesha should not crash.

Additional info:



--- Additional comment from Daniel Gryniewicz on 2018-07-16 14:45:59 EDT ---

So, the lock attempt returned EDEADLK, which means that this thread already has the lock.  The content lock of the parent directory is, indeed, held during this operation; however, you shouldn't be able to get an inode that points to a directory when you do a readdir() on that directory.  You can't, for example, hard-link to a directory at all.
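
For reference, error 35 in the ganesha.log snippet above is EDEADLK. A minimal standalone sketch (not Ganesha code) of how a second write lock from the same thread produces it, which Ganesha's rwlock wrapper then turns into the abort seen in frame #2:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        pthread_rwlock_t content_lock = PTHREAD_RWLOCK_INITIALIZER;
        int rc;

        /* First write lock succeeds, just as the parent directory's
           content_lock is taken before the readdir callback runs. */
        pthread_rwlock_wrlock(&content_lock);

        /* A second write lock from the same thread is reported as
           EDEADLK (35) by glibc (POSIX permits either EDEADLK or a hang).
           Ganesha logs the "Error 35, write locking ..." message seen
           above for the non-zero return and then aborts. */
        rc = pthread_rwlock_wrlock(&content_lock);
        printf("second wrlock returned %d (%s)\n", rc, strerror(rc));

        pthread_rwlock_unlock(&content_lock);
        return 0;
}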

Does the directory structure still exist?  If so, can we get the output of
ls -ialR
from it? (note: -i, not -t)  Alternatively, if the core is still there, we can get the name of the directory that crashed from it, and then just get the "ls -ial" output of that directory; this would be a much smaller output.

--- Additional comment from Manisha Saini on 2018-07-18 05:15:16 EDT ---

I have shared the core generated in comment #2.
The setup is no longer in the same state.

--- Additional comment from Manisha Saini on 2018-07-23 04:00:16 EDT ---

I could consistently hit this crash with the nfs-ganesha-gluster-2.5.5-8.el7rhgs.x86_64 build.

However, it does not generate a core dump file every time, hence I took the backtrace from a gdb session.

Also, there is no directory name in the core file for this crash, so I cannot take the "ls -ial" output of that particular directory.

Taking the full "ls -ialR" output takes more than 2 hours for this data set.


bt output from gdb session
--------------------
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fde98718700 (LWP 31200)]
0x00007fdf5a8ad207 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fdf5a8ad207 in raise () from /lib64/libc.so.6
#1  0x00007fdf5a8ae8f8 in abort () from /lib64/libc.so.6
#2  0x0000562c4455e8f3 in mdcache_alloc_and_check_handle (export=export@entry=0x562c45f5f100, sub_handle=<optimized out>, 
    new_obj=new_obj@entry=0x7fde98716938, new_directory=new_directory@entry=false, attrs_in=attrs_in@entry=0x7fde98716940, 
    attrs_out=attrs_out@entry=0x0, tag=tag@entry=0x562c445ac9a1 "lookup ", parent=parent@entry=0x7fdd7c35a1c0, 
    name=name@entry=0x7fde3c30ff24 "", invalidate=invalidate@entry=0x7fde9871692f, state=state@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:138
#3  0x0000562c4456a0a1 in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7fdd7c35a1c0, name=0x7fde3c30ff24 "", 
    new_entry=new_entry@entry=0x7fde98716b18, attrs_out=attrs_out@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1420
#4  0x0000562c4456e772 in mdcache_readdir_chunked (directory=directory@entry=0x7fdd7c35a1c0, whence=0, 
    dir_state=dir_state@entry=0x7fde98716e30, cb=cb@entry=0x562c444891f0 <populate_dirent>, attrmask=attrmask@entry=122830, 
    eod_met=eod_met@entry=0x7fde98716f1b) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3215
#5  0x0000562c4455c924 in mdcache_readdir (dir_hdl=0x7fdd7c35a1f8, whence=<optimized out>, dir_state=0x7fde98716e30, 
    cb=0x562c444891f0 <populate_dirent>, attrmask=122830, eod_met=0x7fde98716f1b)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:640
#6  0x0000562c4448b0e4 in fsal_readdir (directory=directory@entry=0x7fdd7c35a1f8, cookie=cookie@entry=0, 
    nbfound=nbfound@entry=0x7fde98716f1c, eod_met=eod_met@entry=0x7fde98716f1b, attrmask=122830, 
    cb=cb@entry=0x562c444c77f0 <nfs4_readdir_callback>, opaque=opaque@entry=0x7fde98716f20)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/fsal_helper.c:1500
#7  0x0000562c444c87bb in nfs4_op_readdir (op=0x7fde240bf870, data=0x7fde98717150, resp=0x7fde3843f7b0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_op_readdir.c:627
#8  0x0000562c444b415f in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7fde3837b660)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_Compound.c:752
#9  0x0000562c444a43cb in nfs_rpc_execute (reqdata=reqdata@entry=0x7fde2435f370)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290


--------------------------


bt of core dump file

------------------
4 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-12.el7.x86_64 libuuid-2.23.2-52.el7.x86_64 lz4-1.7.5-2.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-57.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007fdf5b2adcf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000562c44536404 in fridgethr_freeze (thr_ctx=0x562c45f59040, fr=0x562c45f583a0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x562c45f59040) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007fdf5b2a9dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fdf5a975b3d in clone () from /lib64/libc.so.6
--------------------


--- Additional comment from Daniel Gryniewicz on 2018-07-23 15:05:51 EDT ---

Okay, the core file helps, to some extent.  There is a dirent with an empty name:

(gdb) p dirent->name 
$2 = 0x7fde3c30ff24 ""

This is passed into FSAL_GLUSTER's lookup(), which calls glfs_h_lookupat() with it. This must return the handle of the parent directory, rather than an error, and so mdcache finds its existing entry for the parent dir, which already has its content_lock held.

There are 2 bugs here, I think.

1) MDCACHE lost a name on a dirent.  I'll look into this.
2) lookup(parent_dir, name="") in gluster returns parent_dir, instead of an error.

--- Additional comment from Jiffin on 2018-07-26 05:03:02 EDT ---

(In reply to Daniel Gryniewicz from comment #9)
> Okay, the core file helps, to some extent.  There is a dirent with an empty
> name:
> 
> (gdb) p dirent->name 
> $2 = 0x7fde3c30ff24 ""
> 
> This is passed into FSAL_GLUSTER's lookup(), which calls glfs_h_lookupat()
> with it.  This must return the handle of the parent directory, rather than
> an error, and so mdcache find's it's existing entry for the parent dir,
> which already has it's content_lock held.
> 
> There are 2 bugs here, I think.
> 
> 1) MDCACHE lost a name on a dirent.  I'll look into this.
> 2) lookup(parent_dir, name="") in gluster returns parent_dir, instead of an
> error.

pub_glfs_h_lookupat() only checks for path == NULL when returning an error; in this case the path was not NULL. It then tries to resolve the path components
via glfs_resolve_at():


                                                                                                         
        for (component = strtok_r (path, "/", &saveptr);
             component; component = next_component) {

                next_component = strtok_r (NULL, "/", &saveptr);

                if (parent)
                        inode_unref (parent);
                parent = inode;
                inode = glfs_resolve_component (fs, subvol, parent,
                                                component, &ciatt,
                                                /* force hard lookup on the last
                                                   component, as the caller
                                                   wants proper iatt filled
                                                */
                                                (reval || (!next_component &&
                                                iatt)));

Looks like the above loop resolves to the parent if the value of path is "".
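
A quick way to see why: strtok_r() yields no components for an empty string, so the loop body never runs and the resolved inode stays whatever it was initialised to, in effect the parent. A standalone sketch of that behaviour (not gfapi code):

#include <stdio.h>
#include <string.h>

int main(void)
{
        char path[] = "";          /* the empty name taken from the dirent */
        char *saveptr = NULL;
        char *component = strtok_r(path, "/", &saveptr);

        /* No component is ever produced, so a loop like the one in
           glfs_resolve_at() never executes its body. */
        printf("first component: %s\n", component ? component : "(none)");
        return 0;
}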


So how did the path become ""? Was it cleared somewhere? Is "" a valid value for path?

--- Additional comment from Daniel Gryniewicz on 2018-07-27 10:32:42 EDT ---

I don't know how path became "" yet; the name is only set on creation or rename.  Tracing through the code, it looks like only NFSv3 could use an empty name, since NFSv4's standard UTF-8 string handler checks for 0-length strings, whereas NFSv3 just uses what the client provided.


However, returning the parent for that is clearly a bug.  Parent should only be returned for "."

--- Additional comment from Frank Filz on 2018-07-27 10:56:43 EDT ---

(In reply to Daniel Gryniewicz from comment #11)
> I don't know how path became "" yet, name is only set on creation or rename.
> Tracing through the code, it looks like only NFSv3 could use an empty name,
> since NFSv4's standard utf8 string handler checks for 0-length strings,
> whereas NFSv3 just uses what the client provided.
> 
> 
> However, returning the parent for that is clearly a bug.  Parent should only
> be returned for "."

Hmm, I thought I had done some looking at this one... 

An empty name would be valid if AT_EMPTY_PATH were set, but we don't set it in this case.

Is it remotely possible we got an empty name in readdir?

If this is reproducible, it would be interesting to set the NFS_READDIR log component to FULL_DEBUG. We can then look for empty names.
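
For reference, the component log level can be raised in ganesha.conf along these lines (LOG block syntax; exact component names depend on the installed Ganesha version):

LOG {
    COMPONENTS {
        NFS_READDIR = FULL_DEBUG;
    }
}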

--- Additional comment from Matt Benjamin (redhat) on 2018-07-27 11:00:57 EDT ---

(In reply to Frank Filz from comment #12)
> (In reply to Daniel Gryniewicz from comment #11)
> > I don't know how path became "" yet, name is only set on creation or rename.
> > Tracing through the code, it looks like only NFSv3 could use an empty name,
> > since NFSv4's standard utf8 string handler checks for 0-length strings,
> > whereas NFSv3 just uses what the client provided.
> > 
> > 
> > However, returning the parent for that is clearly a bug.  Parent should only
> > be returned for "."
> 
> Hmm, I thought I had done some looking at this one... 
> 
> An empty name would be valid if AT_EMPTY_PATH was set, but we don't in this
> case.
> 
> Is it remotely possible we got an empty name in readdir?
> 
> If this is re-createable, it would be interesting to enable NFS_READDIR log
> componment to FULL_DEBUG. We can then look for empty names.

I don't know but isn't Daniel's observation about returning parent (dot?  dotdot?) still correct?

Matt

--- Additional comment from Frank Filz on 2018-07-27 11:29:42 EDT ---

(In reply to Matt Benjamin (redhat) from comment #13)
> I don't know but isn't Daniel's observation about returning parent (dot? 
> dotdot?) still correct?

Yes, an empty path resolving to the parent without AT_EMPTY_PATH passed as a flag is not good; without that flag, an empty path should return an error.
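
For comparison, this is how the *at() flag behaves on Linux (sketch, unrelated to Ganesha code): without AT_EMPTY_PATH an empty pathname is rejected, with the flag the call applies to the directory fd itself.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        struct stat st;
        int dirfd = open(".", O_RDONLY | O_DIRECTORY);

        if (dirfd == -1)
                return 1;

        /* Without AT_EMPTY_PATH an empty pathname is an error (ENOENT)... */
        if (fstatat(dirfd, "", &st, 0) == -1)
                printf("no flag:        %s\n", strerror(errno));

        /* ...with the flag, the call operates on dirfd itself. */
        if (fstatat(dirfd, "", &st, AT_EMPTY_PATH) == 0)
                printf("AT_EMPTY_PATH:  inode %llu\n",
                       (unsigned long long)st.st_ino);

        close(dirfd);
        return 0;
}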

But there's an issue where somehow Ganesha is getting a dirent with an empty name...

That COULD be because we got one from readdir from the filesystem. It could be because somehow we dropped the name.

--- Additional comment from Daniel Gryniewicz on 2018-07-27 12:04:37 EDT ---

We can't have dropped the name.  It's only freed and NULL'd.  I think it has to have come from either readdir() or rename().

--- Additional comment from Daniel Gryniewicz on 2018-07-30 09:57:13 EDT ---

AT_EMPTY_PATH is only valid for *at() calls (fstatat, fchownat, etc.), i.e. things that take a file descriptor and a name.  As far as I can tell, an empty name on a dirent is not valid.  Adding a check for creation, link, and rename is easy; dealing with readdir is much harder, as it may make an entire directory unreadable, and therefore un-removable.  It would be better for this case if Gluster disallowed creation of such a dirent in the first place.

--- Additional comment from Daniel Gryniewicz on 2018-07-30 10:03:00 EDT ---

Actually, NFSv4 explicitly forbids zero-length dirents:

   If the oldname or newname is of zero length, NFS4ERR_INVAL will be
   returned.

NFSv3 does not include this requirement, but does allow NFS3ERR_INVAL for invalid names.  I'll add some checking for names from clients.  NFSv4 already has this, as part of its standard UTF-8 handling, but I'll add some for other protocols.
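
A rough sketch of the kind of client-name validation described above; the helper name and exact policy are assumptions for illustration, not the actual Ganesha change:

#include <stddef.h>
#include <string.h>

/* Hypothetical check applied to a name received from an NFSv3 client
   before it reaches the FSAL; RFC 1813 allows NFS3ERR_INVAL for names
   the server considers invalid. */
static int nfs3_name_is_sane(const char *name, size_t maxlen)
{
        size_t len;

        if (name == NULL)
                return 0;
        len = strlen(name);
        if (len == 0 || len > maxlen)
                return 0;                /* empty or oversized name */
        if (strchr(name, '/') != NULL)
                return 0;                /* a component must not contain '/' */
        return 1;
}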

GFAPI needs to be fixed in addition, so no other client can create such dirents.

--- Additional comment from Daniel Gryniewicz on 2018-07-30 10:19:54 EDT ---

 https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/420770

Comment 1 Worker Ant 2018-07-31 09:27:59 UTC
REVIEW: https://review.gluster.org/20594 (gfapi : Handle the path == \"\" in pub_glfs_h_lookupat) posted (#1) for review on master by jiffin tony Thottan

Comment 2 Worker Ant 2018-08-13 11:45:26 UTC
COMMIT: https://review.gluster.org/20594 committed in master by "Kaleb KEITHLEY" <kkeithle> with a commit message- gfapi : Handle the path == "" glfs_resolve_at

Currently there is no check for path = "" in glfs_resolve_at.
So if application sends an empty path, then the function resolves
into the parent inode, which is incorrect. Plus, modified possible
use of "path" with "origpath" in the same function.

Change-Id: Ie5ff9ce4b771607b7dbb3fe00704fe670421792a
fixes: bz#1610236
Signed-off-by: Jiffin Tony Thottan <jthottan>
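
For illustration, the guard described by the commit message amounts to rejecting an empty path up front instead of letting the resolve loop fall through to the parent inode. A minimal sketch of that shape (the actual patch is in the linked review; the function and variable names here are assumptions):

#include <errno.h>
#include <stddef.h>

/* Hypothetical early guard for a resolve-style function: an empty path
   is invalid input and must fail with EINVAL rather than silently
   resolving to the parent inode. */
static int resolve_path_precheck(const char *origpath)
{
        if (origpath == NULL || origpath[0] == '\0') {
                errno = EINVAL;
                return -1;
        }
        return 0;
}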

Comment 3 Shyamsundar 2018-10-23 14:19:05 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.13, please open a new bug report.

glusterfs-3.12.13 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-August/000107.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 4 Shyamsundar 2018-10-23 15:15:49 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

