Bug 1551877 - [Ganesha] : NFS-Ganesha crashed during finds and ls in mdcache_new_entry.
Summary: [Ganesha] : NFS-Ganesha crashed during finds and ls in mdcache_new_entry.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Daniel Gryniewicz
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks: 1503137
Reported: 2018-03-06 04:04 UTC by Ambarish
Modified: 2018-09-24 11:57 UTC
CC: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In some error cases, the sub-handle has already been freed.
Consequence: A double-free, causing a crash.
Fix: Set the sub-handle pointer to NULL once it has been freed.
Result: No more double-free.
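
A minimal sketch of the fix pattern (illustrative only; the struct and function names below are hypothetical, not the actual nfs-ganesha symbols):

/* Hypothetical sketch of the double-free and its fix; not the actual
 * nfs-ganesha code. On an error path the helper frees the sub-handle,
 * so it must also NULL the caller's pointer to prevent a second free. */
#include <stdlib.h>

struct sub_handle { int placeholder; };

static void helper_error_path(struct sub_handle **sub)
{
	free(*sub);
	*sub = NULL;	/* the fix: mark the sub-handle as already freed */
}

static void caller_cleanup(struct sub_handle **sub)
{
	if (*sub != NULL) {	/* without the NULL above, this double-frees */
		free(*sub);
		*sub = NULL;
	}
}

int main(void)
{
	struct sub_handle *sub = malloc(sizeof(*sub));

	helper_error_path(&sub);	/* first free; pointer NULLed */
	caller_cleanup(&sub);		/* now a safe no-op */
	return 0;
}

Setting the caller-visible pointer to NULL at the point of the first free is what turns the later, defensive free into a no-op.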
Clone Of:
Environment:
Last Closed: 2018-09-04 06:54:24 UTC
Embargoed:




Links:
Red Hat Product Errata RHEA-2018:2610 (last updated 2018-09-04 06:55:33 UTC)

Description Ambarish 2018-03-06 04:04:25 UTC
Description of problem:
------------------------

6-node Ganesha cluster.

6 clients running find and ls via NFSv3/v4.

Ganesha crashed and dumped a core on 2 of my nodes.

This is the backtrace:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000559ac132b05b in mdcache_new_entry (
    export=export@entry=0x559ac295e680, 
    sub_handle=0x7f800414c8c0, 
    attrs_in=attrs_in@entry=0x7f827ef2bb50, 
    attrs_out=attrs_out@entry=0x0, 
    new_directory=new_directory@entry=false, 
    entry=entry@entry=0x7f827ef2bcb0, 
    state=state@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:862
#2  0x0000559ac132d1ec in mdcache_locate_host (
    fh_desc=0x7f827ef2bd00, 
    export=export@entry=0x559ac295e680, 
    entry=entry@entry=0x7f827ef2bcb0, 
    attrs_out=attrs_out@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1035
#3  0x0000559ac1326b9a in mdcache_create_handle (
    exp_hdl=0x559ac295e680, fh_desc=<optimized out>, 
    handle=0x7f827ef2bcf8, attrs_out=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1898
#4  0x0000559ac12ef720 in nfs3_FhandleToCache (
    fh3=fh3@entry=0x7f82280010f0, 
    status=status@entry=0x7f80040008c0, 
    rc=rc@entry=0x7f827ef2bd6c)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/support/nfs_filehandle_mgmt.c:98
#5  0x0000559ac12a4f9e in nfs3_getattr (
    arg=0x7f82280010f0, req=<optimized out>, 
    res=0x7f80040008c0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs3_getattr.c:83
#6  0x0000559ac12692eb in nfs_rpc_execute (
    reqdata=reqdata@entry=0x7f82280008c0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290
#7  0x0000559ac126a94a in worker_run (
    ctx=0x559ac5a52030)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1562
#8  0x0000559ac12f9b59 in fridgethr_start_routine (
    arg=0x559ac5a52030)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:550
#9  0x00007f831db31dd5 in start_thread ()
   from /lib64/libpthread.so.0
#10 0x00007f831d1fdb3d in clone () from /lib64/libc.so.6
(gdb) 
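
Frame #0 at address 0x0000000000000000 means execution jumped through a NULL (or clobbered) function pointer, which is consistent with the sub-handle double-free described in the Doc Text above. A minimal illustration of how such a frame arises (illustrative C only, not nfs-ganesha code):

#include <stddef.h>

typedef void (*release_fn)(void *);

int main(void)
{
	/* A function pointer loaded from freed memory can be NULL or
	 * garbage; calling it moves the program counter to 0x0 and
	 * produces the "#0 0x0000000000000000 in ?? ()" frame above. */
	release_fn release = NULL;

	release(NULL);	/* SIGSEGV with PC = 0x0 */
	return 0;
}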


Version-Release number of selected component (if applicable):
----------------------------------------------------------------

glusterfs-ganesha-3.12.2-4.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-2.el7rhgs.x86_64


How reproducible:
-----------------

Fairly reproducible; hit it on multiple nodes.


Steps to Reproduce:
-------------------

Run ls/find on a huge data set.

Actual results:
----------------

Ganesha service crashed.

Expected results:
-----------------

No crashes.

Additional info:
------------------

[root@gqas013 ~]# gluster v info
 
Volume Name: drogon
Type: Distributed-Replicate
Volume ID: bded407b-fbad-493d-b93e-6f0be7e49352
Status: Started
Snapshot Count: 0
Number of Bricks: 25 x 3 = 75
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick6: gqas007:/bricks1/A1
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick12: gqas007:/bricks2/A1
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick14: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick15: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick16: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick17: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick18: gqas007:/bricks3/A1
Brick19: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick20: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick21: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick22: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick23: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick24: gqas007:/bricks4/A1
Brick25: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick26: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick28: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick29: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick30: gqas007:/bricks5/A1
Brick31: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick32: gqas016.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick33: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick34: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick35: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick36: gqas007:/bricks6/A1
Brick37: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick38: gqas016.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick39: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick40: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick41: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick42: gqas007:/bricks7/A1
Brick43: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick44: gqas016.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick45: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick46: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick47: gqas003.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick48: gqas007:/bricks8/A1
Brick49: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick50: gqas016.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick51: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick52: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick53: gqas003.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick54: gqas007:/bricks9/A1
Brick55: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick56: gqas016.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick57: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick58: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick59: gqas003.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick60: gqas007:/bricks10/A1
Brick61: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick62: gqas016.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick63: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick64: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick65: gqas003.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick66: gqas007:/bricks11/A1
Brick67: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick68: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick69: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick70: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick71: gqas003.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick72: gqas007:/bricks12/A1
Brick73: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Brick74: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Brick75: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Comment 13 Manisha Saini 2018-07-30 19:54:17 UTC
Verified this with

# rpm -qa | grep ganesha
nfs-ganesha-2.5.5-8.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-14.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-8.el7rhgs.x86_64


Steps used for verification:
1. Created a 6-node Ganesha cluster.
2. Created 2 volumes: a 6 x 3 Distributed-Replicate volume and a 6 x (4 + 2) Distributed-Disperse volume.
3. Exported the volumes via Ganesha.
4. Mounted the volumes on 4 different clients using 4 different VIPs (2 clients mounting via v3, the other 2 via v4).
5. Created lots of files (untars, dd, touch) along with fs sanity tests (dbench, bonnie).
6. Triggered ls and find on both volumes from 4 different clients in parallel for 6+ hours.


No crash was observed. However, while verifying this BZ I hit the issue of ls failing with "Invalid argument" on one of the clients; that issue is tracked in BZ 1569657.

------
./dir1/linux-4.9.5/sound:
ls: reading directory ./dir1/linux-4.9.5/sound: Invalid argument
total 0

./dir1/linux-4.9.5/tools:
ls: reading directory ./dir1/linux-4.9.5/tools: Invalid argument
total 0

./dir1/linux-4.9.5/usr:
ls: reading directory ./dir1/linux-4.9.5/usr: Invalid argument
total 0

./dir1/linux-4.9.5/virt:
ls: reading direct
------

Since that issue is being tracked as part of a separate BZ, moving this BZ to VERIFIED.

Comment 15 errata-xmlrpc 2018-09-04 06:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2610

