Bug 1551881 - [Ganesha] : Ganesha crashed during rm -rf from multiple clients.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Daniel Gryniewicz
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks: 1503137
 
Reported: 2018-03-06 04:35 UTC by Ambarish
Modified: 2018-09-24 12:15 UTC (History)
12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 06:54:24 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:2610 0 None None None 2018-09-04 06:55:33 UTC

Description Ambarish 2018-03-06 04:35:55 UTC
Description of problem:
-----------------------

6-node cluster, 6 clients running rm -rf on a huge data set (v3/v4 mounts).

Ganesha crashed on all the nodes and dumped a core:


(gdb) bt
#0  0x000055a02cc29e4e in mdcache_clean_dirent_chunks (entry=0x7fc43407eab0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:512
#1  mdcache_dirent_invalidate_all (entry=entry@entry=0x7fc43407eab0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:537
#2  0x000055a02cc2a102 in mdc_clean_entry (entry=entry@entry=0x7fc43407eab0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:279
#3  0x000055a02cc19abf in mdcache_lru_clean (entry=0x7fc43407eab0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:592
#4  _mdcache_lru_unref (entry=entry@entry=0x7fc43407eab0, flags=flags@entry=0, func=func@entry=0x55a02cc71a73 <__func__.24175> "mdcache_put", line=line@entry=190)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1923
#5  0x000055a02cc2bb54 in mdcache_put (entry=<optimized out>) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:190
#6  mdcache_new_entry (export=export@entry=0x7fc6cc006290, sub_handle=0x7fc4700c3130, attrs_in=attrs_in@entry=0x7fc7157a7e00, attrs_out=attrs_out@entry=0x0, new_directory=new_directory@entry=false, entry=entry@entry=0x7fc7157a7d50, 
    state=state@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:802
#7  0x000055a02cc21e19 in mdcache_alloc_and_check_handle (export=export@entry=0x7fc6cc006290, sub_handle=<optimized out>, new_obj=new_obj@entry=0x7fc7157a7df8, new_directory=new_directory@entry=false, 
    attrs_in=attrs_in@entry=0x7fc7157a7e00, attrs_out=attrs_out@entry=0x0, tag=tag@entry=0x55a02cc6fc01 "lookup ", parent=parent@entry=0x7fc484089e60, name=name@entry=0x7fc47007b7c0 "qla2xxx", invalidate=invalidate@entry=0x7fc7157a7def, 
    state=state@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:100
#8  0x000055a02cc2d791 in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7fc484089e60, name=name@entry=0x7fc47007b7c0 "qla2xxx", new_entry=new_entry@entry=0x7fc7157a7fd0, attrs_out=attrs_out@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1400
#9  0x000055a02cc2dbf3 in mdc_lookup (mdc_parent=0x7fc484089e60, name=0x7fc47007b7c0 "qla2xxx", uncached=uncached@entry=true, new_entry=new_entry@entry=0x7fc7157a7fd0, attrs_out=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1333
#10 0x000055a02cc1ffcb in mdcache_lookup (parent=<optimized out>, name=<optimized out>, handle=0x7fc7157a8058, attrs_out=<optimized out>) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:177
#11 0x000055a02cb4f52f in fsal_lookup (parent=0x7fc484089e98, name=0x7fc47007b7c0 "qla2xxx", obj=obj@entry=0x7fc7157a8058, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/fsal_helper.c:707
#12 0x000055a02cb853b6 in nfs4_op_lookup (op=<optimized out>, data=0x7fc7157a8150, resp=0x7fc4701abf40) at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_op_lookup.c:106
#13 0x000055a02cb7907f in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7fc47000dc00) at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_Compound.c:752
#14 0x000055a02cb692eb in nfs_rpc_execute (reqdata=reqdata@entry=0x7fc6580008c0) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290
#15 0x000055a02cb6a94a in worker_run (ctx=0x55a02d031c30) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1562
#16 0x000055a02cbf9b59 in fridgethr_start_routine (arg=0x55a02d031c30) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:550
#17 0x00007fc7620bddd5 in start_thread (arg=0x7fc7157a9700) at pthread_create.c:308
#18 0x00007fc761789b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) 




Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-ganesha-3.12.2-4.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-2.el7rhgs.x86_64
nfs-ganesha-2.5.5-2.el7rhgs.x86_64


How reproducible:
-----------------

100%

Steps to Reproduce:
-------------------

1. Create a huge data set.


2. Run rm -rf from multiple v3/v4 clients.
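The two steps above can be sketched as a small shell script. This is an illustrative approximation only: the mount point, the data-set shape, and the degree of parallelism here are assumptions, and the original run used six separate clients against a much larger tree over NFSv3/v4 mounts rather than background jobs on one host.

```shell
#!/bin/sh
# Sketch of the reproducer. MNT is an assumed mount point of the
# Ganesha-exported volume; it defaults to a scratch directory so the
# script can run anywhere.
MNT="${MNT:-$(mktemp -d)}"

# 1. Create a data set (the sizes here are token; the report used a
#    huge tree).
for d in a b c d; do
    mkdir -p "$MNT/dataset/$d"
    for f in 1 2 3 4 5; do
        : > "$MNT/dataset/$d/$f"
    done
done

# 2. Remove the tree from several workers at once, emulating multiple
#    clients racing on rm -rf. ENOENT noise from losing racers is
#    expected and suppressed.
for i in 1 2 3; do
    rm -rf "$MNT/dataset" 2>/dev/null &
done
wait

# After the race the tree should be gone; on the affected build,
# ganesha.nfsd crashed server-side during the concurrent removal
# instead of completing it.
[ ! -d "$MNT/dataset" ] && echo "dataset removed"
```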


Additional info:
-----------------

Volume Name: drogon
Type: Distributed-Replicate
Volume ID: bded407b-fbad-493d-b93e-6f0be7e49352
Status: Started
Snapshot Count: 0
Number of Bricks: 25 x 3 = 75
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick6: gqas007:/bricks1/A1
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick12: gqas007:/bricks2/A1
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick14: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick15: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick16: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick17: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick18: gqas007:/bricks3/A1
Brick19: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick20: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick21: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick22: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick23: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick24: gqas007:/bricks4/A1
Brick25: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick26: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick28: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick29: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick30: gqas007:/bricks5/A1
Brick31: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick32: gqas016.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick33: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick34: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick35: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick36: gqas007:/bricks6/A1
Brick37: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick38: gqas016.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick39: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick40: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick41: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick42: gqas007:/bricks7/A1
Brick43: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick44: gqas016.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick45: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick46: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick47: gqas003.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick48: gqas007:/bricks8/A1
Brick49: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick50: gqas016.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick51: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick52: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick53: gqas003.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick54: gqas007:/bricks9/A1
Brick55: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick56: gqas016.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick57: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick58: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick59: gqas003.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick60: gqas007:/bricks10/A1
Brick61: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick62: gqas016.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick63: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick64: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick65: gqas003.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick66: gqas007:/bricks11/A1
Brick67: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick68: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick69: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick70: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick71: gqas003.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick72: gqas007:/bricks12/A1
Brick73: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Brick74: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Brick75: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.cache-invalidation: on
ganesha.enable: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
cluster.enable-shared-storage: enable
nfs-ganesha: enable

Comment 4 Daniel Gryniewicz 2018-03-06 14:55:23 UTC
I think this should be solved by this upstream patch:

https://review.gerrithub.io/402881

Can this be tested to verify the fix, since I can't reproduce it locally?

Comment 6 Kaleb KEITHLEY 2018-03-13 16:21:48 UTC
This was added to the in-flight tracker but still has not received the automatic pm_ack.

Comment 12 errata-xmlrpc 2018-09-04 06:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2610

