Description of problem:
-----------------------
6-node cluster, 6 clients running rm -rf on a huge data set (v3/v4). Ganesha crashed on all the nodes and dumped a core:

(gdb) bt
#0  0x000055a02cc29e4e in mdcache_clean_dirent_chunks (entry=0x7fc43407eab0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:512
#1  mdcache_dirent_invalidate_all (entry=entry@entry=0x7fc43407eab0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:537
#2  0x000055a02cc2a102 in mdc_clean_entry (entry=entry@entry=0x7fc43407eab0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:279
#3  0x000055a02cc19abf in mdcache_lru_clean (entry=0x7fc43407eab0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:592
#4  _mdcache_lru_unref (entry=entry@entry=0x7fc43407eab0, flags=flags@entry=0, func=func@entry=0x55a02cc71a73 <__func__.24175> "mdcache_put", line=line@entry=190) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1923
#5  0x000055a02cc2bb54 in mdcache_put (entry=<optimized out>) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:190
#6  mdcache_new_entry (export=export@entry=0x7fc6cc006290, sub_handle=0x7fc4700c3130, attrs_in=attrs_in@entry=0x7fc7157a7e00, attrs_out=attrs_out@entry=0x0, new_directory=new_directory@entry=false, entry=entry@entry=0x7fc7157a7d50, state=state@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:802
#7  0x000055a02cc21e19 in mdcache_alloc_and_check_handle (export=export@entry=0x7fc6cc006290, sub_handle=<optimized out>, new_obj=new_obj@entry=0x7fc7157a7df8, new_directory=new_directory@entry=false, attrs_in=attrs_in@entry=0x7fc7157a7e00, attrs_out=attrs_out@entry=0x0, tag=tag@entry=0x55a02cc6fc01 "lookup ", parent=parent@entry=0x7fc484089e60, name=name@entry=0x7fc47007b7c0 "qla2xxx", invalidate=invalidate@entry=0x7fc7157a7def, state=state@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:100
#8  0x000055a02cc2d791 in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7fc484089e60, name=name@entry=0x7fc47007b7c0 "qla2xxx", new_entry=new_entry@entry=0x7fc7157a7fd0, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1400
#9  0x000055a02cc2dbf3 in mdc_lookup (mdc_parent=0x7fc484089e60, name=0x7fc47007b7c0 "qla2xxx", uncached=uncached@entry=true, new_entry=new_entry@entry=0x7fc7157a7fd0, attrs_out=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1333
#10 0x000055a02cc1ffcb in mdcache_lookup (parent=<optimized out>, name=<optimized out>, handle=0x7fc7157a8058, attrs_out=<optimized out>) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:177
#11 0x000055a02cb4f52f in fsal_lookup (parent=0x7fc484089e98, name=0x7fc47007b7c0 "qla2xxx", obj=obj@entry=0x7fc7157a8058, attrs_out=attrs_out@entry=0x0) at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/fsal_helper.c:707
#12 0x000055a02cb853b6 in nfs4_op_lookup (op=<optimized out>, data=0x7fc7157a8150, resp=0x7fc4701abf40) at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_op_lookup.c:106
#13 0x000055a02cb7907f in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7fc47000dc00) at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs4_Compound.c:752
#14 0x000055a02cb692eb in nfs_rpc_execute (reqdata=reqdata@entry=0x7fc6580008c0) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290
#15 0x000055a02cb6a94a in worker_run (ctx=0x55a02d031c30) at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1562
#16 0x000055a02cbf9b59 in fridgethr_start_routine (arg=0x55a02d031c30) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:550
#17 0x00007fc7620bddd5 in
start_thread (arg=0x7fc7157a9700) at pthread_create.c:308
#18 0x00007fc761789b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-ganesha-3.12.2-4.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-2.el7rhgs.x86_64
nfs-ganesha-2.5.5-2.el7rhgs.x86_64

How reproducible:
-----------------
100%

Steps to Reproduce:
-------------------
1. Create a huge data set.
2. Run rm -rf from multiple v3/v4 clients.

Additional info:
-----------------
Volume Name: drogon
Type: Distributed-Replicate
Volume ID: bded407b-fbad-493d-b93e-6f0be7e49352
Status: Started
Snapshot Count: 0
Number of Bricks: 25 x 3 = 75
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick6: gqas007:/bricks1/A1
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick12: gqas007:/bricks2/A1
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick14: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick15: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick16: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick17: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick18: gqas007:/bricks3/A1
Brick19: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick20: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick21: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick22: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick23: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick24: gqas007:/bricks4/A1
Brick25: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick26: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick28: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick29: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick30: gqas007:/bricks5/A1
Brick31: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick32: gqas016.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick33: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick34: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick35: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick36: gqas007:/bricks6/A1
Brick37: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick38: gqas016.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick39: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick40: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick41: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick42: gqas007:/bricks7/A1
Brick43: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick44: gqas016.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick45: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick46: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick47: gqas003.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick48: gqas007:/bricks8/A1
Brick49: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick50: gqas016.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick51: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick52: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick53: gqas003.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick54: gqas007:/bricks9/A1
Brick55: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick56: gqas016.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick57: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick58: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick59: gqas003.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick60: gqas007:/bricks10/A1
Brick61: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick62: gqas016.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick63: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick64: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick65: gqas003.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick66: gqas007:/bricks11/A1
Brick67: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick68: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick69: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick70: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick71: gqas003.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick72: gqas007:/bricks12/A1
Brick73: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Brick74: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Brick75: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.cache-invalidation: on
ganesha.enable: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
cluster.enable-shared-storage: enable
nfs-ganesha: enable
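For anyone trying to reproduce this, the "Steps to Reproduce" above can be sketched as a small shell script. This is only an illustration: the mount point (/mnt/drogon/dataset), directory count, and file count are placeholders I chose, not values from the report, and the default tree here is deliberately tiny — scale DIRS/FILES up and run remove_dataset concurrently from all six clients to approximate the original workload.

```shell
#!/bin/sh
# Hypothetical reproduction sketch. DATASET is an assumed NFS mount of the
# "drogon" volume on each client; DIRS and FILES are placeholder sizes.
DATASET=${DATASET:-/mnt/drogon/dataset}
DIRS=${DIRS:-10}
FILES=${FILES:-100}

# Step 1: create the data set (run once, from a single client).
create_dataset() {
    for d in $(seq 1 "$DIRS"); do
        mkdir -p "$DATASET/dir$d"
        for f in $(seq 1 "$FILES"); do
            : > "$DATASET/dir$d/file$f"   # create an empty file
        done
    done
}

# Step 2: run this concurrently from each v3/v4 client.
remove_dataset() {
    rm -rf "$DATASET"/*
}
```

The crash was hit with a much larger tree than the defaults above, so expect to need millions of entries before the mdcache eviction path under concurrent rm -rf is exercised the same way.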
I think this should be fixed by this upstream patch: https://review.gerrithub.io/402881
Can it be tested to verify, since I can't reproduce the crash locally?
This was added to the in-flight tracker but still has not received the automatic pm_ack.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2610