Description of problem:
------------------------
6-node Ganesha cluster. 6 clients running find and ls via v3/v4. Ganesha crashed and dumped a core on 2 of my nodes.

This is the backtrace:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000559ac132b05b in mdcache_new_entry (export=export@entry=0x559ac295e680,
    sub_handle=0x7f800414c8c0, attrs_in=attrs_in@entry=0x7f827ef2bb50,
    attrs_out=attrs_out@entry=0x0, new_directory=new_directory@entry=false,
    entry=entry@entry=0x7f827ef2bcb0, state=state@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:862
#2  0x0000559ac132d1ec in mdcache_locate_host (fh_desc=0x7f827ef2bd00,
    export=export@entry=0x559ac295e680, entry=entry@entry=0x7f827ef2bcb0,
    attrs_out=attrs_out@entry=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1035
#3  0x0000559ac1326b9a in mdcache_create_handle (exp_hdl=0x559ac295e680,
    fh_desc=<optimized out>, handle=0x7f827ef2bcf8, attrs_out=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1898
#4  0x0000559ac12ef720 in nfs3_FhandleToCache (fh3=fh3@entry=0x7f82280010f0,
    status=status@entry=0x7f80040008c0, rc=rc@entry=0x7f827ef2bd6c)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/support/nfs_filehandle_mgmt.c:98
#5  0x0000559ac12a4f9e in nfs3_getattr (arg=0x7f82280010f0, req=<optimized out>,
    res=0x7f80040008c0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/Protocols/NFS/nfs3_getattr.c:83
#6  0x0000559ac12692eb in nfs_rpc_execute (reqdata=reqdata@entry=0x7f82280008c0)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1290
#7  0x0000559ac126a94a in worker_run (ctx=0x559ac5a52030)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/MainNFSD/nfs_worker_thread.c:1562
#8  0x0000559ac12f9b59 in fridgethr_start_routine (arg=0x559ac5a52030)
    at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:550
#9  0x00007f831db31dd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f831d1fdb3d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
----------------------------------------------------------------
glusterfs-ganesha-3.12.2-4.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-2.el7rhgs.x86_64

How reproducible:
-----------------
Fairly reproducible; hit it on multiple nodes.

Steps to Reproduce:
-------------------
Run ls/find on a huge data set.

Actual results:
----------------
Ganesha service crashed.

Expected results:
-----------------
No crashes.

Additional info:
------------------
[root@gqas013 ~]# gluster v info

Volume Name: drogon
Type: Distributed-Replicate
Volume ID: bded407b-fbad-493d-b93e-6f0be7e49352
Status: Started
Snapshot Count: 0
Number of Bricks: 25 x 3 = 75
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick5: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick6: gqas007:/bricks1/A1
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick8: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick11: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick12: gqas007:/bricks2/A1
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick14: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick15: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick16: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick17: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick18: gqas007:/bricks3/A1
Brick19: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick20: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick21: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick22: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick23: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick24: gqas007:/bricks4/A1
Brick25: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick26: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick28: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick29: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick30: gqas007:/bricks5/A1
Brick31: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick32: gqas016.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick33: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick34: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick35: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick36: gqas007:/bricks6/A1
Brick37: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick38: gqas016.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick39: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick40: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick41: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick42: gqas007:/bricks7/A1
Brick43: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick44: gqas016.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick45: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick46: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick47: gqas003.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick48: gqas007:/bricks8/A1
Brick49: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick50: gqas016.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick51: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick52: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick53: gqas003.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick54: gqas007:/bricks9/A1
Brick55: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick56: gqas016.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick57: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick58: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick59: gqas003.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick60: gqas007:/bricks10/A1
Brick61: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick62: gqas016.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick63: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick64: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick65: gqas003.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick66: gqas007:/bricks11/A1
Brick67: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick68: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick69: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick70: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick71: gqas003.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick72: gqas007:/bricks12/A1
Brick73: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Brick74: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Brick75: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A2
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
nfs-ganesha: enable
cluster.enable-shared-storage: enable
Verified this with:

# rpm -qa | grep ganesha
nfs-ganesha-2.5.5-8.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-14.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-8.el7rhgs.x86_64

Steps used for verification:
1. Created a 6-node Ganesha cluster.
2. Created 2 volumes: a 6 x 3 Distributed-Replicate volume and a 6 x (4 + 2) Distributed-Disperse volume.
3. Exported the volumes via Ganesha.
4. Mounted the volumes on 4 different clients using 4 different VIPs (2 clients mounting via v3 and the other 2 via v4).
5. Created lots of files (untars, dd, touch) along with fs sanity tools (dbench, bonnie).
6. Triggered ls and find on both volumes from the 4 clients in parallel for 6+ hours.

No crash was observed. However, while verifying this BZ I did hit the "ls: reading directory: Invalid argument" issue mentioned in BZ 1569657 on one of the clients:

------
./dir1/linux-4.9.5/sound:
ls: reading directory ./dir1/linux-4.9.5/sound: Invalid argument
total 0

./dir1/linux-4.9.5/tools:
ls: reading directory ./dir1/linux-4.9.5/tools: Invalid argument
total 0

./dir1/linux-4.9.5/usr:
ls: reading directory ./dir1/linux-4.9.5/usr: Invalid argument
total 0

./dir1/linux-4.9.5/virt:
ls: reading direct
------

Since that issue is being tracked as part of a separate BZ, moving this BZ to verified state.
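The parallel ls/find load from the verification steps can be sketched roughly as follows. This is a client-side sketch only: the mount points and the 6-hour duration in the commented launch are assumptions matching the steps above, not an exact record of the commands used.

```shell
# run_load MOUNT SECONDS -- loop metadata-heavy ls/find over MOUNT
# for SECONDS, mimicking the load used during verification.
run_load() {
    mnt=$1
    end=$(( $(date +%s) + $2 ))
    while [ "$(date +%s)" -lt "$end" ]; do
        ls -lR "$mnt" > /dev/null 2>&1
        find "$mnt" -type f > /dev/null 2>&1
    done
}

# Example launch (commented out because it runs for 6+ hours):
# one background worker per assumed NFS mount, as in step 6.
# for m in /mnt/drogon-v3 /mnt/drogon-v4; do
#     run_load "$m" $((6 * 3600)) &
# done
# wait
```

Running one such worker per client mount (two v3 mounts, two v4 mounts) keeps the mdcache getattr path shown in the backtrace under continuous concurrent pressure.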
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2610