Description of problem:
------------------------

4-node cluster containing 3 volumes - testvol{1,2,3}.

4 clients mount these volumes (NOT in a 1:1 way):

Client 1 : testvol1 via v3 and v4, testvol3 (v3)
Client 2 : testvol1 (v3) and testvol2 (v3)
Client 3 : testvol2 (v3) and testvol3 via v3 and v4
Client 4 : testvol1 (v3), testvol3 (v3), testvol3 (v4)

Almost 2.5 hours into my workload, Ganesha crashed on 3/4 nodes and dumped core.

Pacemaker quorum was lost, so all IO stopped.

***********
On gqas009
***********

(gdb) bt
#0  0x00007fc0abd4446f in __inode_ctx_free (inode=inode@entry=0x7fc08d8d01d4) at inode.c:332
#1  0x00007fc0abd45652 in __inode_destroy (inode=0x7fc08d8d01d4) at inode.c:353
#2  inode_table_prune (table=table@entry=0x7fc090074500) at inode.c:1543
#3  0x00007fc0abd45934 in inode_unref (inode=0x7fc08d8d01d4) at inode.c:524
#4  0x00007fc0b023d3b6 in pub_glfs_h_close (object=0x7fbf8c08f470) at glfs-handleops.c:1365
#5  0x00007fc0b0656a59 in handle_release (obj_hdl=0x7fbf8c023658) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:71
#6  0x00007fc13d6784c3 in mdcache_lru_clean (entry=0x7fbf8c0250c0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:421
#7  mdcache_lru_unref (entry=0x7fbf8c0250c0, flags=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1464
#8  0x00007fc13d5b6a0b in fsal_remove (parent=parent@entry=0x7fbe4415e2d8, name=0x7fbfd4119f50 "jz4780-nemc.c") at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1599
#9  0x00007fc13d5f22ac in nfs4_op_remove (op=<optimized out>, data=<optimized out>, resp=0x7fbfd400b130) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_remove.c:104
#10 0x00007fc13d5ddf7d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7fbfd400abf0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#11 0x00007fc13d5cf12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7fc0301824a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#12 0x00007fc13d5d078a in worker_run (ctx=0x7fc13e1514a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#13 0x00007fc13d65a189 in fridgethr_start_routine (arg=0x7fc13e1514a0) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#14 0x00007fc13bb3adc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fc13b20973d in clone () from /lib64/libc.so.6
(gdb)

***********
On gqas014
***********

(gdb) bt
#0  0x00007fa29957546f in __inode_ctx_free (inode=inode@entry=0x7fa252929204) at inode.c:332
#1  0x00007fa299576652 in __inode_destroy (inode=0x7fa252929204) at inode.c:353
#2  inode_table_prune (table=table@entry=0x7fa25c06a620) at inode.c:1543
#3  0x00007fa299576934 in inode_unref (inode=0x7fa252929204) at inode.c:524
#4  0x00007fa29984e3b6 in pub_glfs_h_close (object=0x7fa05c067210) at glfs-handleops.c:1365
#5  0x00007fa299c67a59 in handle_release (obj_hdl=0x7fa05c0e94c8) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/handle.c:71
#6  0x00007fa29e708812 in mdcache_lru_clean (entry=0x7fa05c0f6470) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:421
#7  mdcache_lru_get (entry=entry@entry=0x7fa1d4d4aa28) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1201
#8  0x00007fa29e712c7e in mdcache_alloc_handle (fs=0x0, sub_handle=0x7f9f240de8b8, export=0x7fa29eb2f700) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:117
#9  mdcache_new_entry (export=export@entry=0x7fa29eb2f700, sub_handle=0x7f9f240de8b8, attrs_in=attrs_in@entry=0x7fa1d4d4ab80, attrs_out=attrs_out@entry=0x7fa1d4d4ad90, new_directory=new_directory@entry=false, entry=entry@entry=0x7fa1d4d4aae0, state=state@entry=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:411
#10 0x00007fa29e70c6b4 in mdcache_alloc_and_check_handle (export=export@entry=0x7fa29eb2f700, sub_handle=<optimized out>, new_obj=new_obj@entry=0x7fa1d4d4ab78, new_directory=new_directory@entry=false, attrs_in=attrs_in@entry=0x7fa1d4d4ab80, attrs_out=attrs_out@entry=0x7fa1d4d4ad90, tag=tag@entry=0x7fa29e741b84 "lookup ", parent=parent@entry=0x7f9f34094010, name=name@entry=0x7fa1940010c0 "zfcp_ccw.c", invalidate=invalidate@entry=true, state=state@entry=0x0) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:93
#11 0x00007fa29e713efa in mdc_lookup_uncached (mdc_parent=mdc_parent@entry=0x7f9f34094010, name=name@entry=0x7fa1940010c0 "zfcp_ccw.c", new_entry=new_entry@entry=0x7fa1d4d4ad20, attrs_out=attrs_out@entry=0x7fa1d4d4ad90) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1041
#12 0x00007fa29e7142cd in mdc_lookup (mdc_parent=0x7f9f34094010, name=0x7fa1940010c0 "zfcp_ccw.c", uncached=uncached@entry=true, new_entry=new_entry@entry=0x7fa1d4d4ad20, attrs_out=0x7fa1d4d4ad90) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:985
#13 0x00007fa29e70b9eb in mdcache_lookup (parent=<optimized out>, name=<optimized out>, handle=0x7fa1d4d4ad88, attrs_out=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:166
#14 0x00007fa29e643c97 in fsal_lookup (parent=parent@entry=0x7f9f34094048, name=0x7fa1940010c0 "zfcp_ccw.c", obj=obj@entry=0x7fa1d4d4ad88, attrs_out=attrs_out@entry=0x7fa1d4d4ad90) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:712
#15 0x00007fa29e696b81 in nfs3_lookup (arg=0x7fa194000aa8, req=<optimized out>, res=0x7f9f2410f980) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs3_lookup.c:102
#16 0x00007fa29e65d12c in nfs_rpc_execute (reqdata=reqdata@entry=0x7fa1940008c0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#17 0x00007fa29e65e78a in worker_run (ctx=0x7fa29eb916e0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#18 0x00007fa29e6e8189 in fridgethr_start_routine (arg=0x7fa29eb916e0) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#19 0x00007fa29cbc8dc5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fa29c29773d in clone () from /lib64/libc.so.6
(gdb)

***********
On gqas015
***********

(gdb)
#0  0x00007f494efa01d7 in raise () from /lib64/libc.so.6
#1  0x00007f494efa18c8 in abort () from /lib64/libc.so.6
#2  0x00007f494efdff07 in __libc_message () from /lib64/libc.so.6
#3  0x00007f494efe6c02 in malloc_consolidate () from /lib64/libc.so.6
#4  0x00007f494efe8385 in _int_malloc () from /lib64/libc.so.6
#5  0x00007f494efeafbc in malloc () from /lib64/libc.so.6
#6  0x00007f4951462a5e in gsh_malloc__ (file=0x7f49514fb808 "/builddir/build/BUILD/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs3_read.c", line=207, function=<synthetic pointer>, n=4096) at /usr/src/debug/nfs-ganesha-2.4.1/src/include/abstract_mem.h:78
#7  nfs3_read (arg=0x7f476c234d88, req=<optimized out>, res=0x7f4700035840) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs3_read.c:207
#8  0x00007f495142812c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f476c234ba0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#9  0x00007f495142978a in worker_run (ctx=0x7f4953187400) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#10 0x00007f49514b3189 in fridgethr_start_routine (arg=0x7f4953187400) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#11 0x00007f494f993dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f494f06273d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-2.4.1-1.el7rhgs.x86_64

How reproducible:
----------------

1/1

Steps to Reproduce:
------------------

1. Create a cluster with more than 1 volume.
2. Mount these volumes (more than 1 mount per client) via v3 and v4.
3. Pump IO.

Actual results:
---------------

Ganesha crashes on 3 nodes and the logs are flooded with error messages (that's a separate BZ).

Expected results:
-----------------

No crashes/errors in logs.

Additional info:
----------------

OS : RHEL 7.3

*Vol Config* :

Volume Name: testvol1
Type: Distribute
Volume ID: 7a2dae27-0646-4284-9a34-e7b8455d439f
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol1_brick0
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: testvol2
Type: Distribute
Volume ID: 5a61a980-c8e6-41d7-bd00-9ac7f51cbf5e
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gqas009.sbu.lab.eng.bos.redhat.com:/bricks/testvol2_brick1
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: testvol3
Type: Replicate
Volume ID: 298bfa41-7469-4ff2-b9d4-aafb67c5cb9b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gqas010.sbu.lab.eng.bos.redhat.com:/bricks/testvol3_brick2
Brick2: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol3_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas009 tmp]#
Log flooding under the same use case is tracked via https://bugzilla.redhat.com/show_bug.cgi?id=1401162
The first two crashes, reported in __inode_ctx_free, are being tracked as part of bug 1403714. Bug 1403727 has been filed for the 3rd crash (on node gqas015), which is a memory corruption.
I tried this use case with Dan's fix for the crashes.

Ganesha crashed on 3/4 nodes after ~9 hours of pumping IO (single threaded) from 6 clients. Since pacemaker quorum wasn't met, IO came to a halt on all clients.

It didn't print anything from code this time, not sure how helpful this is:

***********
On gqas013
***********

[root@gqas013 ~]# /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -F
=================================================================
==16012== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604a00856c20 at pc 0x5a1aa9 bp 0x7f99c6ef4730 sp 0x7f99c6ef4720
WRITE of size 8 at 0x604a00856c20 thread T270
==16012== WARNING: Trying to symbolize code, but external symbolizer is not initialized!
    #0 0x5a1aa8 (/usr/bin/ganesha.nfsd+0x5a1aa8)
    #1 0x5a563a (/usr/bin/ganesha.nfsd+0x5a563a)
    #2 0x4a6542 (/usr/bin/ganesha.nfsd+0x4a6542)
    #3 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #4 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #5 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #6 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #7 0x7f9a82836a97 (/lib64/libasan.so.0+0x19a97)
    #8 0x7f9a82608dc4 (/lib64/libpthread.so.0+0x7dc4)
    #9 0x7f9a8043173c (/lib64/libc.so.6+0xf773c)
0x604a00856c20 is located 1184 bytes inside of 1480-byte region [0x604a00856780,0x604a00856d48)
freed by thread T259 here:
    #0 0x7f9a82833009 (/lib64/libasan.so.0+0x16009)
    #1 0x66b2ad (/usr/bin/ganesha.nfsd+0x66b2ad)
    #2 0x66b32c (/usr/bin/ganesha.nfsd+0x66b32c)
    #3 0x675139 (/usr/bin/ganesha.nfsd+0x675139)
    #4 0x67a988 (/usr/bin/ganesha.nfsd+0x67a988)
    #5 0x68543f (/usr/bin/ganesha.nfsd+0x68543f)
    #6 0x44ddb5 (/usr/bin/ganesha.nfsd+0x44ddb5)
    #7 0x4eb2e4 (/usr/bin/ganesha.nfsd+0x4eb2e4)
    #8 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #9 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #10 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #11 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #12 0x7f9a82836a97 (/lib64/libasan.so.0+0x19a97)
previously allocated by thread T255 here:
    #0 0x7f9a82833225 (/lib64/libasan.so.0+0x16225)
    #1 0x66b262 (/usr/bin/ganesha.nfsd+0x66b262)
    #2 0x66b30e (/usr/bin/ganesha.nfsd+0x66b30e)
    #3 0x672c58 (/usr/bin/ganesha.nfsd+0x672c58)
    #4 0x672f06 (/usr/bin/ganesha.nfsd+0x672f06)
    #5 0x68fd52 (/usr/bin/ganesha.nfsd+0x68fd52)
    #6 0x6922cb (/usr/bin/ganesha.nfsd+0x6922cb)
    #7 0x67ad95 (/usr/bin/ganesha.nfsd+0x67ad95)
    #8 0x68a028 (/usr/bin/ganesha.nfsd+0x68a028)
    #9 0x446cd9 (/usr/bin/ganesha.nfsd+0x446cd9)
    #10 0x44f4b3 (/usr/bin/ganesha.nfsd+0x44f4b3)
    #11 0x4d3280 (/usr/bin/ganesha.nfsd+0x4d3280)
    #12 0x4d680f (/usr/bin/ganesha.nfsd+0x4d680f)
    #13 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #14 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #15 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #16 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #17 0x7f9a82836a97 (/lib64/libasan.so.0+0x19a97)
Thread T270 created by T0 here:
    #0 0x7f9a82827c3a (/lib64/libasan.so.0+0xac3a)
    #1 0x62d8d6 (/usr/bin/ganesha.nfsd+0x62d8d6)
    #2 0x47bf5a (/usr/bin/ganesha.nfsd+0x47bf5a)
    #3 0x48d962 (/usr/bin/ganesha.nfsd+0x48d962)
    #4 0x48fbd8 (/usr/bin/ganesha.nfsd+0x48fbd8)
    #5 0x41d82d (/usr/bin/ganesha.nfsd+0x41d82d)
    #6 0x7f9a8035bb34 (/lib64/libc.so.6+0x21b34)
Thread T259 created by T0 here:
    #0 0x7f9a82827c3a (/lib64/libasan.so.0+0xac3a)
    #1 0x62d8d6 (/usr/bin/ganesha.nfsd+0x62d8d6)
    #2 0x47bf5a (/usr/bin/ganesha.nfsd+0x47bf5a)
    #3 0x48d962 (/usr/bin/ganesha.nfsd+0x48d962)
    #4 0x48fbd8 (/usr/bin/ganesha.nfsd+0x48fbd8)
    #5 0x41d82d (/usr/bin/ganesha.nfsd+0x41d82d)
    #6 0x7f9a8035bb34 (/lib64/libc.so.6+0x21b34)
Thread T255 created by T0 here:
    #0 0x7f9a82827c3a (/lib64/libasan.so.0+0xac3a)
    #1 0x62d8d6 (/usr/bin/ganesha.nfsd+0x62d8d6)
    #2 0x47bf5a (/usr/bin/ganesha.nfsd+0x47bf5a)
    #3 0x48d962 (/usr/bin/ganesha.nfsd+0x48d962)
    #4 0x48fbd8 (/usr/bin/ganesha.nfsd+0x48fbd8)
    #5 0x41d82d (/usr/bin/ganesha.nfsd+0x41d82d)
    #6 0x7f9a8035bb34 (/lib64/libc.so.6+0x21b34)
Shadow bytes around the buggy address:
  0x0c09c0102d30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0102d40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0102d50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0102d60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0102d70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c09c0102d80: fa fa fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0102d90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0102da0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0102db0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0102dc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c09c0102dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==16012== ABORTING

***********
On gqas011
***********

[root@gqas011 ~]# /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -F
=================================================================
==6683== ERROR: AddressSanitizer: heap-use-after-free on address 0x604a038fc5a0 at pc 0x5a1aa9 bp 0x7f42c16ed730 sp 0x7f42c16ed720
WRITE of size 8 at 0x604a038fc5a0 thread T296
==6683== WARNING: Trying to symbolize code, but external symbolizer is not initialized!
    #0 0x5a1aa8 (/usr/bin/ganesha.nfsd+0x5a1aa8)
    #1 0x5a563a (/usr/bin/ganesha.nfsd+0x5a563a)
    #2 0x4a6542 (/usr/bin/ganesha.nfsd+0x4a6542)
    #3 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #4 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #5 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #6 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #7 0x7f438eeffa97 (/lib64/libasan.so.0+0x19a97)
    #8 0x7f438ecd1dc4 (/lib64/libpthread.so.0+0x7dc4)
    #9 0x7f438cafa73c (/lib64/libc.so.6+0xf773c)
0x604a038fc5a0 is located 1184 bytes inside of 1480-byte region [0x604a038fc100,0x604a038fc6c8)
freed by thread T92 here:
    #0 0x7f438eefc009 (/lib64/libasan.so.0+0x16009)
    #1 0x66b2ad (/usr/bin/ganesha.nfsd+0x66b2ad)
    #2 0x66b32c (/usr/bin/ganesha.nfsd+0x66b32c)
    #3 0x675139 (/usr/bin/ganesha.nfsd+0x675139)
    #4 0x67a988 (/usr/bin/ganesha.nfsd+0x67a988)
    #5 0x68543f (/usr/bin/ganesha.nfsd+0x68543f)
    #6 0x44ddb5 (/usr/bin/ganesha.nfsd+0x44ddb5)
    #7 0x4eb2e4 (/usr/bin/ganesha.nfsd+0x4eb2e4)
    #8 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #9 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #10 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #11 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #12 0x7f438eeffa97 (/lib64/libasan.so.0+0x19a97)
previously allocated by thread T151 here:
    #0 0x7f438eefc225 (/lib64/libasan.so.0+0x16225)
    #1 0x66b262 (/usr/bin/ganesha.nfsd+0x66b262)
    #2 0x66b30e (/usr/bin/ganesha.nfsd+0x66b30e)
    #3 0x672c58 (/usr/bin/ganesha.nfsd+0x672c58)
    #4 0x672f06 (/usr/bin/ganesha.nfsd+0x672f06)
    #5 0x68fd52 (/usr/bin/ganesha.nfsd+0x68fd52)
    #6 0x6922cb (/usr/bin/ganesha.nfsd+0x6922cb)
    #7 0x67ad95 (/usr/bin/ganesha.nfsd+0x67ad95)
    #8 0x67d3a9 (/usr/bin/ganesha.nfsd+0x67d3a9)
    #9 0x449215 (/usr/bin/ganesha.nfsd+0x449215)
    #10 0x4a8c80 (/usr/bin/ganesha.nfsd+0x4a8c80)
    #11 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #12 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #13 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #14 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #15 0x7f438eeffa97 (/lib64/libasan.so.0+0x19a97)
Thread T296 created by T0 here:
    #0 0x7f438eef0c3a (/lib64/libasan.so.0+0xac3a)
    #1 0x62d8d6 (/usr/bin/ganesha.nfsd+0x62d8d6)
    #2 0x47bf5a (/usr/bin/ganesha.nfsd+0x47bf5a)
    #3 0x48d962 (/usr/bin/ganesha.nfsd+0x48d962)
    #4 0x48fbd8 (/usr/bin/ganesha.nfsd+0x48fbd8)
    #5 0x41d82d (/usr/bin/ganesha.nfsd+0x41d82d)
    #6 0x7f438ca24b34 (/lib64/libc.so.6+0x21b34)
Thread T92 created by T0 here:
    #0 0x7f438eef0c3a (/lib64/libasan.so.0+0xac3a)
    #1 0x62d8d6 (/usr/bin/ganesha.nfsd+0x62d8d6)
    #2 0x47bf5a (/usr/bin/ganesha.nfsd+0x47bf5a)
    #3 0x48d962 (/usr/bin/ganesha.nfsd+0x48d962)
    #4 0x48fbd8 (/usr/bin/ganesha.nfsd+0x48fbd8)
    #5 0x41d82d (/usr/bin/ganesha.nfsd+0x41d82d)
    #6 0x7f438ca24b34 (/lib64/libc.so.6+0x21b34)
Thread T151 created by T0 here:
    #0 0x7f438eef0c3a (/lib64/libasan.so.0+0xac3a)
    #1 0x62d8d6 (/usr/bin/ganesha.nfsd+0x62d8d6)
    #2 0x47bf5a (/usr/bin/ganesha.nfsd+0x47bf5a)
    #3 0x48d962 (/usr/bin/ganesha.nfsd+0x48d962)
    #4 0x48fbd8 (/usr/bin/ganesha.nfsd+0x48fbd8)
    #5 0x41d82d (/usr/bin/ganesha.nfsd+0x41d82d)
    #6 0x7f438ca24b34 (/lib64/libc.so.6+0x21b34)
Shadow bytes around the buggy address:
  0x0c09c0717860: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c09c0717870: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c09c0717880: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c09c0717890: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c09c07178a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c09c07178b0: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd
  0x0c09c07178c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c09c07178d0: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x0c09c07178e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c07178f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0717900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==6683== ABORTING

***********
On gqas006
***********

[root@gqas006 ~]# /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -F
=================================================================
==4450== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604a01a04ca0 at pc 0x5a1aa9 bp 0x7f391e825730 sp 0x7f391e825720
WRITE of size 8 at 0x604a01a04ca0 thread T107
==4450== WARNING: Trying to symbolize code, but external symbolizer is not initialized!
    #0 0x5a1aa8 (/usr/bin/ganesha.nfsd+0x5a1aa8)
    #1 0x5a563a (/usr/bin/ganesha.nfsd+0x5a563a)
    #2 0x4a6542 (/usr/bin/ganesha.nfsd+0x4a6542)
    #3 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #4 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #5 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #6 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #7 0x7f3969b4fa97 (/lib64/libasan.so.0+0x19a97)
    #8 0x7f3969921dc4 (/lib64/libpthread.so.0+0x7dc4)
    #9 0x7f396774a73c (/lib64/libc.so.6+0xf773c)
0x604a01a04ca0 is located 1184 bytes inside of 1480-byte region [0x604a01a04800,0x604a01a04dc8)
freed by thread T296 here:
    #0 0x7f3969b4c009 (/lib64/libasan.so.0+0x16009)
    #1 0x66b2ad (/usr/bin/ganesha.nfsd+0x66b2ad)
    #2 0x66b32c (/usr/bin/ganesha.nfsd+0x66b32c)
    #3 0x675139 (/usr/bin/ganesha.nfsd+0x675139)
    #4 0x67a988 (/usr/bin/ganesha.nfsd+0x67a988)
    #5 0x68543f (/usr/bin/ganesha.nfsd+0x68543f)
    #6 0x44ddb5 (/usr/bin/ganesha.nfsd+0x44ddb5)
    #7 0x4eb2e4 (/usr/bin/ganesha.nfsd+0x4eb2e4)
    #8 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #9 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #10 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #11 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #12 0x7f3969b4fa97 (/lib64/libasan.so.0+0x19a97)
previously allocated by thread T107 here:
    #0 0x7f3969b4c225 (/lib64/libasan.so.0+0x16225)
    #1 0x66b262 (/usr/bin/ganesha.nfsd+0x66b262)
    #2 0x66b30e (/usr/bin/ganesha.nfsd+0x66b30e)
    #3 0x672c58 (/usr/bin/ganesha.nfsd+0x672c58)
    #4 0x672f06 (/usr/bin/ganesha.nfsd+0x672f06)
    #5 0x68fd52 (/usr/bin/ganesha.nfsd+0x68fd52)
    #6 0x6922cb (/usr/bin/ganesha.nfsd+0x6922cb)
    #7 0x67ad95 (/usr/bin/ganesha.nfsd+0x67ad95)
    #8 0x68a028 (/usr/bin/ganesha.nfsd+0x68a028)
    #9 0x446cd9 (/usr/bin/ganesha.nfsd+0x446cd9)
    #10 0x44f4b3 (/usr/bin/ganesha.nfsd+0x44f4b3)
    #11 0x4d3280 (/usr/bin/ganesha.nfsd+0x4d3280)
    #12 0x4d680f (/usr/bin/ganesha.nfsd+0x4d680f)
    #13 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
    #14 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
    #15 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
    #16 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
    #17 0x7f3969b4fa97 (/lib64/libasan.so.0+0x19a97)
Thread T107 created by T0 here:
    #0 0x7f3969b40c3a (/lib64/libasan.so.0+0xac3a)
    #1 0x62d8d6 (/usr/bin/ganesha.nfsd+0x62d8d6)
    #2 0x47bf5a (/usr/bin/ganesha.nfsd+0x47bf5a)
    #3 0x48d962 (/usr/bin/ganesha.nfsd+0x48d962)
    #4 0x48fbd8 (/usr/bin/ganesha.nfsd+0x48fbd8)
    #5 0x41d82d (/usr/bin/ganesha.nfsd+0x41d82d)
    #6 0x7f3967674b34 (/lib64/libc.so.6+0x21b34)
Thread T296 created by T0 here:
    #0 0x7f3969b40c3a (/lib64/libasan.so.0+0xac3a)
    #1 0x62d8d6 (/usr/bin/ganesha.nfsd+0x62d8d6)
    #2 0x47bf5a (/usr/bin/ganesha.nfsd+0x47bf5a)
    #3 0x48d962 (/usr/bin/ganesha.nfsd+0x48d962)
    #4 0x48fbd8 (/usr/bin/ganesha.nfsd+0x48fbd8)
    #5 0x41d82d (/usr/bin/ganesha.nfsd+0x41d82d)
    #6 0x7f3967674b34 (/lib64/libc.so.6+0x21b34)
Shadow bytes around the buggy address:
  0x0c09c0338940: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0338950: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0338960: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0338970: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c0338980: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c09c0338990: fa fa fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c03389a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c03389b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c03389c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c09c03389d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c09c03389e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==4450== ABORTING
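For anyone cross-checking the numbers in these reports, here is a small C sketch of the arithmetic. It assumes the default ASan shadow mapping for x86_64 Linux, shadow = (addr >> 3) + 0x7fff8000; that offset is my assumption about this build and is not printed in the reports. The helper names are made up for illustration. With that mapping, the faulting address on gqas013 lands on the bracketed shadow byte (0x0c09c0102d80 + 4), and the "1184 bytes inside of 1480-byte region" figures fall straight out of the region bounds.

```c
#include <stdint.h>

/* Assumption: default ASan shadow scale/offset on x86_64 Linux. */
#define ASAN_SHADOW_SCALE  3
#define ASAN_SHADOW_OFFSET UINT64_C(0x7fff8000)

/* Hypothetical helper: application address -> shadow byte address. */
static uint64_t asan_shadow_addr(uint64_t addr)
{
	return (addr >> ASAN_SHADOW_SCALE) + ASAN_SHADOW_OFFSET;
}

/* Hypothetical helper: offset of addr inside [start, end). */
static uint64_t offset_in_region(uint64_t addr, uint64_t start)
{
	return addr - start;
}

/* Hypothetical helper: size of the half-open region [start, end). */
static uint64_t region_size(uint64_t start, uint64_t end)
{
	return end - start;
}
```

Since one shadow byte covers 8 application bytes, the 1480-byte entry is described by 185 shadow bytes, which is why the freed (fd) run on gqas011 ends partway through a shadow row.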
(In reply to Ambarish from comment #8)
> I tried this use case with Dan's fix for the crashes.
>
> Ganesha crashed on 3/4 nodes after ~9 hours of pumping IO (single threaded)
> from 6 clients. Since pacemaker quorum wasn't met, IO came to a halt on all
> clients.
>
> It didn't print anything from code this time, not sure how helpful this is:
>
> ***********
> On gqas013
> ***********
>
> [root@gqas013 ~]# /usr/bin/ganesha.nfsd -L
> /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -F
> =================================================================
> ==16012== ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x604a00856c20 at pc 0x5a1aa9 bp 0x7f99c6ef4730 sp 0x7f99c6ef4720
> WRITE of size 8 at 0x604a00856c20 thread T270
> ==16012== WARNING: Trying to symbolize code, but external symbolizer is not
> initialized!
> #0 0x5a1aa8 (/usr/bin/ganesha.nfsd+0x5a1aa8)
> #1 0x5a563a (/usr/bin/ganesha.nfsd+0x5a563a)
> #2 0x4a6542 (/usr/bin/ganesha.nfsd+0x4a6542)
> #3 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
> #4 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
> #5 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
> #6 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
> #7 0x7f9a82836a97 (/lib64/libasan.so.0+0x19a97)
> #8 0x7f9a82608dc4 (/lib64/libpthread.so.0+0x7dc4)
> #9 0x7f9a8043173c (/lib64/libc.so.6+0xf773c)

(gdb) l *0x5a1aa8
0x5a1aa8 is in glist_del (/root/ravi/nfs-ganesha/src/include/gsh_list.h:101).
96      {
97              struct glist_head *left = node->prev;
98              struct glist_head *right = node->next;
99
100             if (left != NULL)
101                     left->next = right;
102             if (right != NULL)
103                     right->prev = left;
104             node->next = NULL;
105             node->prev = NULL;
(gdb) list *0x5a563a
0x5a563a is in state_del_locked (/root/ravi/nfs-ganesha/src/SAL/nfs4_state.c:373).
368              */
369             obj->state_hdl->no_cleanup = true;
370
371             /* Remove from the list of states for a particular file */
372             PTHREAD_MUTEX_lock(&state->state_mutex);
373             glist_del(&state->state_list);
374             memset(&state->state_obj, 0, sizeof(state->state_obj));
375             PTHREAD_MUTEX_unlock(&state->state_mutex);
376
377             if (obj->fsal->m_ops.support_ex(obj)) {
(gdb) l *0x4a6542
0x4a6542 is in nfs4_op_close (/root/ravi/nfs-ganesha/src/Protocols/NFS/nfs4_op_close.c:310).
305
306             /* File is closed, release the corresponding state. If the FSAL
307              * supports extended ops, this will result in closing any open files
308              * the FSAL has for this state.
309              */
310             state_del_locked(state_found);
311
312             /* Poison the current stateid */
313             data->current_stateid_valid = false;
314
(gdb) l *0x4a215b
0x4a215b is in nfs4_Compound (/root/ravi/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:734).
729                                         i + 1;
730                                     break;
731                             }
732                     }
733
734                     status = (optabv4[opcode].funct) (&argarray[i],
735                                                       &data,
736                                                       &resarray[i]);
737
738                     LogCompoundFH(&data);
(gdb) l *0x47a269
0x47a269 is in nfs_rpc_execute (/root/ravi/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1281).
1276                       &reqdata->r_u.req.svc.rq_xprt->blkin.endp,
1277                       "export-id",
1278                       (op_ctx->export != NULL)
1279                       ? op_ctx->export->export_id : -1);
1280    #endif
1281            rc = reqdesc->service_function(arg_nfs, &reqdata->r_u.req.svc,
1282                                           res_nfs);
1283
1284    #ifdef USE_LTTNG
1285            tracepoint(nfs_rpc, op_end, reqdata);
(gdb)

> 0x604a00856c20 is located 1184 bytes inside of 1480-byte region
> [0x604a00856780,0x604a00856d48)
> freed by thread T259 here:
> #0 0x7f9a82833009 (/lib64/libasan.so.0+0x16009)
> #1 0x66b2ad (/usr/bin/ganesha.nfsd+0x66b2ad)
> #2 0x66b32c (/usr/bin/ganesha.nfsd+0x66b32c)
> #3 0x675139 (/usr/bin/ganesha.nfsd+0x675139)
> #4 0x67a988 (/usr/bin/ganesha.nfsd+0x67a988)
> #5 0x68543f (/usr/bin/ganesha.nfsd+0x68543f)
> #6 0x44ddb5 (/usr/bin/ganesha.nfsd+0x44ddb5)
> #7 0x4eb2e4 (/usr/bin/ganesha.nfsd+0x4eb2e4)
> #8 0x4a215b (/usr/bin/ganesha.nfsd+0x4a215b)
> #9 0x47a269 (/usr/bin/ganesha.nfsd+0x47a269)
> #10 0x47b9e7 (/usr/bin/ganesha.nfsd+0x47b9e7)
> #11 0x6257cf (/usr/bin/ganesha.nfsd+0x6257cf)
> #12 0x7f9a82836a97 (/lib64/libasan.so.0+0x19a97)

(gdb) l *0x66b2ad
0x66b2ad is in gsh_free (/root/ravi/nfs-ganesha/src/include/abstract_mem.h:271).
266      * @param[in] p Block of memory to free.
267      */
268     static inline void
269     gsh_free(void *p)
270     {
271             free(p);
272     }
273
274     /**
275      * @brief Free a block of memory with size
(gdb) l *0x66b32c
0x66b32c is in pool_free (/root/ravi/nfs-ganesha/src/include/abstract_mem.h:420).
415      */
416
417     static inline void
418     pool_free(pool_t *pool, void *object)
419     {
420             gsh_free(object);
421     }
422
423     #endif /* ABSTRACT_MEM_H */
(gdb) l *0x675139
0x675139 is in mdcache_lru_unref (/root/ravi/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1456).
1451
1452                    if (!qlocked)
1453                            QUNLOCK(qlane);
1454
1455                    mdcache_lru_clean(entry);
1456                    pool_free(mdcache_entry_pool, entry);
1457                    freed = true;
1458
1459                    (void) atomic_dec_int64_t(&lru_state.entries_used);
1460            }                       /* refcnt == 0 */
(gdb) l *0x67a988
0x67a988 is in mdcache_put (/root/ravi/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:186).
181      *
182      * @param[in] entry  Cache entry being returned
183      */
184     static inline void mdcache_put(mdcache_entry_t *entry)
185     {
186             mdcache_lru_unref(entry, LRU_FLAG_NONE);
187     }
188
189     /**
190      * Return true if there are FDs available to serve open requests,
(gdb) l *0x68543f
0x68543f is in mdcache_put_ref (/root/ravi/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1508).
warning: Source file is more recent than executable.
1503    static void mdcache_put_ref(struct fsal_obj_handle *obj_hdl)
1504    {
1505            mdcache_entry_t *entry =
1506                    container_of(obj_hdl, mdcache_entry_t, obj_handle);
1507
1508            mdcache_put(entry);
1509    }
1510
1511    /**
1512     * @brief Release an object handle
(gdb) l *0x44ddb5
0x44ddb5 is in fsal_remove (/root/ravi/nfs-ganesha/src/FSAL/fsal_helper.c:1599).
1594                    goto out;
1595            }
1596
1597    out:
1598
1599            to_remove_obj->obj_ops.put_ref(to_remove_obj);
1600
1601    out_no_obj:
1602
1603            LogFullDebug(COMPONENT_FSAL, "remove %s: status=%s", name,
(gdb) l *0x4eb2e4
0x4eb2e4 is in nfs4_op_remove (/root/ravi/nfs-ganesha/src/Protocols/NFS/nfs4_op_remove.c:104).
99                             sizeof(changeid4));
100
101             res_REMOVE4->REMOVE4res_u.resok4.cinfo.before =
102                     fsal_get_changeid4(parent_obj);
103
104             fsal_status = fsal_remove(parent_obj, name);
105             if (FSAL_IS_ERROR(fsal_status)) {
106                     res_REMOVE4->status = nfs4_Errno_status(fsal_status);
107                     goto out;
108             }
(gdb) ^CQuit
(gdb)

This crash looks similar to the one reported in https://bugzilla.redhat.com/show_bug.cgi?id=1403666#c12
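To spell out what the resolved frames suggest: the faulting store is the `left->next = right` line in glist_del, so the 8-byte WRITE goes into the neighbouring list node, not into the state being removed. ASan places that neighbour inside the 1480-byte block that mdcache_lru_unref had already handed to pool_free. The sketch below only illustrates that a list unlink writes into whatever structure embeds the neighbouring node. `demo_entry` and `demo_state` are hypothetical stand-ins, not the real Ganesha structures, and which field of the freed entry the neighbour actually was is an assumption on my part.

```c
#include <stddef.h>

/* Same two-pointer node shape as in the gsh_list.h listing. */
struct glist_head {
	struct glist_head *next;
	struct glist_head *prev;
};

/* Mirrors the glist_del body shown by gdb (gsh_list.h:96-105). */
static void glist_del(struct glist_head *node)
{
	struct glist_head *left = node->prev;
	struct glist_head *right = node->next;

	if (left != NULL)
		left->next = right;	/* 8-byte store into the neighbour */
	if (right != NULL)
		right->prev = left;	/* and into the other neighbour */
	node->next = NULL;
	node->prev = NULL;
}

/* Hypothetical cache entry that embeds the head of a state list. */
struct demo_entry {
	char other_fields[64];
	struct glist_head state_list_head;
};

/* Hypothetical state object linked onto that list. */
struct demo_state {
	struct glist_head state_list;
};

/*
 * Link one state onto an entry's list, unlink it again, and report
 * whether the unlink rewrote the pointers stored inside the entry.
 * Returns 1 when the head inside the entry was modified to point
 * back at itself (an empty circular list).
 */
static int unlink_touches_entry(void)
{
	struct demo_entry entry;
	struct demo_state state;

	entry.state_list_head.next = &state.state_list;
	entry.state_list_head.prev = &state.state_list;
	state.state_list.next = &entry.state_list_head;
	state.state_list.prev = &entry.state_list_head;

	glist_del(&state.state_list);

	return entry.state_list_head.next == &entry.state_list_head &&
	       entry.state_list_head.prev == &entry.state_list_head;
}
```

If the entry had been freed between linking and unlinking, those same two stores would land in freed memory, which matches the shape of the ASan report (close on one thread, remove and put_ref on another). The sketch deliberately does not free anything, so it stays well-defined.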
It's the same stack trace as reported on the other nodes as well.
Potential fix for this: https://review.gerrithub.io/308298
The reported issue was not reproducible on Ganesha 2.4.1-6, Gluster 3.8.4-12 in two tries. Will reopen if it is hit again during regressions.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0493.html