Description of problem:
-----------------------
This is different from https://bugzilla.redhat.com/show_bug.cgi?id=1466988. Not just because the backtraces are different (one is EC, one is AFR), but also because I think https://bugzilla.redhat.com/show_bug.cgi?id=1466988 happened in the opendir path. Please feel free to close this as a duplicate if that's not the case.

Use case: 2-node cluster, 4 clients writing in their own subdirectories (using Bonnie, dbench, kernel untar). Ganesha crashed on one of my nodes with the following backtrace:

(gdb) bt
#0  __inode_ctx_free (inode=inode@entry=0x7f2f440025c0) at inode.c:331
#1  0x00007f32108651d2 in __inode_destroy (inode=0x7f2f440025c0) at inode.c:353
#2  inode_table_prune (table=table@entry=0x7f31f806afc0) at inode.c:1543
#3  0x00007f32108654b4 in inode_unref (inode=0x7f2f440025c0) at inode.c:524
#4  0x00007f31ff90d4a5 in afr_local_cleanup (local=0x7f31f81449c0, this=<optimized out>) at afr-common.c:1790
#5  0x00007f31ff8ea0fc in __afr_txn_write_done (frame=<optimized out>, this=<optimized out>) at afr-transaction.c:198
#6  0x00007f31ff8ef0eb in afr_unlock_common_cbk (frame=frame@entry=0x7f31f82704e0, this=this@entry=0x7f31f8011ce0, xdata=0x0, op_errno=<optimized out>, op_ret=<optimized out>, cookie=<optimized out>) at afr-lk-common.c:633
#7  0x00007f31ff8ef9e4 in afr_unlock_entrylk_cbk (frame=0x7f31f82704e0, cookie=<optimized out>, this=0x7f31f8011ce0, op_ret=<optimized out>, op_errno=<optimized out>, xdata=<optimized out>) at afr-lk-common.c:829
#8  0x00007f31ffb526eb in client3_3_entrylk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f31ec0f0760) at client-rpc-fops.c:1657
#9  0x00007f321061f840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f31f80570e0, pollin=pollin@entry=0x7f31e8002440) at rpc-clnt.c:794
#10 0x00007f321061fb27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f31f8057110, event=<optimized out>, data=0x7f31e8002440) at rpc-clnt.c:987
#11 0x00007f321061b9e3 in rpc_transport_notify (this=this@entry=0x7f31f8057280, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f31e8002440) at rpc-transport.c:538
#12 0x00007f32040953d6 in socket_event_poll_in (this=this@entry=0x7f31f8057280, notify_handled=<optimized out>) at socket.c:2306
#13 0x00007f320409797c in socket_event_handler (fd=10, idx=5, gen=1, data=0x7f31f8057280, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#14 0x00007f32108b0776 in event_dispatch_epoll_handler (event=0x7f31fdaa7540, event_pool=0x5601b972c010) at event-epoll.c:572
#15 event_dispatch_epoll_worker (data=0x7f31f80524c0) at event-epoll.c:648
#16 0x00007f3213ebde25 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f321358b34d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-32.el7rhgs.x86_64

How reproducible:
-----------------
Reporting the first occurrence.
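To make the crash path easier to follow, here is a minimal toy model of what frames #4 through #1 describe: AFR's transaction cleanup drops the last reference to an inode, the inode lands on the table's purge list, and pruning the table destroys it. All names, types, and fields below are illustrative stand-ins, not Gluster's actual API.

/*
 * Toy model of frames #4-#1: the last unref queues the inode for
 * destruction and the prune pass frees it. Illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_inode {
    int               ref;    /* reference count */
    struct toy_inode *next;   /* link on the purge list */
};

struct toy_table {
    struct toy_inode *purge;  /* inodes waiting to be destroyed */
};

/* Stand-in for __inode_destroy(): in the real code this also frees the
 * per-translator context array via __inode_ctx_free(), frame #0. */
static void toy_inode_destroy(struct toy_inode *inode)
{
    free(inode);
}

/* Stand-in for inode_table_prune(): destroy everything on the purge list. */
static void toy_table_prune(struct toy_table *table)
{
    while (table->purge) {
        struct toy_inode *inode = table->purge;
        table->purge = inode->next;
        toy_inode_destroy(inode);
    }
}

/* Stand-in for inode_unref(): dropping the last reference moves the
 * inode to the purge list and triggers a prune, the path seen above. */
static void toy_inode_unref(struct toy_table *table, struct toy_inode *inode)
{
    if (--inode->ref == 0) {
        inode->next  = table->purge;
        table->purge = inode;
        toy_table_prune(table);
    }
}

int main(void)
{
    struct toy_table  table = { .purge = NULL };
    struct toy_inode *inode = calloc(1, sizeof(*inode));
    inode->ref = 1;

    /* afr_local_cleanup() dropping its reference corresponds to this call. */
    toy_inode_unref(&table, inode);
    puts("inode destroyed via the purge path");
    return 0;
}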
Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 6ade5657-45e2-43c7-8098-774417789a5e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas005 tmp]#
So, this backtrace does not include any Ganesha code at all; it's entirely in Gluster code. That said, if it's memory corruption, it's likely the same issue as the rest.
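To illustrate why the memory-corruption theory is plausible even though the crash surfaces in frame #0, here is a second toy model, again illustrative only and not the real implementation: __inode_ctx_free() walks the per-translator context array of the inode being destroyed, so if some other component has already freed or overwritten one of those slots, the process dies in this loop even though the corruption happened elsewhere.

/*
 * Toy model of frame #0: each translator gets a slot in a per-inode
 * context array, and destroying the inode hands each slot back to its
 * owner. A bogus callback pointer or dangling value in any slot makes
 * this loop the crash site. Names and layout are illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_ctx_slot {
    const char *owner;                /* which translator owns this slot */
    void       *value;                /* that translator's per-inode data */
    void      (*forget)(void *value); /* owner's release callback */
};

/* Walk the context array and let each owner release its data. */
static void toy_inode_ctx_free(struct toy_ctx_slot *slots, int nslots)
{
    for (int i = 0; i < nslots; i++)
        if (slots[i].forget)
            slots[i].forget(slots[i].value);
}

static void afr_forget(void *value) { free(value); }

int main(void)
{
    struct toy_ctx_slot slots[2] = {
        { .owner = "afr",  .value = malloc(16), .forget = afr_forget },
        { .owner = "none", .value = NULL,       .forget = NULL       },
    };

    toy_inode_ctx_free(slots, 2);
    puts("contexts released; a corrupted slot would crash inside the loop");
    return 0;
}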