Bug 1467016 - [Ganesha] : Ganesha crashes on writes (AFR),possible memory corruption
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: 3.3
Hardware: x86_64    OS: Linux
Priority: unspecified    Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Kaleb KEITHLEY
QA Contact: Ambarish
Depends On:
Blocks:
 
Reported: 2017-07-01 13:32 EDT by Ambarish
Modified: 2017-08-10 03:09 EDT (History)
CC List: 11 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-10 03:09:43 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ambarish 2017-07-01 13:32:12 EDT
Description of problem:
-----------------------

This is different from https://bugzilla.redhat.com/show_bug.cgi?id=1466988.

Not just because the BTs are different (one is EC, one is AFR), but also because I think https://bugzilla.redhat.com/show_bug.cgi?id=1466988 happened in the opendir path.

Please feel free to close this as a DUP if that's not the case.

Use case: 2-node cluster, 4 clients writing into their own subdirectories (using Bonnie, dbench, kernel untar).

Ganesha crashed on one of my nodes with the following BT:

(gdb) bt
#0  __inode_ctx_free (inode=inode@entry=0x7f2f440025c0) at inode.c:331
#1  0x00007f32108651d2 in __inode_destroy (inode=0x7f2f440025c0) at inode.c:353
#2  inode_table_prune (table=table@entry=0x7f31f806afc0) at inode.c:1543
#3  0x00007f32108654b4 in inode_unref (inode=0x7f2f440025c0) at inode.c:524
#4  0x00007f31ff90d4a5 in afr_local_cleanup (local=0x7f31f81449c0, this=<optimized out>) at afr-common.c:1790
#5  0x00007f31ff8ea0fc in __afr_txn_write_done (frame=<optimized out>, this=<optimized out>) at afr-transaction.c:198
#6  0x00007f31ff8ef0eb in afr_unlock_common_cbk (frame=frame@entry=0x7f31f82704e0, this=this@entry=0x7f31f8011ce0, xdata=0x0, op_errno=<optimized out>, op_ret=<optimized out>, cookie=<optimized out>)
    at afr-lk-common.c:633
#7  0x00007f31ff8ef9e4 in afr_unlock_entrylk_cbk (frame=0x7f31f82704e0, cookie=<optimized out>, this=0x7f31f8011ce0, op_ret=<optimized out>, op_errno=<optimized out>, xdata=<optimized out>)
    at afr-lk-common.c:829
#8  0x00007f31ffb526eb in client3_3_entrylk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f31ec0f0760) at client-rpc-fops.c:1657
#9  0x00007f321061f840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f31f80570e0, pollin=pollin@entry=0x7f31e8002440) at rpc-clnt.c:794
#10 0x00007f321061fb27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f31f8057110, event=<optimized out>, data=0x7f31e8002440) at rpc-clnt.c:987
#11 0x00007f321061b9e3 in rpc_transport_notify (this=this@entry=0x7f31f8057280, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f31e8002440) at rpc-transport.c:538
#12 0x00007f32040953d6 in socket_event_poll_in (this=this@entry=0x7f31f8057280, notify_handled=<optimized out>) at socket.c:2306
#13 0x00007f320409797c in socket_event_handler (fd=10, idx=5, gen=1, data=0x7f31f8057280, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#14 0x00007f32108b0776 in event_dispatch_epoll_handler (event=0x7f31fdaa7540, event_pool=0x5601b972c010) at event-epoll.c:572
#15 event_dispatch_epoll_worker (data=0x7f31f80524c0) at event-epoll.c:648
#16 0x00007f3213ebde25 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f321358b34d in clone () from /lib64/libc.so.6
(gdb) 
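
For context on where this is dying: frames #2-#4 show AFR dropping its references in afr_local_cleanup after the write transaction's entrylk unlock, which lets inode_table_prune destroy the inode, and __inode_ctx_free (frame #0) then walks the per-xlator inode-context slots so each translator can release whatever it stashed on the inode. The snippet below is a simplified C sketch of that teardown pattern for illustration only; the type and field names (inode_ctx_slot, owner, forget, etc.) are approximations, not the actual GlusterFS definitions in inode.c.

/*
 * Simplified sketch of the per-xlator inode-context teardown behind frame #0
 * (__inode_ctx_free at inode.c:331).  Type and field names are illustrative
 * approximations only, NOT the actual GlusterFS definitions.
 */
#include <stdlib.h>

typedef struct xlator xlator_t;
typedef struct inode  inode_t;

struct xlator {
    /* lets each translator drop whatever state it stashed on the inode */
    int (*forget)(xlator_t *xl, inode_t *inode);
};

struct inode_ctx_slot {
    xlator_t *owner;   /* translator that owns this slot */
    void     *value;   /* opaque per-translator state */
};

struct inode {
    struct inode_ctx_slot *ctx;   /* one slot per translator in the graph */
    int                    ctxcount;
};

/* If a translator scribbled past its slot, or 'owner' points at freed memory,
 * the dereferences below are where the process dies: the crash shows up in
 * cleanup even though the corruption happened earlier, somewhere else. */
static void inode_ctx_free_sketch(inode_t *inode)
{
    for (int i = 0; i < inode->ctxcount; i++) {
        struct inode_ctx_slot *slot = &inode->ctx[i];
        if (slot->owner && slot->owner->forget)
            slot->owner->forget(slot->owner, inode);
    }
    free(inode->ctx);
    inode->ctx = NULL;
}

static int noop_forget(xlator_t *xl, inode_t *inode)
{
    (void)xl;
    (void)inode;
    return 0;
}

int main(void)
{
    xlator_t afr = { .forget = noop_forget };
    inode_t  ino = { .ctx = calloc(2, sizeof(struct inode_ctx_slot)),
                     .ctxcount = 2 };

    if (!ino.ctx)
        return 1;

    ino.ctx[0].owner = &afr;     /* healthy slot: forget() gets invoked */
    /* ino.ctx[1] left zeroed: skipped, like an unused slot */

    inode_ctx_free_sketch(&ino); /* a corrupted 'owner' would crash here */
    return 0;
}

If that slot array (or a translator's stored context) was corrupted earlier, the crash surfaces in this cleanup loop even though the real damage happened elsewhere, which is consistent with this being memory corruption rather than a bug in the unwinding code itself.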




Version-Release number of selected component (if applicable):
--------------------------------------------------------------

nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-32.el7rhgs.x86_64


How reproducible:
-----------------

Reporting the first occurrence.


Additional info:
----------------

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 6ade5657-45e2-43c7-8098-774417789a5e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas005 tmp]#
Comment 4 Daniel Gryniewicz 2017-07-05 09:03:06 EDT
So, this backtrace does not include any Ganesha code at all; it's entirely in Gluster code. That said, if it's memory corruption, it's likely the same issue as the rest.
