Bug 1466988 - [Ganesha] : Ganesha crashed on writes during __inode_destroy
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: 3.3
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Kaleb KEITHLEY
QA Contact: Ambarish
Depends On:
Blocks:
 
Reported: 2017-07-01 02:13 EDT by Ambarish
Modified: 2017-08-10 03:09 EDT
CC List: 11 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-10 03:09:10 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ambarish 2017-07-01 02:13:51 EDT
Description of problem:
-----------------------

A 2-node cluster, with 4 clients writing into their own subdirectories (using Bonnie, dbench, and a kernel untar).

Ganesha crashed multiple times, with the same backtrace, on one of my nodes:

(gdb) bt
#0  __inode_ctx_free (inode=inode@entry=0x7f761000d6b0) at inode.c:331
#1  0x00007f78e3e69212 in __inode_destroy (inode=0x7f761000d6b0) at inode.c:353
#2  inode_table_prune (table=table@entry=0x7f78cc0f3410) at inode.c:1543
#3  0x00007f78e3e694f4 in inode_unref (inode=0x7f761000d6b0) at inode.c:524
#4  0x00007f78e3e58092 in loc_wipe (loc=loc@entry=0x7f77281212a0) at xlator.c:748
#5  0x00007f78d2ebc2bd in ec_fop_data_release (fop=0x7f7728120fe0) at ec-data.c:302
#6  0x00007f78d2ebec18 in ec_resume (fop=<optimized out>, error=<optimized out>) at ec-common.c:337
#7  0x00007f78d2ec0acb in ec_lock_assign_owner (link=0x7f7634175940) at ec-common.c:1710
#8  ec_lock (fop=0x7f76341758c0) at ec-common.c:1788
#9  0x00007f78d2ecc84f in ec_manager_opendir (fop=0x7f76341758c0, state=<optimized out>) at ec-dir-read.c:144
#10 0x00007f78d2ebea2b in __ec_manager (fop=0x7f76341758c0, error=0) at ec-common.c:2381
#11 0x00007f78d2eb867d in ec_gf_opendir (frame=<optimized out>, this=<optimized out>, loc=<optimized out>, 
    fd=<optimized out>, xdata=<optimized out>) at ec.c:952
#12 0x00007f78d2c693d7 in dht_opendir (frame=frame@entry=0x7f763401b070, this=this@entry=0x7f78cc03bd90, 
    loc=loc@entry=0x7f774c002080, fd=fd@entry=0x7f774c000f90, xdata=xdata@entry=0x7f763412f5d0) at dht-common.c:4960
#13 0x00007f78e3ed564b in default_opendir (frame=frame@entry=0x7f763401b070, this=this@entry=0x7f78cc03d6c0, 
    loc=loc@entry=0x7f774c002080, fd=fd@entry=0x7f774c000f90, xdata=xdata@entry=0x7f763412f5d0) at defaults.c:2956
#14 0x00007f78e3ed564b in default_opendir (frame=0x7f763401b070, this=<optimized out>, loc=0x7f774c002080, 
    fd=0x7f774c000f90, xdata=0x7f763412f5d0) at defaults.c:2956
#15 0x00007f78d2603453 in rda_opendir (frame=frame@entry=0x7f763402c850, this=this@entry=0x7f78cc040670, 
    loc=loc@entry=0x7f774c002080, fd=fd@entry=0x7f774c000f90, xdata=xdata@entry=0x7f763412f5d0)
    at readdir-ahead.c:570
#16 0x00007f78e3ed564b in default_opendir (frame=frame@entry=0x7f763402c850, this=this@entry=0x7f78cc0421a0, 
    loc=loc@entry=0x7f774c002080, fd=fd@entry=0x7f774c000f90, xdata=xdata@entry=0x7f763412f5d0) at defaults.c:2956
#17 0x00007f78e3ed564b in default_opendir (frame=frame@entry=0x7f763402c850, this=this@entry=0x7f78cc043a40, 
    loc=loc@entry=0x7f774c002080, fd=fd@entry=0x7f774c000f90, xdata=xdata@entry=0x7f763412f5d0) at defaults.c:2956
#18 0x00007f78e3ed564b in default_opendir (frame=0x7f763402c850, this=<optimized out>, loc=0x7f774c002080, 
    fd=0x7f774c000f90, xdata=0x7f763412f5d0) at defaults.c:2956
#19 0x00007f78d1dc6bb7 in mdc_opendir (frame=0x7f763419a630, this=0x7f78cc0465e0, loc=0x7f774c002080, 
    fd=0x7f774c000f90, xdata=0x7f763412f5d0) at md-cache.c:2322
#20 0x00007f78e3ef0b8a in default_opendir_resume (frame=0x7f774c010c00, this=0x7f78cc047f20, loc=0x7f774c002080, 
    fd=0x7f774c000f90, xdata=0x0) at defaults.c:2181
#21 0x00007f78e3e7d125 in call_resume (stub=0x7f774c002030) at call-stub.c:2508
#22 0x00007f78d1bbd957 in iot_worker (data=0x7f78cc057ad0) at io-threads.c:220
#23 0x00007f78e74c1e25 in start_thread (arg=0x7f78c01b7700) at pthread_create.c:308
#24 0x00007f78e6b8f34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) 
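
All of the frames above are in the Gluster client stack. Reading bottom-up: an opendir winds through the translator stack, EC releases its fop data, loc_wipe() drops the last inode reference, and inode_unref() hands the now-unused inode to inode_table_prune(), which calls __inode_destroy() and finally __inode_ctx_free(), where the segfault happens. A minimal sketch of that teardown, heavily simplified from libglusterfs (function names follow the backtrace; the struct layout and the inlined destroy are illustrative only, since the real code defers destruction to the prune pass and notifies each translator's forget callback):

/* Simplified model of the inode teardown seen in frames #0-#3.
 * Illustrative only: the real structures live in libglusterfs/src/inode.c
 * and carry locks, list heads, and per-translator forget callbacks. */
#include <stdint.h>
#include <stdlib.h>

struct xlator;                          /* opaque translator handle */

struct inode_ctx {
    struct xlator *xl;                  /* translator owning this slot */
    uint64_t       value;               /* translator-private state */
};

struct inode {
    int               ref;              /* active references */
    struct inode_ctx *ctx;              /* one slot per translator */
};

/* Frame #0: if ctx was already freed or overwritten (memory corruption),
 * or if this runs while another thread still uses the inode (refcount
 * bug), this is where the crash surfaces. */
static void __inode_ctx_free(struct inode *inode)
{
    free(inode->ctx);                   /* real code notifies each xl first */
    inode->ctx = NULL;
}

/* Frame #1. */
static void __inode_destroy(struct inode *inode)
{
    __inode_ctx_free(inode);
    free(inode);
}

/* Frame #3: dropping the last reference makes the inode prunable; the
 * prune pass (frame #2) is what actually destroys it. */
static void inode_unref(struct inode *inode)
{
    if (--inode->ref == 0)
        __inode_destroy(inode);         /* inlined here for brevity */
}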


Version-Release number of selected component (if applicable):
--------------------------------------------------------------

nfs-ganesha-2.4.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64


How reproducible:
-----------------

Fairly often.


Additional info:
----------------

[root@gqas007 tmp]# gluster v info
 
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 22c652d8-0754-438a-8131-373bad7c12ab
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (4 + 2) = 24
Transport-type: tcp
Bricks:
Brick1: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick4: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick5: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick6: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick7: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick8: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick9: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick10: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick11: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick12: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick13: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick14: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick15: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick16: gqas007.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick17: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick18: gqas007.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick19: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick20: gqas007.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick21: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick22: gqas007.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick23: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick24: gqas007.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Options Reconfigured:
ganesha.enable: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
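
One option above is directly relevant to the crash path: network.inode-lru-limit: 50000 caps the client-side inode cache, and inode_table_prune() (frame #2) is the pass that evicts and destroys inodes once that cap is exceeded. A rough sketch of what that pruning does, with locking and the purge list omitted (the field and helper names here are illustrative, not the actual libglusterfs API):

/* Rough sketch of LRU pruning (frame #2): once the table caches more
 * unreferenced inodes than lru_limit allows, the oldest are destroyed.
 * Illustrative only; the real inode_table_prune() holds the table lock
 * and moves victims to a purge list before destroying them. */
#include <stddef.h>

struct inode;                           /* see the earlier sketch */

struct inode_table {
    size_t lru_size;    /* unreferenced inodes currently cached */
    size_t lru_limit;   /* network.inode-lru-limit (50000 on this volume) */
};

/* hypothetical helpers, declared only for the sketch */
extern struct inode *lru_pop_oldest(struct inode_table *table);
extern void          __inode_destroy(struct inode *inode);   /* frame #1 */

static void inode_table_prune(struct inode_table *table)
{
    while (table->lru_size > table->lru_limit) {
        struct inode *victim = lru_pop_oldest(table);
        if (!victim)
            break;
        table->lru_size--;
        __inode_destroy(victim);        /* -> __inode_ctx_free(), frame #0 */
    }
}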
Comment 6 Daniel Gryniewicz 2017-07-05 09:17:31 EDT
Could be memory corruption, could be a refcount error.  I'm not familiar enough with the Gluster codebase to know.

(Note: this entire backtrace is in Gluster code; no Ganesha code appears in it.)
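
For illustration, the "refcount error" theory would look roughly like the following: some path releases a reference it never took (or releases the same one twice), the count hits zero early, and the inode is destroyed while another thread still holds a pointer into its ctx table. This is an invented example, not code from the Gluster sources:

/* Hypothetical double-unref: shows how a refcount bug could make
 * __inode_ctx_free() run on memory another thread still uses.
 * All names and layout here are invented for the example. */
#include <assert.h>
#include <stdlib.h>

struct inode {
    int   ref;
    void *ctx;          /* stands in for the per-xlator ctx table */
};

static void inode_destroy(struct inode *inode)
{
    free(inode->ctx);   /* corresponds to __inode_ctx_free(), frame #0 */
    free(inode);
}

static void inode_unref(struct inode *inode)
{
    assert(inode->ref > 0);      /* a debug build would trip here */
    if (--inode->ref == 0)
        inode_destroy(inode);
}

static void buggy_error_path(struct inode *inode)
{
    inode_unref(inode);          /* legitimate release */
    /* ...error handling rolls back and releases again... */
    inode_unref(inode);          /* BUG: ref hits 0 one release early;
                                  * any other holder is left with a
                                  * dangling pointer into freed ctx */
}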
