Bug 1685078 - systemic: gluster fuse crashed with segmentation fault possibly due to dentry not found
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks: 1686009
Reported: 2019-03-04 10:23 UTC by Nag Pavan Chilakam
Modified: 2020-01-03 07:52 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1686009 (view as bug list)
Environment:
Last Closed: 2020-01-03 07:52:38 UTC
Embargoed:



Description Nag Pavan Chilakam 2019-03-04 10:23:54 UTC
Description of problem:
=====================
While performing non-functional tests to validate RPC fixes, one of the FUSE clients crashed as shown below:
[2019-03-01 13:37:02.398653] I [dict.c:471:dict_get] (-->/usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x6228d) [0x7f5147de128d] -->/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so(+0x202f7) [0x7f5147afc2f7] -->/lib64/libglusterfs.so.0(dict_get+0x10c) [0x7f515a5b2d3c] ) 13-dict: !this || key=trusted.glusterfs.dht.mds [Invalid argument]
[2019-03-01 13:37:02.711187] W [inode.c:197:__is_dentry_hashed] (-->/lib64/libglusterfs.so.0(__inode_path+0x68) [0x7f515a5cd1c8] -->/lib64/libglusterfs.so.0(+0x3add4) [0x7f515a5cadd4] -->/lib64/libglusterfs.so.0(+0x3ad7e) [0x7f515a5cad7e] ) 0-fuse: dentry not found
pending frames:
frame : type(1) op(UNLINK)
frame : type(1) op(UNLINK)
frame : type(1) op(READDIRP)
frame : type(1) op(OPEN)
frame : type(1) op(STAT)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-03-01 13:37:02
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7f515a5bbb9d]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f515a5c6114]
/lib64/libc.so.6(+0x36280)[0x7f5158bf8280]
/lib64/libglusterfs.so.0(+0x3adc2)[0x7f515a5cadc2]
/lib64/libglusterfs.so.0(__inode_path+0x68)[0x7f515a5cd1c8]
/lib64/libglusterfs.so.0(inode_path+0x31)[0x7f515a5cd551]
/lib64/libglusterfs.so.0(loc_touchup+0x7a)[0x7f515a5b9dba]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x6f8b)[0x7f515196df8b]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7665)[0x7f515196e665]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7bbd)[0x7f515196ebbd]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7c8e)[0x7f515196ec8e]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7cd0)[0x7f515196ecd0]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x1f702)[0x7f5151986702]
/lib64/libpthread.so.0(+0x7dd5)[0x7f51593f7dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f5158cbfead]
---------



warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/73/3d1c681cfbd8bbeb11e8b7f80876a9aed6bb74
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=rhs-client19.lab.eng.blr.redhat.com --volf'.
Program terminated with signal 11, Segmentation fault.
#0  __dentry_search_arbit (inode=inode@entry=0x7f50ec000e98) at inode.c:1450
1450	        list_for_each_entry (trav, &inode->dentry_list, inode_list) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.3.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  __dentry_search_arbit (inode=inode@entry=0x7f50ec000e98) at inode.c:1450
#1  0x00007f515a5cd1c8 in __inode_path (inode=inode@entry=0x7f50dc01dfc8, name=name@entry=0x0, bufp=bufp@entry=0x7f5145ed6d30) at inode.c:1551
#2  0x00007f515a5cd551 in inode_path (inode=0x7f50dc01dfc8, name=name@entry=0x0, bufp=bufp@entry=0x7f5145ed6d30) at inode.c:1642
#3  0x00007f515a5b9dba in loc_touchup (loc=0x7f5069ee43c0, name=<optimized out>) at xlator.c:880
#4  0x00007f515196df8b in fuse_resolve_loc_touchup (state=0x7f5069ee43a0) at fuse-resolve.c:33
#5  fuse_resolve_continue (state=0x7f5069ee43a0) at fuse-resolve.c:704
#6  0x00007f515196e665 in fuse_resolve_inode (state=0x7f5069ee43a0) at fuse-resolve.c:364
#7  0x00007f515196ebbd in fuse_resolve (state=0x7f5069ee43a0) at fuse-resolve.c:651
#8  0x00007f515196ec8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#9  0x00007f515196ecd0 in fuse_resolve_and_resume (state=0x7f5069ee43a0, fn=0x7f5151972c10 <fuse_getattr_resume>) at fuse-resolve.c:718
#10 0x00007f5151986702 in fuse_thread_proc (data=0x563689a35f10) at fuse-bridge.c:5781
#11 0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f5158cbfead in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 11 (Thread 0x7f515aaa6780 (LWP 4229)):
#0  0x00007f51593f8f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f515a61af78 in event_dispatch_epoll (event_pool=0x563689a2e250) at event-epoll.c:846
#2  0x0000563687bed538 in main (argc=4, argv=<optimized out>) at glusterfsd.c:2692

Thread 10 (Thread 0x7f51456d6700 (LWP 4257)):
#0  0x00007f51593fb965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f515197039b in timed_response_loop (data=0x563689a35f10) at fuse-bridge.c:4660
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f514cc97700 (LWP 4251)):
#0  0x00007f5158cc0483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f515a61a6e8 in event_dispatch_epoll_worker (data=0x563689a8bd60) at event-epoll.c:749
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f514d498700 (LWP 4250)):
#0  0x00007f5158cc0483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f515a61a6e8 in event_dispatch_epoll_worker (data=0x563689a8ba90) at event-epoll.c:749
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f5150163700 (LWP 4234)):
#0  0x00007f51593fbd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f515a5f7dd8 in syncenv_task (proc=proc@entry=0x563689a49b20) at syncop.c:603
#2  0x00007f515a5f8ca0 in syncenv_processor (thdata=0x563689a49b20) at syncop.c:695
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5151165700 (LWP 4231)):
#0  0x00007f51593ff361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000563687bf0c7b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2242
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f51322ff700 (LWP 22247)):
#0  0x00007f51593fbd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f515a5f7dd8 in syncenv_task (proc=proc@entry=0x563689a4a2a0) at syncop.c:603
#2  0x00007f515a5f8ca0 in syncenv_processor (thdata=0x563689a4a2a0) at syncop.c:695
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5150964700 (LWP 4232)):
#0  0x00007f5158c86e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f5158c86cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f515a5e4b9d in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5144ed5700 (LWP 4258)):
#0  0x00007f51593fb965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f51519708d3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4584
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5151966700 (LWP 4230)):
#0  0x00007f51593fee3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f515a5c9f86 in gf_timer_proc (data=0x563689a49300) at timer.c:174
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f5145ed7700 (LWP 4256)):
#0  __dentry_search_arbit (inode=inode@entry=0x7f50ec000e98) at inode.c:1450
#1  0x00007f515a5cd1c8 in __inode_path (inode=inode@entry=0x7f50dc01dfc8, name=name@entry=0x0, bufp=bufp@entry=0x7f5145ed6d30) at inode.c:1551
#2  0x00007f515a5cd551 in inode_path (inode=0x7f50dc01dfc8, name=name@entry=0x0, bufp=bufp@entry=0x7f5145ed6d30) at inode.c:1642
#3  0x00007f515a5b9dba in loc_touchup (loc=0x7f5069ee43c0, name=<optimized out>) at xlator.c:880
#4  0x00007f515196df8b in fuse_resolve_loc_touchup (state=0x7f5069ee43a0) at fuse-resolve.c:33
#5  fuse_resolve_continue (state=0x7f5069ee43a0) at fuse-resolve.c:704
#6  0x00007f515196e665 in fuse_resolve_inode (state=0x7f5069ee43a0) at fuse-resolve.c:364
#7  0x00007f515196ebbd in fuse_resolve (state=0x7f5069ee43a0) at fuse-resolve.c:651
#8  0x00007f515196ec8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#9  0x00007f515196ecd0 in fuse_resolve_and_resume (state=0x7f5069ee43a0, fn=0x7f5151972c10 <fuse_getattr_resume>) at fuse-resolve.c:718
#10 0x00007f5151986702 in fuse_thread_proc (data=0x563689a35f10) at fuse-bridge.c:5781
#11 0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f5158cbfead in clone () from /lib64/libc.so.6
(gdb) q
[root@dhcp35-64 glusterfs]# rpm -qa|grep gluster
glusterfs-3.12.2-43.el7.x86_64
glusterfs-libs-3.12.2-43.el7.x86_64
glusterfs-client-xlators-3.12.2-43.el7.x86_64
glusterfs-fuse-3.12.2-43.el7.x86_64
glusterfs-debuginfo-3.12.2-43.el7.x86_64



################# the same crash was hit on another client ##########
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --acl --volfile-server=rhs-client38.lab.eng.blr.redhat.com'.
Program terminated with signal 11, Segmentation fault.
#0  __dentry_search_arbit (inode=inode@entry=0x7f0a5002eab8) at inode.c:1450
1450	        list_for_each_entry (trav, &inode->dentry_list, inode_list) {
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.12.2-43.el7.x86_64
(gdb) bt
#0  __dentry_search_arbit (inode=inode@entry=0x7f0a5002eab8) at inode.c:1450
#1  0x00007f0ac23a01c8 in __inode_path (inode=inode@entry=0x7f0a50009e68, 
    name=name@entry=0x7f0a028f1b50 "dhcp35-37.lab.eng.blr.redhat.com-fresh_top.log", bufp=bufp@entry=0x7f0985ba36e8)
    at inode.c:1551
#2  0x00007f0ac23a0551 in inode_path (inode=0x7f0a50009e68, 
    name=0x7f0a028f1b50 "dhcp35-37.lab.eng.blr.redhat.com-fresh_top.log", bufp=bufp@entry=0x7f0985ba36e8)
    at inode.c:1642
#3  0x00007f0ab9740489 in fuse_resolve_entry (state=0x7f0985ba3570) at fuse-resolve.c:99
#4  0x00007f0ab974162d in fuse_resolve_parent (state=state@entry=0x7f0985ba3570) at fuse-resolve.c:312
#5  0x00007f0ab9741998 in fuse_resolve (state=0x7f0985ba3570) at fuse-resolve.c:647
#6  0x00007f0ab9741c8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#7  0x00007f0ab9741cd0 in fuse_resolve_and_resume (state=0x7f0985ba3570, fn=0x7f0ab9744e40 <fuse_lookup_resume>)
    at fuse-resolve.c:718
#8  0x00007f0ab9759702 in fuse_thread_proc (data=0x559b1ebcf0c0) at fuse-bridge.c:5781
#9  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 11 (Thread 0x7f0ac2879780 (LWP 4246)):
#0  0x00007f0ac11cbf47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f0ac23edf78 in event_dispatch_epoll (event_pool=0x559b1ebc7250) at event-epoll.c:846
#2  0x0000559b1de33538 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2692

Thread 10 (Thread 0x7f0aad2ae700 (LWP 4260)):
#0  0x00007f0ac11ce965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f0ab974339b in timed_response_loop (data=0x559b1ebcf0c0) at fuse-bridge.c:4660
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f0ab7f36700 (LWP 4251)):
#0  0x00007f0ac11ced12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f0ac23cadd8 in syncenv_task (proc=proc@entry=0x559b1ebe2e40) at syncop.c:603
#2  0x00007f0ac23cbca0 in syncenv_processor (thdata=0x559b1ebe2e40) at syncop.c:695
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f0ab8737700 (LWP 4250)):
#0  0x00007f0ac0a59e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f0ac0a59cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f0ac23b7b9d in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f0ab9739700 (LWP 4248)):
#0  0x00007f0ac11d1e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f0ac239cf86 in gf_timer_proc (data=0x559b1ebe2620) at timer.c:174
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f0ab4a6a700 (LWP 4256)):
#0  0x00007f0ac11d14ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f0ac11ccdcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f0ac11ccc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f0ac239eee9 in inode_unref (inode=0x7f09850f0d98) at inode.c:668
#4  0x00007f0ac238ca02 in loc_wipe (loc=loc@entry=0x7f09865af178) at xlator.c:768
#5  0x00007f0ab40231ee in client_local_wipe (local=local@entry=0x7f09865af178) at client-helpers.c:127
#6  0x00007f0ab4032f0d in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, 
    myframe=0x7f098676f668) at client-rpc-fops.c:2872
#7  0x00007f0ac2134a00 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f097ca19cb0, pollin=pollin@entry=0x7f097ccffd50)
    at rpc-clnt.c:778
#8  0x00007f0ac2134d6b in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f097ca19ce0, event=<optimized out>, 
    data=0x7f097ccffd50) at rpc-clnt.c:971
#9  0x00007f0ac2130ae3 in rpc_transport_notify (this=this@entry=0x7f098e451610, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f097ccffd50) at rpc-transport.c:557
#10 0x00007f0ab6d22586 in socket_event_poll_in (this=this@entry=0x7f098e451610, notify_handled=<optimized out>)
    at socket.c:2322
#11 0x00007f0ab6d24bca in socket_event_handler (fd=33, idx=26, gen=4, data=0x7f098e451610, poll_in=<optimized out>, 
    poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2482
#12 0x00007f0ac23ed870 in event_dispatch_epoll_handler (event=0x7f0ab4a69e70, event_pool=0x559b1ebc7250)
    at event-epoll.c:643
#13 event_dispatch_epoll_worker (data=0x559b1ec250a0) at event-epoll.c:759
#14 0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f0ab526b700 (LWP 4255)):
#0  0x00007f0ac11d14ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f0ac11ccdcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f0ac11ccc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f0ac23a1819 in inode_is_linked (inode=inode@entry=0x7f0a4c03b8f8) at inode.c:2490
#4  0x00007f0aafdd2693 in afr_read_subvol_select_by_policy (inode=inode@entry=0x7f0a4c03b8f8, 
    this=this@entry=0x7f097d715a70, readable=readable@entry=0x7f0ab526a420 "\001\001\001\265\n\177", 
    args=args@entry=0x0) at afr-common.c:1685
#5  0x00007f0aafdd29d6 in afr_read_subvol_get (inode=inode@entry=0x7f0a4c03b8f8, this=0x7f097d715a70, 
    subvol_p=subvol_p@entry=0x0, readables=readables@entry=0x0, event_p=event_p@entry=0x0, 
    type=type@entry=AFR_DATA_TRANSACTION, args=args@entry=0x0) at afr-common.c:1771
#6  0x00007f0aafde1050 in afr_get_parent_read_subvol (readable=0x7f0ab526a540 "\001\001\001|\t\177", 
    replies=<optimized out>, parent=0x7f0a4c03b8f8, this=<optimized out>) at afr-common.c:2167
#7  afr_lookup_done (frame=frame@entry=0x7f098664dcb8, this=this@entry=0x7f097d715a70) at afr-common.c:2319
#8  0x00007f0aafde2058 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f098664dcb8, 
    this=this@entry=0x7f097d715a70) at afr-common.c:2771
#9  0x00007f0aafde2a5b in afr_lookup_entry_heal (frame=frame@entry=0x7f098664dcb8, this=this@entry=0x7f097d715a70)
    at afr-common.c:2920
#10 0x00007f0aafde2e3d in afr_lookup_cbk (frame=frame@entry=0x7f098664dcb8, cookie=<optimized out>, 
    this=0x7f097d715a70, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f09850f0d98, 
    buf=buf@entry=0x7f0ab526a900, xdata=0x7f0993075e88, postparent=postparent@entry=0x7f0ab526a970)
    at afr-common.c:2968
#11 0x00007f0ab4032efd in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, 
    myframe=0x7f0985df1b58) at client-rpc-fops.c:2872
#12 0x00007f0ac2134a00 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f097c49ee60, pollin=pollin@entry=0x7f09937ef6c0)
    at rpc-clnt.c:778
#13 0x00007f0ac2134d6b in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f097c49ee90, event=<optimized out>, 
    data=0x7f09937ef6c0) at rpc-clnt.c:971
#14 0x00007f0ac2130ae3 in rpc_transport_notify (this=this@entry=0x7f097e3ea6a0, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f09937ef6c0) at rpc-transport.c:557
#15 0x00007f0ab6d22586 in socket_event_poll_in (this=this@entry=0x7f097e3ea6a0, notify_handled=<optimized out>)
    at socket.c:2322
#16 0x00007f0ab6d24bca in socket_event_handler (fd=29, idx=20, gen=7, data=0x7f097e3ea6a0, poll_in=<optimized out>, 
    poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2482
#17 0x00007f0ac23ed870 in event_dispatch_epoll_handler (event=0x7f0ab526ae70, event_pool=0x559b1ebc7250)
    at event-epoll.c:643
#18 event_dispatch_epoll_worker (data=0x559b1ec24dd0) at event-epoll.c:759
#19 0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f0ab7735700 (LWP 4252)):
#0  0x00007f0ac11ced12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f0ac23cadd8 in syncenv_task (proc=proc@entry=0x559b1ebe3200) at syncop.c:603
#2  0x00007f0ac23cbca0 in syncenv_processor (thdata=0x559b1ebe3200) at syncop.c:695
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f0aacaad700 (LWP 4261)):
#0  0x00007f0ac11ce965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f0ab97438d3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4584
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f0ab8f38700 (LWP 4249)):
#0  0x00007f0ac11d2361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000559b1de36c7b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2242
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f0aadaaf700 (LWP 4259)):
#0  __dentry_search_arbit (inode=inode@entry=0x7f0a5002eab8) at inode.c:1450
#1  0x00007f0ac23a01c8 in __inode_path (inode=inode@entry=0x7f0a50009e68, 
    name=name@entry=0x7f0a028f1b50 "dhcp35-37.lab.eng.blr.redhat.com-fresh_top.log", bufp=bufp@entry=0x7f0985ba36e8)
    at inode.c:1551
#2  0x00007f0ac23a0551 in inode_path (inode=0x7f0a50009e68, 
    name=0x7f0a028f1b50 "dhcp35-37.lab.eng.blr.redhat.com-fresh_top.log", bufp=bufp@entry=0x7f0985ba36e8)
    at inode.c:1642
#3  0x00007f0ab9740489 in fuse_resolve_entry (state=0x7f0985ba3570) at fuse-resolve.c:99
#4  0x00007f0ab974162d in fuse_resolve_parent (state=state@entry=0x7f0985ba3570) at fuse-resolve.c:312
#5  0x00007f0ab9741998 in fuse_resolve (state=0x7f0985ba3570) at fuse-resolve.c:647
#6  0x00007f0ab9741c8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#7  0x00007f0ab9741cd0 in fuse_resolve_and_resume (state=0x7f0985ba3570, fn=0x7f0ab9744e40 <fuse_lookup_resume>)
    at fuse-resolve.c:718
#8  0x00007f0ab9759702 in fuse_thread_proc (data=0x559b1ebcf0c0) at fuse-bridge.c:5781
#9  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6
(gdb) 



Version-Release number of selected component (if applicable):
=================
3.12.2-43

How reproducible:
==================
Hit on 2 clients during a long-running stress test.


Steps to Reproduce:
==================
Tests were being run to stress the system as part of RPC bug-fix validation.
more details available @ https://docs.google.com/spreadsheets/d/17Yf9ZRWnWOpbRyFQ2ZYxAAlp9I_yarzKZdjN8idBJM0/edit#gid=1472913705

1. Created a 3x3 (distributed-replicate) volume.
2. Triggered I/O from 8 different clients, as both root and non-root users, for about a week; quota and USS were enabled and configured after 2 days.
3. After about a week, added 3 more replica sets (add-brick) to make the volume 6x3, triggered rebalance, and left it running over the weekend.
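As a rough sketch of the steps above (volume name, server hostnames, and brick paths are hypothetical, not taken from the test setup):

```shell
# 1. 3x3 distributed-replicate volume (9 bricks, replica count 3)
gluster volume create testvol replica 3 \
    server{1..3}:/bricks/b1 server{1..3}:/bricks/b2 server{1..3}:/bricks/b3
gluster volume start testvol

# 2. after ~2 days of client I/O, enable quota and USS
gluster volume quota testvol enable
gluster volume quota testvol limit-usage / 100GB
gluster volume set testvol features.uss enable

# 3. expand to 6x3 with three more replica sets, then rebalance
gluster volume add-brick testvol replica 3 \
    server{1..3}:/bricks/b4 server{1..3}:/bricks/b5 server{1..3}:/bricks/b6
gluster volume rebalance testvol start
```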



Actual results:
===========
The FUSE client process crashed with SIGSEGV on 2 clients.

Comment 5 Amar Tumballi 2019-03-04 12:47:00 UTC
> [2019-03-01 13:37:02.711187] W [inode.c:197:__is_dentry_hashed] (-->/lib64/libglusterfs.so.0(__inode_path+0x68) [0x7f515a5cd1c8] -->/lib64/libglusterfs.so.0(+0x3add4) [0x7f515a5cadd4] -->/lib64/libglusterfs.so.0(+0x3ad7e) [0x7f515a5cad7e] ) 0-fuse: dentry not found
> pending frames:
> frame : type(1) op(UNLINK)
> frame : type(1) op(UNLINK)
> frame : type(1) op(READDIRP)

----
(gdb) bt
#0  __dentry_search_arbit (inode=inode@entry=0x7f0a5002eab8) at inode.c:1450
(gdb) l
1445            dentry_t *trav = NULL;
1446
1447            if (!inode)
1448                    return NULL;
1449
1450            list_for_each_entry (trav, &inode->dentry_list, inode_list) {
1451                    if (__is_dentry_hashed (trav)) {
1452                            dentry = trav;
1453                            break;

----

It looks like we need to check whether trav is NULL and break out of the loop. Here, __is_dentry_hashed() returned 0, yet we kept traversing the list; the traversal should probably have stopped at that point.

Still checking.

Comment 22 Amar Tumballi 2019-03-26 12:11:33 UTC
We will mark this for release once there is a fix upstream.

Comment 28 Nag Pavan Chilakam 2019-06-20 11:26:43 UTC
Cleared the needinfo flag accidentally.

