Bug 1686009

Summary: gluster fuse crashed with segmentation fault possibly due to dentry not found
Product: [Community] GlusterFS
Component: core
Version: mainline
Status: CLOSED WORKSFORME
Severity: urgent
Priority: medium
Reporter: Amar Tumballi <atumball>
Assignee: bugs <bugs>
QA Contact:
Docs Contact:
CC: bmekala, bugs, jahernan, nbalacha, nchilaka, pasik, pkarampu, rcyriac, rgowdapp, rhs-bugs, sankarshan, srangana, vbellur
Target Milestone: ---
Target Release: ---
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1685078
Environment:
Last Closed: 2019-07-15 08:56:14 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1685078
Bug Blocks:

Description Amar Tumballi 2019-03-06 14:25:09 UTC
+++ This bug was initially created as a clone of Bug #1685078 +++

Description of problem:
=====================
While performing some non-functional tests for the rpc fixes, I found that one of the fuse clients crashed as below:
[2019-03-01 13:37:02.398653] I [dict.c:471:dict_get] (-->/usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x6228d) [0x7f5147de128d] -->/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so(+0x202f7) [0x7f5147afc2f7] -->/lib64/libglusterfs.so.0(dict_get+0x10c) [0x7f515a5b2d3c] ) 13-dict: !this || key=trusted.glusterfs.dht.mds [Invalid argument]
[2019-03-01 13:37:02.711187] W [inode.c:197:__is_dentry_hashed] (-->/lib64/libglusterfs.so.0(__inode_path+0x68) [0x7f515a5cd1c8] -->/lib64/libglusterfs.so.0(+0x3add4) [0x7f515a5cadd4] -->/lib64/libglusterfs.so.0(+0x3ad7e) [0x7f515a5cad7e] ) 0-fuse: dentry not found
pending frames:
frame : type(1) op(UNLINK)
frame : type(1) op(UNLINK)
frame : type(1) op(READDIRP)
frame : type(1) op(OPEN)
frame : type(1) op(STAT)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-03-01 13:37:02
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7f515a5bbb9d]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f515a5c6114]
/lib64/libc.so.6(+0x36280)[0x7f5158bf8280]
/lib64/libglusterfs.so.0(+0x3adc2)[0x7f515a5cadc2]
/lib64/libglusterfs.so.0(__inode_path+0x68)[0x7f515a5cd1c8]
/lib64/libglusterfs.so.0(inode_path+0x31)[0x7f515a5cd551]
/lib64/libglusterfs.so.0(loc_touchup+0x7a)[0x7f515a5b9dba]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x6f8b)[0x7f515196df8b]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7665)[0x7f515196e665]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7bbd)[0x7f515196ebbd]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7c8e)[0x7f515196ec8e]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7cd0)[0x7f515196ecd0]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x1f702)[0x7f5151986702]
/lib64/libpthread.so.0(+0x7dd5)[0x7f51593f7dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f5158cbfead]
---------



warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/73/3d1c681cfbd8bbeb11e8b7f80876a9aed6bb74
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=my-machine.redhat.com --volf'.
Program terminated with signal 11, Segmentation fault.
#0  __dentry_search_arbit (inode=inode@entry=0x7f50ec000e98) at inode.c:1450
1450	        list_for_each_entry (trav, &inode->dentry_list, inode_list) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.3.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  __dentry_search_arbit (inode=inode@entry=0x7f50ec000e98) at inode.c:1450
#1  0x00007f515a5cd1c8 in __inode_path (inode=inode@entry=0x7f50dc01dfc8, name=name@entry=0x0, bufp=bufp@entry=0x7f5145ed6d30) at inode.c:1551
#2  0x00007f515a5cd551 in inode_path (inode=0x7f50dc01dfc8, name=name@entry=0x0, bufp=bufp@entry=0x7f5145ed6d30) at inode.c:1642
#3  0x00007f515a5b9dba in loc_touchup (loc=0x7f5069ee43c0, name=<optimized out>) at xlator.c:880
#4  0x00007f515196df8b in fuse_resolve_loc_touchup (state=0x7f5069ee43a0) at fuse-resolve.c:33
#5  fuse_resolve_continue (state=0x7f5069ee43a0) at fuse-resolve.c:704
#6  0x00007f515196e665 in fuse_resolve_inode (state=0x7f5069ee43a0) at fuse-resolve.c:364
#7  0x00007f515196ebbd in fuse_resolve (state=0x7f5069ee43a0) at fuse-resolve.c:651
#8  0x00007f515196ec8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#9  0x00007f515196ecd0 in fuse_resolve_and_resume (state=0x7f5069ee43a0, fn=0x7f5151972c10 <fuse_getattr_resume>) at fuse-resolve.c:718
#10 0x00007f5151986702 in fuse_thread_proc (data=0x563689a35f10) at fuse-bridge.c:5781
#11 0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f5158cbfead in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 11 (Thread 0x7f515aaa6780 (LWP 4229)):
#0  0x00007f51593f8f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f515a61af78 in event_dispatch_epoll (event_pool=0x563689a2e250) at event-epoll.c:846
#2  0x0000563687bed538 in main (argc=4, argv=<optimized out>) at glusterfsd.c:2692

Thread 10 (Thread 0x7f51456d6700 (LWP 4257)):
#0  0x00007f51593fb965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f515197039b in timed_response_loop (data=0x563689a35f10) at fuse-bridge.c:4660
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f514cc97700 (LWP 4251)):
#0  0x00007f5158cc0483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f515a61a6e8 in event_dispatch_epoll_worker (data=0x563689a8bd60) at event-epoll.c:749
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f514d498700 (LWP 4250)):
#0  0x00007f5158cc0483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f515a61a6e8 in event_dispatch_epoll_worker (data=0x563689a8ba90) at event-epoll.c:749
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f5150163700 (LWP 4234)):
#0  0x00007f51593fbd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f515a5f7dd8 in syncenv_task (proc=proc@entry=0x563689a49b20) at syncop.c:603
#2  0x00007f515a5f8ca0 in syncenv_processor (thdata=0x563689a49b20) at syncop.c:695
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5151165700 (LWP 4231)):
#0  0x00007f51593ff361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000563687bf0c7b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2242
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f51322ff700 (LWP 22247)):
#0  0x00007f51593fbd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f515a5f7dd8 in syncenv_task (proc=proc@entry=0x563689a4a2a0) at syncop.c:603
#2  0x00007f515a5f8ca0 in syncenv_processor (thdata=0x563689a4a2a0) at syncop.c:695
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5150964700 (LWP 4232)):
#0  0x00007f5158c86e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f5158c86cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f515a5e4b9d in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5144ed5700 (LWP 4258)):
#0  0x00007f51593fb965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f51519708d3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4584
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5151966700 (LWP 4230)):
#0  0x00007f51593fee3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f515a5c9f86 in gf_timer_proc (data=0x563689a49300) at timer.c:174
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f5145ed7700 (LWP 4256)):
#0  __dentry_search_arbit (inode=inode@entry=0x7f50ec000e98) at inode.c:1450
#1  0x00007f515a5cd1c8 in __inode_path (inode=inode@entry=0x7f50dc01dfc8, name=name@entry=0x0, bufp=bufp@entry=0x7f5145ed6d30) at inode.c:1551
#2  0x00007f515a5cd551 in inode_path (inode=0x7f50dc01dfc8, name=name@entry=0x0, bufp=bufp@entry=0x7f5145ed6d30) at inode.c:1642
#3  0x00007f515a5b9dba in loc_touchup (loc=0x7f5069ee43c0, name=<optimized out>) at xlator.c:880
#4  0x00007f515196df8b in fuse_resolve_loc_touchup (state=0x7f5069ee43a0) at fuse-resolve.c:33
#5  fuse_resolve_continue (state=0x7f5069ee43a0) at fuse-resolve.c:704
#6  0x00007f515196e665 in fuse_resolve_inode (state=0x7f5069ee43a0) at fuse-resolve.c:364
#7  0x00007f515196ebbd in fuse_resolve (state=0x7f5069ee43a0) at fuse-resolve.c:651
#8  0x00007f515196ec8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#9  0x00007f515196ecd0 in fuse_resolve_and_resume (state=0x7f5069ee43a0, fn=0x7f5151972c10 <fuse_getattr_resume>) at fuse-resolve.c:718
#10 0x00007f5151986702 in fuse_thread_proc (data=0x563689a35f10) at fuse-bridge.c:5781
#11 0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f5158cbfead in clone () from /lib64/libc.so.6
(gdb) q



################# another client hit the same crash ##########
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --acl --volfile-server=my-machine.redhat.com'.
Program terminated with signal 11, Segmentation fault.
#0  __dentry_search_arbit (inode=inode@entry=0x7f0a5002eab8) at inode.c:1450
1450	        list_for_each_entry (trav, &inode->dentry_list, inode_list) {
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.12.2-43.el7.x86_64
(gdb) bt
#0  __dentry_search_arbit (inode=inode@entry=0x7f0a5002eab8) at inode.c:1450
#1  0x00007f0ac23a01c8 in __inode_path (inode=inode@entry=0x7f0a50009e68, 
    name=name@entry=0x7f0a028f1b50 "fresh_top.log", bufp=bufp@entry=0x7f0985ba36e8)
    at inode.c:1551
#2  0x00007f0ac23a0551 in inode_path (inode=0x7f0a50009e68, 
    name=0x7f0a028f1b50 "fresh_top.log", bufp=bufp@entry=0x7f0985ba36e8)
    at inode.c:1642
#3  0x00007f0ab9740489 in fuse_resolve_entry (state=0x7f0985ba3570) at fuse-resolve.c:99
#4  0x00007f0ab974162d in fuse_resolve_parent (state=state@entry=0x7f0985ba3570) at fuse-resolve.c:312
#5  0x00007f0ab9741998 in fuse_resolve (state=0x7f0985ba3570) at fuse-resolve.c:647
#6  0x00007f0ab9741c8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#7  0x00007f0ab9741cd0 in fuse_resolve_and_resume (state=0x7f0985ba3570, fn=0x7f0ab9744e40 <fuse_lookup_resume>)
    at fuse-resolve.c:718
#8  0x00007f0ab9759702 in fuse_thread_proc (data=0x559b1ebcf0c0) at fuse-bridge.c:5781
#9  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 11 (Thread 0x7f0ac2879780 (LWP 4246)):
#0  0x00007f0ac11cbf47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f0ac23edf78 in event_dispatch_epoll (event_pool=0x559b1ebc7250) at event-epoll.c:846
#2  0x0000559b1de33538 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2692

Thread 10 (Thread 0x7f0aad2ae700 (LWP 4260)):
#0  0x00007f0ac11ce965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f0ab974339b in timed_response_loop (data=0x559b1ebcf0c0) at fuse-bridge.c:4660
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f0ab7f36700 (LWP 4251)):
#0  0x00007f0ac11ced12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f0ac23cadd8 in syncenv_task (proc=proc@entry=0x559b1ebe2e40) at syncop.c:603
#2  0x00007f0ac23cbca0 in syncenv_processor (thdata=0x559b1ebe2e40) at syncop.c:695
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f0ab8737700 (LWP 4250)):
#0  0x00007f0ac0a59e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f0ac0a59cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f0ac23b7b9d in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f0ab9739700 (LWP 4248)):
#0  0x00007f0ac11d1e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f0ac239cf86 in gf_timer_proc (data=0x559b1ebe2620) at timer.c:174
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f0ab4a6a700 (LWP 4256)):
#0  0x00007f0ac11d14ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f0ac11ccdcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f0ac11ccc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f0ac239eee9 in inode_unref (inode=0x7f09850f0d98) at inode.c:668
#4  0x00007f0ac238ca02 in loc_wipe (loc=loc@entry=0x7f09865af178) at xlator.c:768
#5  0x00007f0ab40231ee in client_local_wipe (local=local@entry=0x7f09865af178) at client-helpers.c:127
#6  0x00007f0ab4032f0d in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, 
    myframe=0x7f098676f668) at client-rpc-fops.c:2872
#7  0x00007f0ac2134a00 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f097ca19cb0, pollin=pollin@entry=0x7f097ccffd50)
    at rpc-clnt.c:778
#8  0x00007f0ac2134d6b in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f097ca19ce0, event=<optimized out>, 
    data=0x7f097ccffd50) at rpc-clnt.c:971
#9  0x00007f0ac2130ae3 in rpc_transport_notify (this=this@entry=0x7f098e451610, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f097ccffd50) at rpc-transport.c:557
#10 0x00007f0ab6d22586 in socket_event_poll_in (this=this@entry=0x7f098e451610, notify_handled=<optimized out>)
    at socket.c:2322
#11 0x00007f0ab6d24bca in socket_event_handler (fd=33, idx=26, gen=4, data=0x7f098e451610, poll_in=<optimized out>, 
    poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2482
#12 0x00007f0ac23ed870 in event_dispatch_epoll_handler (event=0x7f0ab4a69e70, event_pool=0x559b1ebc7250)
    at event-epoll.c:643
#13 event_dispatch_epoll_worker (data=0x559b1ec250a0) at event-epoll.c:759
#14 0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f0ab526b700 (LWP 4255)):
#0  0x00007f0ac11d14ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f0ac11ccdcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f0ac11ccc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f0ac23a1819 in inode_is_linked (inode=inode@entry=0x7f0a4c03b8f8) at inode.c:2490
#4  0x00007f0aafdd2693 in afr_read_subvol_select_by_policy (inode=inode@entry=0x7f0a4c03b8f8, 
    this=this@entry=0x7f097d715a70, readable=readable@entry=0x7f0ab526a420 "\001\001\001\265\n\177", 
    args=args@entry=0x0) at afr-common.c:1685
#5  0x00007f0aafdd29d6 in afr_read_subvol_get (inode=inode@entry=0x7f0a4c03b8f8, this=0x7f097d715a70, 
    subvol_p=subvol_p@entry=0x0, readables=readables@entry=0x0, event_p=event_p@entry=0x0, 
    type=type@entry=AFR_DATA_TRANSACTION, args=args@entry=0x0) at afr-common.c:1771
#6  0x00007f0aafde1050 in afr_get_parent_read_subvol (readable=0x7f0ab526a540 "\001\001\001|\t\177", 
    replies=<optimized out>, parent=0x7f0a4c03b8f8, this=<optimized out>) at afr-common.c:2167
#7  afr_lookup_done (frame=frame@entry=0x7f098664dcb8, this=this@entry=0x7f097d715a70) at afr-common.c:2319
#8  0x00007f0aafde2058 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f098664dcb8, 
    this=this@entry=0x7f097d715a70) at afr-common.c:2771
#9  0x00007f0aafde2a5b in afr_lookup_entry_heal (frame=frame@entry=0x7f098664dcb8, this=this@entry=0x7f097d715a70)
    at afr-common.c:2920
#10 0x00007f0aafde2e3d in afr_lookup_cbk (frame=frame@entry=0x7f098664dcb8, cookie=<optimized out>, 
    this=0x7f097d715a70, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f09850f0d98, 
    buf=buf@entry=0x7f0ab526a900, xdata=0x7f0993075e88, postparent=postparent@entry=0x7f0ab526a970)
    at afr-common.c:2968
#11 0x00007f0ab4032efd in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, 
    myframe=0x7f0985df1b58) at client-rpc-fops.c:2872
#12 0x00007f0ac2134a00 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f097c49ee60, pollin=pollin@entry=0x7f09937ef6c0)
    at rpc-clnt.c:778
#13 0x00007f0ac2134d6b in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f097c49ee90, event=<optimized out>, 
    data=0x7f09937ef6c0) at rpc-clnt.c:971
#14 0x00007f0ac2130ae3 in rpc_transport_notify (this=this@entry=0x7f097e3ea6a0, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f09937ef6c0) at rpc-transport.c:557
#15 0x00007f0ab6d22586 in socket_event_poll_in (this=this@entry=0x7f097e3ea6a0, notify_handled=<optimized out>)
    at socket.c:2322
#16 0x00007f0ab6d24bca in socket_event_handler (fd=29, idx=20, gen=7, data=0x7f097e3ea6a0, poll_in=<optimized out>, 
    poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2482
#17 0x00007f0ac23ed870 in event_dispatch_epoll_handler (event=0x7f0ab526ae70, event_pool=0x559b1ebc7250)
    at event-epoll.c:643
#18 event_dispatch_epoll_worker (data=0x559b1ec24dd0) at event-epoll.c:759
#19 0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f0ab7735700 (LWP 4252)):
#0  0x00007f0ac11ced12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f0ac23cadd8 in syncenv_task (proc=proc@entry=0x559b1ebe3200) at syncop.c:603
#2  0x00007f0ac23cbca0 in syncenv_processor (thdata=0x559b1ebe3200) at syncop.c:695
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f0aacaad700 (LWP 4261)):
#0  0x00007f0ac11ce965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f0ab97438d3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4584
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f0ab8f38700 (LWP 4249)):
#0  0x00007f0ac11d2361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000559b1de36c7b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2242
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f0aadaaf700 (LWP 4259)):
#0  __dentry_search_arbit (inode=inode@entry=0x7f0a5002eab8) at inode.c:1450
#1  0x00007f0ac23a01c8 in __inode_path (inode=inode@entry=0x7f0a50009e68, 
    name=name@entry=0x7f0a028f1b50 "fresh_top.log", bufp=bufp@entry=0x7f0985ba36e8)
    at inode.c:1551
#2  0x00007f0ac23a0551 in inode_path (inode=0x7f0a50009e68, 
    name=0x7f0a028f1b50 "fresh_top.log", bufp=bufp@entry=0x7f0985ba36e8)
    at inode.c:1642
#3  0x00007f0ab9740489 in fuse_resolve_entry (state=0x7f0985ba3570) at fuse-resolve.c:99
#4  0x00007f0ab974162d in fuse_resolve_parent (state=state@entry=0x7f0985ba3570) at fuse-resolve.c:312
#5  0x00007f0ab9741998 in fuse_resolve (state=0x7f0985ba3570) at fuse-resolve.c:647
#6  0x00007f0ab9741c8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#7  0x00007f0ab9741cd0 in fuse_resolve_and_resume (state=0x7f0985ba3570, fn=0x7f0ab9744e40 <fuse_lookup_resume>)
    at fuse-resolve.c:718
#8  0x00007f0ab9759702 in fuse_thread_proc (data=0x559b1ebcf0c0) at fuse-bridge.c:5781
#9  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6
(gdb) 


1. Created a 3x3 volume
2. Triggered IOs from 8 different clients, as both root and non-root users, for about a week, with quotas/USS enabled and set after 2 days
3. After about a week, did an add-brick with 3 replica sets to make it 6x3, triggered a rebalance, and left it running over the weekend



--- Additional comment from Amar Tumballi on 2019-03-04 11:49:51 UTC ---

Checking the core. My initial suspicion was the lru-limit feature, but it doesn't look like that's the case.

---------------
(gdb) p name
$22 = 0x7f0a028f1b50 "fresh_top.log"
(gdb) p *inode
$23 = {table = 0x7f09971d46f0, gfid = "\353\034\020'DZG\266\271\034\240\031\205ׂL", lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, 
        __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
      __align = 0}}, nlookup = 0, fd_count = 0, active_fd_count = 0, ref = 3, ia_type = IA_IFDIR, fd_list = {next = 0x7f0a50009ec0, prev = 0x7f0a50009ec0}, 
  dentry_list = {next = 0x7f0a4c0386f8, prev = 0x7f0a4c0386f8}, hash = {next = 0x7f097f4971e0, prev = 0x7f097f4971e0}, list = {next = 0x7f0985631490, 
    prev = 0x7f098536bcf0}, _ctx = 0x7f0a50011380, invalidate_sent = _gf_false}

(gdb) p *inode->table
$24 = {lock = {__data = {__lock = 2, __count = 0, __owner = 4259, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = "\002\000\000\000\000\000\000\000\243\020\000\000\001", '\000' <repeats 26 times>, __align = 2}, hashsize = 14057, 
  name = 0x7f098fc5b040 "meta-autoload/inode", root = 0x7f097ca5a168, xl = 0x7f097e043300, lru_limit = 131072, inode_hash = 0x7f097f414d20, 
  name_hash = 0x7f097f514d70, active = {next = 0x7f098536bcf0, prev = 0x7f097ca5a1f0}, active_size = 26, lru = {next = 0x7f0985f9ccd0, prev = 0x7f09803d4020}, 
  lru_size = 70, purge = {next = 0x7f09971d4780, prev = 0x7f09971d4780}, purge_size = 0, inode_pool = 0x7f09971d4830, dentry_pool = 0x7f09971d48f0, 
  fd_mem_pool = 0x7f098df5eb80, ctxcount = 33, invalidator_fn = 0x7f0ab97425d0 <fuse_inode_invalidate_fn>, invalidator_xl = 0x559b1ebcf0c0, invalidate = {
    next = 0x7f09971d47c8, prev = 0x7f09971d47c8}, invalidate_size = 0}

(gdb) p *inode->table->root
$29 = {table = 0x7f09971d46f0, gfid = '\000' <repeats 15 times>, "\001", lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, 
        __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, 
  nlookup = 0, fd_count = 0, active_fd_count = 0, ref = 1, ia_type = IA_IFDIR, fd_list = {next = 0x7f097ca5a1c0, prev = 0x7f097ca5a1c0}, dentry_list = {
    next = 0x7f097ca5a1d0, prev = 0x7f097ca5a1d0}, hash = {next = 0x7f097f414d30, prev = 0x7f097f414d30}, list = {next = 0x7f09971d4750, prev = 0x7f0a50012840}, 
  _ctx = 0x7f097c72e1a0, invalidate_sent = _gf_false}

(gdb) p *((dentry_t *)inode->dentry_list)
$26 = {inode_list = {next = 0x7f0a50009ed0, prev = 0x7f0a50009ed0}, hash = {next = 0x7f097f536fc0, prev = 0x7f097f536fc0}, inode = 0x7f0a50009e68, 
  name = 0x7f0a4c00d340 "top.dir", parent = 0x7f0a5002eab8}

<CORRUPTED>
(gdb) p *((dentry_t *)inode->dentry_list)->parent
$27 = {table = 0x7f0a5004f6a8, gfid = "\000\000\000\000\000\000\000\000\310\352\002P\n\177\000", lock = {spinlock = 1342368456, mutex = {__data = {
        __lock = 1342368456, __count = 32522, __owner = 0, __nusers = 0, __kind = 515698880, __spins = 21915, __elision = 0, __list = {__prev = 0x0, 
          __next = 0x0}}, __size = "\310\352\002P\n\177\000\000\000\000\000\000\000\000\000\000\300\360\274\036\233U", '\000' <repeats 17 times>, 
      __align = 139682268768968}}, nlookup = 0, fd_count = 0, active_fd_count = 0, ref = 4294967295, ia_type = IA_INVAL, fd_list = {next = 0x0, prev = 0x0}, 
  dentry_list = {next = 0x0, prev = 0x100000000}, hash = {next = 0x0, prev = 0x0}, list = {next = 0x0, prev = 0x0}, _ctx = 0x0, invalidate_sent = _gf_false}


---------------

Notice that lru_size is nowhere close to lru-limit.

Also, the inode itself is fine. The issue is that, even though everything happens under the table lock, a parent dentry looks to be gone, or rather freed. A possible case of an extra unref()?

Going by the info, since this is 'top.dir' it should most likely have been linked to the root inode, but the dentry is freed. Will look more into this and keep this updated. Also, at first look this appears to be something that has existed forever.
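
For reference, here is a simplified sketch of the structures involved, keeping only the fields visible in the gdb dumps above (the full definitions live in libglusterfs; this is not the actual source):

----
/* Simplified sketch, not the real glusterfs definitions: only the
 * fields relevant to this crash, as seen in the gdb dumps above. */
struct _dentry {
        struct list_head  inode_list;  /* chained on inode->dentry_list */
        struct list_head  hash;        /* empty list head => unhashed   */
        struct _inode    *inode;       /* child inode this name maps to */
        char             *name;        /* e.g. "top.dir"                */
        struct _inode    *parent;      /* parent directory inode        */
};

struct _inode {
        /* ... table, gfid, lock, refcounts ... */
        struct list_head  fd_list;
        struct list_head  dentry_list; /* all dentries naming this inode */
        struct list_head  hash;
};
----

__inode_path() builds the path by walking dentry->parent upward from the target inode. If a parent inode has already been freed (the <CORRUPTED> dump above shows ref = 4294967295 and ia_type = IA_INVAL, i.e. recycled memory), that walk dereferences freed memory and crashes, consistent with an extra unref() dropping the parent too early.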

--- Additional comment from Amar Tumballi on 2019-03-04 12:47:00 UTC ---

> [2019-03-01 13:37:02.711187] W [inode.c:197:__is_dentry_hashed] (-->/lib64/libglusterfs.so.0(__inode_path+0x68) [0x7f515a5cd1c8] -->/lib64/libglusterfs.so.0(+0x3add4) [0x7f515a5cadd4] -->/lib64/libglusterfs.so.0(+0x3ad7e) [0x7f515a5cad7e] ) 0-fuse: dentry not found
> pending frames:
> frame : type(1) op(UNLINK)
> frame : type(1) op(UNLINK)
> frame : type(1) op(READDIRP)

----
(gdb) bt
#0  __dentry_search_arbit (inode=inode@entry=0x7f0a5002eab8) at inode.c:1450
(gdb) l
1445            dentry_t *trav = NULL;
1446
1447            if (!inode)
1448                    return NULL;
1449
1450            list_for_each_entry (trav, &inode->dentry_list, inode_list) {
1451                    if (__is_dentry_hashed (trav)) {
1452                            dentry = trav;
1453                            break;

----

Looks like we need to check whether trav is NULL and break out of the loop. The main point here: __is_dentry_hashed() returned 0, and we still continued to traverse the list. I guess the traversal should have stopped there.

Still checking.
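
To make the suggestion above concrete, here is a hypothetical sketch of the guarded traversal. The actual change is the one posted to https://review.gluster.org/22311 (comment 1 below) and may differ; note also that list_for_each_entry() computes its cursor from the embedded list pointer, so a plain NULL check on trav is only a first approximation.

----
/* Hypothetical guarded version of __dentry_search_arbit() around
 * inode.c:1450 above -- stop traversing instead of dereferencing a
 * corrupted entry. Sketch only; see review 22311 for the real fix. */
static dentry_t *
__dentry_search_arbit (inode_t *inode)
{
        dentry_t *dentry = NULL;
        dentry_t *trav = NULL;

        if (!inode)
                return NULL;

        list_for_each_entry (trav, &inode->dentry_list, inode_list) {
                if (!trav)
                        break;          /* corrupted list: bail out */
                if (__is_dentry_hashed (trav)) {
                        dentry = trav;
                        break;
                }
        }

        return dentry;
}
----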


--- Additional comment from Amar Tumballi on 2019-03-04 14:23:18 UTC ---

As per comment #4,

> Also, at first look this appears to be something that has existed forever.

I suspect this is a bug that has been in the code for a very long time. If a dentry in these lists is NULL, then by the flow of execution the next iteration will definitely crash, which is what happened here. Need to trace back how this happened, given that every change operation on an inode happens under the table lock.

Comment 1 Worker Ant 2019-03-06 14:37:46 UTC
REVIEW: https://review.gluster.org/22311 (inode: check for instance of dentry null) posted (#1) for review on master by Amar Tumballi

Comment 2 Worker Ant 2019-03-07 10:53:43 UTC
REVIEW: https://review.gluster.org/22318 (inode: handle 'dentry_list' in destroy) posted (#1) for review on master by Amar Tumballi
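
For illustration, a hedged sketch of the idea behind that second review (handling 'dentry_list' in destroy): before an inode is freed, purge any dentries still chained on its dentry_list so nothing is left pointing into freed memory. This is only the shape of the idea, using the simplified structures sketched earlier; the actual patch at https://review.gluster.org/22318 may differ in detail.

----
/* Sketch only: purge leftover dentries while destroying an inode, so
 * no dentry survives with a dangling pointer into freed memory.
 * list_for_each_entry_safe allows deletion during traversal. */
static void
__inode_destroy_sketch (inode_t *inode)
{
        dentry_t *dentry = NULL;
        dentry_t *tmp = NULL;

        list_for_each_entry_safe (dentry, tmp, &inode->dentry_list, inode_list) {
                list_del_init (&dentry->inode_list);  /* off the inode list */
                list_del_init (&dentry->hash);        /* off the name hash  */
                GF_FREE (dentry->name);
                mem_put (dentry);                     /* back to dentry_pool */
        }

        /* ... existing teardown of ctx, locks, and the inode itself ... */
}
----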

Comment 3 Amar Tumballi 2019-07-15 08:56:14 UTC
Not seen in recent times. Recommend using glusterfs-6.x.