Bug 1666969

Summary: [geo-rep]: Crashes on slave nodes
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rochelle <rallan>
Component: geo-replication
Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE
QA Contact: Rahul Hinduja <rhinduja>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.4
CC: csaba, moagrawa, nchilaka, pasik, rhs-bugs, sasundar, sheggodu, storage-qa-internal, sunkumar
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-6.0-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-12-17 13:25:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rochelle 2019-01-17 06:16:56 UTC
Description of problem:
=======================
Multiple core dumps were found on the slave nodes after an automation run:

[root@dhcp42-50 /]# gdb ./core.16047
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 16091]
[New LWP 16052]
[New LWP 16077]
[New LWP 16092]
[New LWP 16051]
[New LWP 16053]
[New LWP 16078]
[New LWP 16054]
[New LWP 16047]
[New LWP 16055]

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

warning: the debug information found in "/usr/lib/debug//usr/lib64/glusterfs/3.12.2/xlator/cluster/dht.so.debug" does not match "/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so" (CRC mismatch).


warning: the debug information found in "/usr/lib/debug/usr/lib64/glusterfs/3.12.2/xlator/cluster/dht.so.debug" does not match "/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so" (CRC mismatch).

Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007ff1ad4f31a7 in __inode_get_xl_index (xlator=0x7ff19802dc70, inode=0x7ff19430d968) at inode.c:455
455	        if ((inode->_ctx[xlator->xl_id].xl_key != NULL) &&
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.12.2-37.el7rhgs.x86_64
(gdb) t a a bt

Thread 10 (Thread 0x7ff1a28af700 (LWP 16055)):
#0  0x00007ff1ac346d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff1ad51f7a8 in syncenv_task (proc=proc@entry=0x55dfa8556610) at syncop.c:603
#2  0x00007ff1ad520670 in syncenv_processor (thdata=0x55dfa8556610) at syncop.c:695
#3  0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ff1abc0aead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7ff1ad9c7780 (LWP 16047)):
#0  0x00007ff1ac343f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007ff1ad5424b8 in event_dispatch_epoll (event_pool=0x55dfa853a150) at event-epoll.c:746
#2  0x000055dfa7d523f2 in main (argc=8, argv=<optimized out>) at glusterfsd.c:2617

Thread 8 (Thread 0x7ff1a30b0700 (LWP 16054)):
#0  0x00007ff1ac346d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff1ad51f7a8 in syncenv_task (proc=proc@entry=0x55dfa8556250) at syncop.c:603
#2  0x00007ff1ad520670 in syncenv_processor (thdata=0x55dfa8556250) at syncop.c:695
#3  0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ff1abc0aead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7ff19fbfb700 (LWP 16078)):
#0  0x00007ff1abc0b483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007ff1ad541d52 in event_dispatch_epoll_worker (data=0x55dfa8594290) at event-epoll.c:649
#2  0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ff1abc0aead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7ff1a38b1700 (LWP 16053)):
#0  0x00007ff1abbd1e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007ff1abbd1cc4 in sleep () from /lib64/libc.so.6
#2  0x00007ff1ad50c5ad in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ff1abc0aead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7ff1a48b3700 (LWP 16051)):
#0  0x00007ff1ac349e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007ff1ad4f1d56 in gf_timer_proc (data=0x55dfa8555a30) at timer.c:174
#2  0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ff1abc0aead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7ff18ffff700 (LWP 16092)):
#0  0x00007ff1ac346965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff1a48bc9b3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4036
#2  0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ff1abc0aead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7ff1a03fc700 (LWP 16077)):
#0  0x00007ff1abc0b483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007ff1ad541d52 in event_dispatch_epoll_worker (data=0x55dfa8593fc0) at event-epoll.c:649
#2  0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ff1abc0aead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7ff1a40b2700 (LWP 16052)):
#0  0x00007ff1ac34a361 in sigwait () from /lib64/libpthread.so.0
#1  0x000055dfa7d5595b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2167
#2  0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ff1abc0aead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ff19ca42700 (LWP 16091)):
#0  0x00007ff1ad4f31a7 in __inode_get_xl_index (xlator=0x7ff19802dc70, inode=0x7ff19430d968) at inode.c:455
#1  __inode_unref (inode=inode@entry=0x7ff19430d968) at inode.c:489
#2  0x00007ff1ad4f3a01 in inode_unref (inode=0x7ff19430d968) at inode.c:559
#3  0x00007ff19d9cb03a in ga_forget (this=<optimized out>, inode=<optimized out>) at gfid-access.c:367
#4  0x00007ff1ad4f259e in __inode_ctx_free (inode=inode@entry=0x7ff198517228) at inode.c:332
#5  0x00007ff1ad4f3732 in __inode_destroy (inode=0x7ff198517228) at inode.c:353
#6  inode_table_prune (table=table@entry=0x7ff1980a2b40) at inode.c:1579
#7  0x00007ff1ad4f3a14 in inode_unref (inode=0x7ff198517228) at inode.c:563
#8  0x00007ff1a48bbcc2 in do_forget (this=this@entry=0x55dfa8542290, unique=<optimized out>, nodeid=<optimized out>, nlookup=<optimized out>) at fuse-bridge.c:693
#9  0x00007ff1a48bbd5a in fuse_batch_forget (this=0x55dfa8542290, finh=0x7ff194365f00, msg=0x7ff194365f28, iobuf=<optimized out>) at fuse-bridge.c:733
#10 0x00007ff1a48d1c82 in fuse_thread_proc (data=0x55dfa8542290) at fuse-bridge.c:5134
#11 0x00007ff1ac342dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ff1abc0aead in clone () from /lib64/libc.so.6
(gdb) 
(gdb) 





[root@dhcp42-43 /]# gdb ./core.28489
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 28522]
[New LWP 28489]
[New LWP 28523]
[New LWP 28512]
[New LWP 28500]
[New LWP 28511]
[New LWP 28498]
[New LWP 28490]
[New LWP 28496]
[New LWP 28494]

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

warning: the debug information found in "/usr/lib/debug//usr/lib64/glusterfs/3.12.2/xlator/cluster/dht.so.debug" does not match "/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so" (CRC mismatch).


warning: the debug information found in "/usr/lib/debug/usr/lib64/glusterfs/3.12.2/xlator/cluster/dht.so.debug" does not match "/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so" (CRC mismatch).

Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f08e57531a7 in __inode_get_xl_index (xlator=0x7f08c802dc70, inode=0x7f08cc14c818) at inode.c:455
455	        if ((inode->_ctx[xlator->xl_id].xl_key != NULL) &&
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.12.2-37.el7rhgs.x86_64
(gdb) t a a bt

Thread 10 (Thread 0x7f08dc312700 (LWP 28494)):
#0  0x00007f08e45aa361 in sigwait () from /lib64/libpthread.so.0
#1  0x000056089091095b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2167
#2  0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f08e3e6aead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f08dbb11700 (LWP 28496)):
#0  0x00007f08e3e31e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f08e3e31cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f08e576c5ad in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f08e3e6aead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f08dcb13700 (LWP 28490)):
#0  0x00007f08e45a9e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f08e5751d56 in gf_timer_proc (data=0x560892123a30) at timer.c:174
#2  0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f08e3e6aead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f08db310700 (LWP 28498)):
#0  0x00007f08e45a6d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f08e577f7a8 in syncenv_task (proc=proc@entry=0x560892124250) at syncop.c:603
#2  0x00007f08e5780670 in syncenv_processor (thdata=0x560892124250) at syncop.c:695
#3  0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f08e3e6aead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f08d865c700 (LWP 28511)):
#0  0x00007f08e3e6b483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f08e57a1d52 in event_dispatch_epoll_worker (data=0x560892161fc0) at event-epoll.c:649
#2  0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f08e3e6aead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f08dab0f700 (LWP 28500)):
#0  0x00007f08e45a6d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f08e577f7a8 in syncenv_task (proc=proc@entry=0x560892124610) at syncop.c:603
#2  0x00007f08e5780670 in syncenv_processor (thdata=0x560892124610) at syncop.c:695
#3  0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f08e3e6aead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f08d7e5b700 (LWP 28512)):
#0  0x00007f08e3e6b483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f08e57a1d52 in event_dispatch_epoll_worker (data=0x560892162290) at event-epoll.c:649
#2  0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f08e3e6aead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f08c7fff700 (LWP 28523)):
#0  0x00007f08e45a6965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f08dcb1c9b3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4036
#2  0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f08e3e6aead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f08e5c27780 (LWP 28489)):
#0  0x00007f08e45a3f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f08e57a24b8 in event_dispatch_epoll (event_pool=0x560892108150) at event-epoll.c:746
#2  0x000056089090d3f2 in main (argc=8, argv=<optimized out>) at glusterfsd.c:2617

Thread 1 (Thread 0x7f08d4ca2700 (LWP 28522)):
#0  0x00007f08e57531a7 in __inode_get_xl_index (xlator=0x7f08c802dc70, inode=0x7f08cc14c818) at inode.c:455
#1  __inode_unref (inode=0x7f08cc14c818) at inode.c:489
#2  0x00007f08e57532f3 in __dentry_unset (dentry=0x7f08d00eea68) at inode.c:141
#3  0x00007f08e575351b in __inode_retire (inode=0x7f08cc153ed8) at inode.c:445
#4  0x00007f08e5753245 in __inode_unref (inode=inode@entry=0x7f08cc153ed8) at inode.c:501
#5  0x00007f08e5753a01 in inode_unref (inode=0x7f08cc153ed8) at inode.c:559
#6  0x00007f08d5c2b03a in ga_forget (this=<optimized out>, inode=<optimized out>) at gfid-access.c:367
#7  0x00007f08e575259e in __inode_ctx_free (inode=inode@entry=0x7f08c82c7478) at inode.c:332
#8  0x00007f08e5753732 in __inode_destroy (inode=0x7f08c82c7478) at inode.c:353
#9  inode_table_prune (table=table@entry=0x7f08c80ac980) at inode.c:1579
#10 0x00007f08e5753a14 in inode_unref (inode=0x7f08c82c7478) at inode.c:563
#11 0x00007f08dcb1bcc2 in do_forget (this=this@entry=0x560892110290, unique=<optimized out>, nodeid=<optimized out>, nlookup=<optimized out>) at fuse-bridge.c:693
#12 0x00007f08dcb1bd5a in fuse_batch_forget (this=0x560892110290, finh=0x7f08cc1be120, msg=0x7f08cc1be148, iobuf=<optimized out>) at fuse-bridge.c:733
#13 0x00007f08dcb31c82 in fuse_thread_proc (data=0x560892110290) at fuse-bridge.c:5134
#14 0x00007f08e45a2dd5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f08e3e6aead in clone () from /lib64/libc.so.6
(gdb) 




[root@dhcp42-40 /]# gdb ./core.17475
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 17506]
[New LWP 17507]
[New LWP 17478]
[New LWP 17479]
[New LWP 17493]
[New LWP 17494]
[New LWP 17482]
[New LWP 17481]
[New LWP 17477]
[New LWP 17475]

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

warning: the debug information found in "/usr/lib/debug//usr/lib64/glusterfs/3.12.2/xlator/cluster/dht.so.debug" does not match "/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so" (CRC mismatch).


warning: the debug information found in "/usr/lib/debug/usr/lib64/glusterfs/3.12.2/xlator/cluster/dht.so.debug" does not match "/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so" (CRC mismatch).

Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe5844731a7 in __inode_get_xl_index (xlator=0x7fe56802dc70, inode=0x7fe5644938d8) at inode.c:455
455	        if ((inode->_ctx[xlator->xl_id].xl_key != NULL) &&
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.12.2-37.el7rhgs.x86_64
(gdb) t a a bt

Thread 10 (Thread 0x7fe584947780 (LWP 17475)):
#0  0x00007fe5832c3f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fe5844c24b8 in event_dispatch_epoll (event_pool=0x556817cc8150) at event-epoll.c:746
#2  0x0000556815aa33f2 in main (argc=8, argv=<optimized out>) at glusterfsd.c:2617

Thread 9 (Thread 0x7fe57b833700 (LWP 17477)):
#0  0x00007fe5832c9e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007fe584471d56 in gf_timer_proc (data=0x556817ce3a30) at timer.c:174
#2  0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe582b8aead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7fe57a030700 (LWP 17481)):
#0  0x00007fe5832c6d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fe58449f7a8 in syncenv_task (proc=proc@entry=0x556817ce4250) at syncop.c:603
#2  0x00007fe5844a0670 in syncenv_processor (thdata=0x556817ce4250) at syncop.c:695
#3  0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fe582b8aead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7fe57982f700 (LWP 17482)):
#0  0x00007fe5832c6d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fe58449f7a8 in syncenv_task (proc=proc@entry=0x556817ce4610) at syncop.c:603
#2  0x00007fe5844a0670 in syncenv_processor (thdata=0x556817ce4610) at syncop.c:695
#3  0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fe582b8aead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7fe576b7b700 (LWP 17494)):
#0  0x00007fe582b8b483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007fe5844c1d52 in event_dispatch_epoll_worker (data=0x556817d22290) at event-epoll.c:649
#2  0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe582b8aead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fe57737c700 (LWP 17493)):
#0  0x00007fe582b8b483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007fe5844c1d52 in event_dispatch_epoll_worker (data=0x556817d21fc0) at event-epoll.c:649
#2  0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe582b8aead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fe57a831700 (LWP 17479)):
#0  0x00007fe582b51e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007fe582b51cc4 in sleep () from /lib64/libc.so.6
#2  0x00007fe58448c5ad in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fe582b8aead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fe57b032700 (LWP 17478)):
#0  0x00007fe5832ca361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000556815aa695b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2167
#2  0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe582b8aead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fe56eefe700 (LWP 17507)):
#0  0x00007fe5832c6965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fe57b83c9b3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4036
#2  0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fe582b8aead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fe56f6ff700 (LWP 17506)):
#0  0x00007fe5844731a7 in __inode_get_xl_index (xlator=0x7fe56802dc70, inode=0x7fe5644938d8) at inode.c:455
#1  __inode_unref (inode=inode@entry=0x7fe5644938d8) at inode.c:489
#2  0x00007fe584473a01 in inode_unref (inode=0x7fe5644938d8) at inode.c:559
#3  0x00007fe57494b03a in ga_forget (this=<optimized out>, inode=<optimized out>) at gfid-access.c:367
#4  0x00007fe58447259e in __inode_ctx_free (inode=inode@entry=0x7fe56827ad68) at inode.c:332
#5  0x00007fe584473732 in __inode_destroy (inode=0x7fe56827ad68) at inode.c:353
#6  inode_table_prune (table=table@entry=0x7fe5680a4b90) at inode.c:1579
#7  0x00007fe584473a14 in inode_unref (inode=0x7fe56827ad68) at inode.c:563
#8  0x00007fe57b83bcc2 in do_forget (this=this@entry=0x556817cd0290, unique=<optimized out>, nodeid=<optimized out>, nlookup=<optimized out>) at fuse-bridge.c:693
#9  0x00007fe57b83bd5a in fuse_batch_forget (this=0x556817cd0290, finh=0x7fe564e58a50, msg=0x7fe564e58a78, iobuf=<optimized out>) at fuse-bridge.c:733
#10 0x00007fe57b851c82 in fuse_thread_proc (data=0x556817cd0290) at fuse-bridge.c:5134
#11 0x00007fe5832c2dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fe582b8aead in clone () from /lib64/libc.so.6
(gdb) 
(gdb) 





Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.12.2-37.el7rhgs.x86_64

How reproducible:
=================
1/1


Steps to Reproduce:
====================
Run the geo-rep automation test cases.


Actual results:
==============
Multiple crashes seen on the slave

Expected results:
================
There should be no crash

Additional info:
=================
There was no functional impact, but because many core dumps were generated, they cause a drop in performance; hence this bug is being marked as a blocker.
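
For reference, all three cores fault at the same statement (inode.c:455) while indexing inode->_ctx[xlator->xl_id] from the FUSE forget/prune path. The following is a minimal, self-contained sketch using simplified stand-in structures (these are NOT the real glusterfs types); it only illustrates how indexing the per-inode context array by a translator id, without validating the array pointer or the id against the allocated slot count, becomes the kind of invalid read seen above, and what a defensive variant of the lookup could look like.

/* Minimal sketch, assuming simplified stand-in structures (NOT the real
 * glusterfs types): shows why the access at inode.c:455,
 * inode->_ctx[xlator->xl_id].xl_key, can fault if the _ctx array has been
 * freed or xl_id falls outside the allocated slot count. */
#include <stdio.h>
#include <stdlib.h>

struct xlator {
    int xl_id;                /* index assigned to this translator */
};

struct inode_ctx {
    struct xlator *xl_key;    /* translator that owns this slot */
    void          *value;
};

struct inode {
    int               ctx_count; /* number of allocated _ctx slots */
    struct inode_ctx *_ctx;      /* per-translator context array */
};

/* Mirrors the shape of the faulting statement: indexes _ctx by xl_id with
 * no validation of the pointer or the index. */
static int get_xl_index_unsafe(struct xlator *xl, struct inode *inode)
{
    if ((inode->_ctx[xl->xl_id].xl_key != NULL) &&
        (inode->_ctx[xl->xl_id].xl_key != xl))
        return -1;
    return xl->xl_id;
}

/* Hardened variant (illustrative only): validate the pointers and the id
 * against the allocated slot count before touching the slot. */
static int get_xl_index_safe(struct xlator *xl, struct inode *inode)
{
    if (!xl || !inode || !inode->_ctx)
        return -1;
    if (xl->xl_id < 0 || xl->xl_id >= inode->ctx_count)
        return -1;
    if ((inode->_ctx[xl->xl_id].xl_key != NULL) &&
        (inode->_ctx[xl->xl_id].xl_key != xl))
        return -1;
    return xl->xl_id;
}

int main(void)
{
    struct inode  ino       = { .ctx_count = 2,
                                ._ctx = calloc(2, sizeof(struct inode_ctx)) };
    struct xlator in_range  = { .xl_id = 1 };  /* valid slot */
    struct xlator out_range = { .xl_id = 5 };  /* past the 2-slot array */

    /* Both variants agree while the id is in range ... */
    printf("unsafe, in range:     %d\n", get_xl_index_unsafe(&in_range, &ino));
    printf("safe,   in range:     %d\n", get_xl_index_safe(&in_range, &ino));

    /* ... but only the safe variant survives an out-of-range id; calling
     * get_xl_index_unsafe(&out_range, &ino) would read past the array,
     * i.e. the kind of invalid access the cores above show. */
    printf("safe,   out of range: %d\n", get_xl_index_safe(&out_range, &ino));

    free(ino._ctx);
    return 0;
}

Whether the real crash is an out-of-range id, a freed context array, or a race with inode destruction cannot be determined from the backtraces alone; see the discussion in the comments below.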

Comment 3 Csaba Henk 2019-01-17 09:45:46 UTC
Hi Mohit, I'm asking you as the one who dealt with https://bugzilla.redhat.com/1644164 and authored https://review.gluster.org/21305: is it feasible that the crashes we see in Thread 1 are caused by the lack of the hardening you made in that work?

Comment 4 Csaba Henk 2019-01-17 10:10:39 UTC
Some more explanation about "the crashes we see in Thread 1": they might be caused by improper synchronization of access to inodes. Mohit's work referenced above introduces stronger synchronization there, which is why it might be a suitable fix. If feasible, the test could be repeated with a build that has his patch backported. (If needed, I can provide one.)

A crash with a similar stack trace can be observed in Bug 1664173.
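
To make the synchronization point above concrete, here is a minimal sketch using simplified stand-in types and function names; this is NOT the actual glusterfs code nor the change in https://review.gluster.org/21305, only an illustration of the general idea of guarding both the per-inode context lookup and its teardown with the same per-inode lock, so that a reader racing a forget/destroy cannot dereference a context array that has already been freed.

/* Minimal sketch of the "stronger synchronization" idea, with simplified
 * stand-in types and function names (NOT the real glusterfs API). */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct inode_ctx {
    void *xl_key;             /* translator that owns this slot */
    void *value;
};

struct inode {
    pthread_mutex_t   lock;
    struct inode_ctx *_ctx;   /* per-translator context array */
    int               ctx_count;
};

/* Reader: take the inode lock before touching _ctx and re-check it, because
 * a concurrent destroy may already have torn it down. */
static void *inode_ctx_get_sketch(struct inode *inode, int slot, void *owner)
{
    void *value = NULL;

    pthread_mutex_lock(&inode->lock);
    if (inode->_ctx && slot >= 0 && slot < inode->ctx_count &&
        inode->_ctx[slot].xl_key == owner)
        value = inode->_ctx[slot].value;
    pthread_mutex_unlock(&inode->lock);

    return value;
}

/* Teardown: detach the array under the same lock before freeing it, so no
 * reader can observe a half-destroyed inode. */
static void inode_ctx_destroy_sketch(struct inode *inode)
{
    struct inode_ctx *ctx;

    pthread_mutex_lock(&inode->lock);
    ctx = inode->_ctx;
    inode->_ctx = NULL;
    inode->ctx_count = 0;
    pthread_mutex_unlock(&inode->lock);

    free(ctx);
}

int main(void)
{
    struct inode ino = { .ctx_count = 1 };
    int owner;                /* dummy translator identity */
    int payload = 42;

    pthread_mutex_init(&ino.lock, NULL);
    ino._ctx = calloc(1, sizeof(struct inode_ctx));
    ino._ctx[0].xl_key = &owner;
    ino._ctx[0].value  = &payload;

    /* The lookup succeeds before teardown and is safely refused after it. */
    printf("before destroy: %p\n", inode_ctx_get_sketch(&ino, 0, &owner));
    inode_ctx_destroy_sketch(&ino);
    printf("after destroy:  %p\n", inode_ctx_get_sketch(&ino, 0, &owner));

    pthread_mutex_destroy(&ino.lock);
    return 0;
}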