Bug 1664529

Summary: [geo-rep]: Multiple crashes seen on the slave during automation run
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rochelle <rallan>
Component: distribute
Assignee: Susant Kumar Palai <spalai>
Status: CLOSED ERRATA
QA Contact: Rochelle <rallan>
Severity: high
Priority: high
Docs Contact:
Version: rhgs-3.4
CC: apaladug, atoborek, atumball, avishwan, csaba, rcyriac, rhs-bugs, sankarshan, sheggodu, spalai, srmukher, storage-qa-internal, sunkumar, vdas
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.4.z Batch Update 3
Flags: srmukher: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.12.2-37
Doc Type: If docs needed, set a value
Doc Text:
Previously, when a lookup failed, the stat structure returned was NULL, and dereferencing it led to the crash. With this fix, the stat structure is no longer dereferenced when the lookup fails.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-04 07:41:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1664647    
Bug Blocks:    

Description Rochelle 2019-01-09 04:39:42 UTC
Description of problem:
=======================
Multiple cores seen on the slave after an automation run (rsync + FUSE)


[root@dhcp42-50 /]# gdb ./core.10926 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 10935]
[New LWP 10928]
[New LWP 10926]
[New LWP 10929]
[New LWP 10938]
[New LWP 10927]
[New LWP 10939]
[New LWP 10936]
[New LWP 10930]
[New LWP 10931]

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/df/8f6bf69e976bf1266e476ea2e37cee06f10c1d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  dht_rmdir_lookup_cbk (frame=0x7f3ef80d3128, cookie=0x7f3ef801e230, this=0x7f3ef8022600, op_ret=-1, op_errno=2, inode=<optimized out>, stbuf=0x0, xattr=0x0, parent=0x0) at dht-common.c:9843
9843	                gf_msg (this->name, GF_LOG_WARNING, op_errno,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) t a a bt

Thread 10 (Thread 0x7f3f00fb1700 (LWP 10931)):
#0  0x00007f3f0aa48d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3f0bc217a8 in syncenv_task (proc=proc@entry=0x560dcd6b8610) at syncop.c:603
#2  0x00007f3f0bc22670 in syncenv_processor (thdata=0x560dcd6b8610) at syncop.c:695
#3  0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3f0a30cead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f3f017b2700 (LWP 10930)):
#0  0x00007f3f0aa48d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3f0bc217a8 in syncenv_task (proc=proc@entry=0x560dcd6b8250) at syncop.c:603
#2  0x00007f3f0bc22670 in syncenv_processor (thdata=0x560dcd6b8250) at syncop.c:695
#3  0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3f0a30cead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f3efe2fd700 (LWP 10936)):
#0  0x00007f3f0a30d483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f3f0bc43d52 in event_dispatch_epoll_worker (data=0x560dcd6f6290) at event-epoll.c:649
#2  0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3f0a30cead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f3ef68e6700 (LWP 10939)):
#0  0x00007f3f0aa48965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3f02fbe9b3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4036
#2  0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3f0a30cead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f3f02fb5700 (LWP 10927)):
#0  0x00007f3f0aa4be3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f3f0bbf3d56 in gf_timer_proc (data=0x560dcd6b7a30) at timer.c:174
#2  0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3f0a30cead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f3ef70e7700 (LWP 10938)):
#0  0x00007f3f0a303960 in readv () from /lib64/libc.so.6
#1  0x00007f3f0bc0fcf5 in sys_readv (fd=<optimized out>, iov=<optimized out>, iovcnt=<optimized out>) at syscall.c:295
#2  0x00007f3f02fd3b65 in fuse_thread_proc (data=0x560dcd6a4290) at fuse-bridge.c:5036
#3  0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3f0a30cead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f3f01fb3700 (LWP 10929)):
#0  0x00007f3f0a2d3e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f3f0a2d3cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f3f0bc0e5ad in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3f0a30cead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f3f0c0c9780 (LWP 10926)):
#0  0x00007f3f0aa45f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f3f0bc444b8 in event_dispatch_epoll (event_pool=0x560dcd69c150) at event-epoll.c:746
#2  0x0000560dcc9cc3f2 in main (argc=8, argv=<optimized out>) at glusterfsd.c:2617

Thread 2 (Thread 0x7f3f027b4700 (LWP 10928)):
#0  0x00007f3f0aa4c361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000560dcc9cf95b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2167
#2  0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3f0a30cead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f3efeafe700 (LWP 10935)):
#0  dht_rmdir_lookup_cbk (frame=0x7f3ef80d3128, cookie=0x7f3ef801e230, this=0x7f3ef8022600, op_ret=-1, op_errno=2, inode=<optimized out>, stbuf=0x0, xattr=0x0, parent=0x0) at dht-common.c:9843
#1  0x00007f3efd68428d in afr_lookup_done (frame=frame@entry=0x7f3ef80cb308, this=this@entry=0x7f3ef801e230) at afr-common.c:2466
#2  0x00007f3efd685058 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f3ef80cb308, this=this@entry=0x7f3ef801e230) at afr-common.c:2771
#3  0x00007f3efd685a5b in afr_lookup_entry_heal (frame=frame@entry=0x7f3ef80cb308, this=this@entry=0x7f3ef801e230) at afr-common.c:2920
#4  0x00007f3efd685e3d in afr_lookup_cbk (frame=frame@entry=0x7f3ef80cb308, cookie=<optimized out>, this=0x7f3ef801e230, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f3ef809cf28, 
    buf=buf@entry=0x7f3efeafd920, xdata=0x7f3ef80aeda8, postparent=postparent@entry=0x7f3efeafd990) at afr-common.c:2968
#5  0x00007f3efd8c5efd in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f3ef80b20f8) at client-rpc-fops.c:2872
#6  0x00007f3f0b9adb30 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f3ef80932c0, pollin=pollin@entry=0x7f3ef810d440) at rpc-clnt.c:778
#7  0x00007f3f0b9aded3 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f3ef80932f0, event=<optimized out>, data=0x7f3ef810d440) at rpc-clnt.c:971
#8  0x00007f3f0b9a9c33 in rpc_transport_notify (this=this@entry=0x7f3ef8093610, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f3ef810d440) at rpc-transport.c:552
#9  0x00007f3f0059e576 in socket_event_poll_in (this=this@entry=0x7f3ef8093610, notify_handled=<optimized out>) at socket.c:2322
#10 0x00007f3f005a0b1c in socket_event_handler (fd=13, idx=7, gen=1, data=0x7f3ef8093610, poll_in=1, poll_out=0, poll_err=0) at socket.c:2474
#11 0x00007f3f0bc43e84 in event_dispatch_epoll_handler (event=0x7f3efeafde80, event_pool=0x560dcd69c150) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x560dcd6f5fc0) at event-epoll.c:659
#13 0x00007f3f0aa44dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f3f0a30cead in clone () from /lib64/libc.so.6
(gdb) 
[root@dhcp42-50 /]# gdb ./core.13418
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 13431]
[New LWP 13420]
[New LWP 13432]
[New LWP 13423]
[New LWP 13419]
[New LWP 13435]
[New LWP 13421]
[New LWP 13422]
[New LWP 13436]
[New LWP 13418]

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/df/8f6bf69e976bf1266e476ea2e37cee06f10c1d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  dht_rmdir_lookup_cbk (frame=0x7fecd0021488, cookie=0x7fecd801fd00, this=0x7fecd8022600, op_ret=-1, op_errno=2, inode=<optimized out>, stbuf=0x0, xattr=0x0, parent=0x0) at dht-common.c:9843
9843	                gf_msg (this->name, GF_LOG_WARNING, op_errno,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) t a a bt

Thread 10 (Thread 0x7fecec243780 (LWP 13418)):
#0  0x00007feceabbff47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fecebdbe4b8 in event_dispatch_epoll (event_pool=0x55627f899150) at event-epoll.c:746
#2  0x000055627f0293f2 in main (argc=8, argv=<optimized out>) at glusterfsd.c:2617

Thread 9 (Thread 0x7fecd6a8c700 (LWP 13436)):
#0  0x00007feceabc2965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fece31389b3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4036
#2  0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fecea486ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7fece192c700 (LWP 13422)):
#0  0x00007feceabc2d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fecebd9b7a8 in syncenv_task (proc=proc@entry=0x55627f8b5250) at syncop.c:603
#2  0x00007fecebd9c670 in syncenv_processor (thdata=0x55627f8b5250) at syncop.c:695
#3  0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fecea486ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7fece212d700 (LWP 13421)):
#0  0x00007fecea44de2d in nanosleep () from /lib64/libc.so.6
#1  0x00007fecea44dcc4 in sleep () from /lib64/libc.so.6
#2  0x00007fecebd885ad in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fecea486ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7fecd728d700 (LWP 13435)):
#0  0x00007fecea47d960 in readv () from /lib64/libc.so.6
#1  0x00007fecebd89cf5 in sys_readv (fd=<optimized out>, iov=<optimized out>, iovcnt=<optimized out>) at syscall.c:295
#2  0x00007fece314db65 in fuse_thread_proc (data=0x55627f8a1290) at fuse-bridge.c:5036
#3  0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fecea486ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fece312f700 (LWP 13419)):
#0  0x00007feceabc5e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007fecebd6dd56 in gf_timer_proc (data=0x55627f8b4a30) at timer.c:174
#2  0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fecea486ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fece112b700 (LWP 13423)):
#0  0x00007feceabc2d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fecebd9b7a8 in syncenv_task (proc=proc@entry=0x55627f8b5610) at syncop.c:603
#2  0x00007fecebd9c670 in syncenv_processor (thdata=0x55627f8b5610) at syncop.c:695
#3  0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fecea486ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fecde477700 (LWP 13432)):
#0  dht_rmdir_lookup_cbk (frame=0x7fecd80c4548, cookie=0x7fecd801e230, this=0x7fecd8022600, op_ret=-1, op_errno=2, inode=<optimized out>, stbuf=0x0, xattr=0x0, parent=0x0) at dht-common.c:9843
#1  0x00007fecdd7fe28d in afr_lookup_done (frame=frame@entry=0x7fecd002f0c8, this=this@entry=0x7fecd801e230) at afr-common.c:2466
#2  0x00007fecdd7ff058 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7fecd002f0c8, this=this@entry=0x7fecd801e230) at afr-common.c:2771
#3  0x00007fecdd7ffa5b in afr_lookup_entry_heal (frame=frame@entry=0x7fecd002f0c8, this=this@entry=0x7fecd801e230) at afr-common.c:2920
#4  0x00007fecdd7ffe3d in afr_lookup_cbk (frame=frame@entry=0x7fecd002f0c8, cookie=<optimized out>, this=0x7fecd801e230, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7fecd80c4998, 
    buf=buf@entry=0x7fecde476920, xdata=0x7fecd000ff68, postparent=postparent@entry=0x7fecde476990) at afr-common.c:2968
#5  0x00007fecdda3fefd in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fecd0021368) at client-rpc-fops.c:2872
#6  0x00007fecebb27b30 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fecd8097920, pollin=pollin@entry=0x7fecd002c8f0) at rpc-clnt.c:778
#7  0x00007fecebb27ed3 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fecd8097950, event=<optimized out>, data=0x7fecd002c8f0) at rpc-clnt.c:971
#8  0x00007fecebb23c33 in rpc_transport_notify (this=this@entry=0x7fecd8097c70, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fecd002c8f0) at rpc-transport.c:552
#9  0x00007fece0718576 in socket_event_poll_in (this=this@entry=0x7fecd8097c70, notify_handled=<optimized out>) at socket.c:2322
#10 0x00007fece071ab1c in socket_event_handler (fd=15, idx=4, gen=1, data=0x7fecd8097c70, poll_in=1, poll_out=0, poll_err=0) at socket.c:2474
#11 0x00007fecebdbde84 in event_dispatch_epoll_handler (event=0x7fecde476e80, event_pool=0x55627f899150) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x55627f8f3290) at event-epoll.c:659
#13 0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fecea486ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fece292e700 (LWP 13420)):
#0  0x00007feceabc6361 in sigwait () from /lib64/libpthread.so.0
#1  0x000055627f02c95b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2167
#2  0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fecea486ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fecdec78700 (LWP 13431)):
#0  dht_rmdir_lookup_cbk (frame=0x7fecd0021488, cookie=0x7fecd801fd00, this=0x7fecd8022600, op_ret=-1, op_errno=2, inode=<optimized out>, stbuf=0x0, xattr=0x0, parent=0x0) at dht-common.c:9843
#1  0x00007fecdd7fe28d in afr_lookup_done (frame=frame@entry=0x7fecd002f478, this=this@entry=0x7fecd801fd00) at afr-common.c:2466
#2  0x00007fecdd7ff058 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7fecd002f478, this=this@entry=0x7fecd801fd00) at afr-common.c:2771
#3  0x00007fecdd7ffa5b in afr_lookup_entry_heal (frame=frame@entry=0x7fecd002f478, this=this@entry=0x7fecd801fd00) at afr-common.c:2920
#4  0x00007fecdd7ffe3d in afr_lookup_cbk (frame=frame@entry=0x7fecd002f478, cookie=<optimized out>, this=0x7fecd801fd00, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7fecd002ea68, 
    buf=buf@entry=0x7fecdec77920, xdata=0x7fecd80c71b8, postparent=postparent@entry=0x7fecdec77990) at afr-common.c:2968
#5  0x00007fecdda3fefd in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fecd002efb8) at client-rpc-fops.c:2872
#6  0x00007fecebb27b30 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fecd808a600, pollin=pollin@entry=0x7fecd80c5c70) at rpc-clnt.c:778
#7  0x00007fecebb27ed3 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fecd808a630, event=<optimized out>, data=0x7fecd80c5c70) at rpc-clnt.c:971
#8  0x00007fecebb23c33 in rpc_transport_notify (this=this@entry=0x7fecd808a950, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fecd80c5c70) at rpc-transport.c:552
#9  0x00007fece0718576 in socket_event_poll_in (this=this@entry=0x7fecd808a950, notify_handled=<optimized out>) at socket.c:2322
#10 0x00007fece071ab1c in socket_event_handler (fd=19, idx=8, gen=1, data=0x7fecd808a950, poll_in=1, poll_out=0, poll_err=0) at socket.c:2474
#11 0x00007fecebdbde84 in event_dispatch_epoll_handler (event=0x7fecdec77e80, event_pool=0x55627f899150) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x55627f8f2fc0) at event-epoll.c:659
#13 0x00007feceabbedd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fecea486ead in clone () from /lib64/libc.so.6
(gdb) 
[root@dhcp42-50 /]# gdb ./core.7322
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 7327]
[New LWP 7322]
[New LWP 7324]
[New LWP 7335]
[New LWP 7334]
[New LWP 7323]
[New LWP 7331]
[New LWP 7325]
[New LWP 7326]
[New LWP 7332]

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/df/8f6bf69e976bf1266e476ea2e37cee06f10c1d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  dht_rmdir_lookup_cbk (frame=0x7f5c4c01b8d8, cookie=0x7f5c54021120, this=0x7f5c54022600, op_ret=-1, op_errno=2, inode=<optimized out>, stbuf=0x0, xattr=0x0, parent=0x0) at dht-common.c:9843
9843	                gf_msg (this->name, GF_LOG_WARNING, op_errno,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) t a a bt

Thread 10 (Thread 0x7f5c53fff700 (LWP 7332)):
#0  0x00007f5c66410483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f5c67d46d52 in event_dispatch_epoll_worker (data=0x557dc24de290) at event-epoll.c:649
#2  0x00007f5c66b47dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5c6640fead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f5c5d8b5700 (LWP 7326)):
#0  0x00007f5c66b4bd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5c67d247a8 in syncenv_task (proc=proc@entry=0x557dc24a0250) at syncop.c:603
#2  0x00007f5c67d25670 in syncenv_processor (thdata=0x557dc24a0250) at syncop.c:695
#3  0x00007f5c66b47dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5c6640fead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f5c5e0b6700 (LWP 7325)):
#0  0x00007f5c663d6e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f5c663d6cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f5c67d115ad in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f5c66b47dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5c6640fead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f5c5ac01700 (LWP 7331)):
#0  0x00007f5c66410483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f5c67d46d52 in event_dispatch_epoll_worker (data=0x557dc24ddfc0) at event-epoll.c:649
#2  0x00007f5c66b47dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5c6640fead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5c5f0b8700 (LWP 7323)):
#0  0x00007f5c66b4ee3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f5c67cf6d56 in gf_timer_proc (data=0x557dc249fa30) at timer.c:174
#2  0x00007f5c66b47dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5c6640fead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5c52efe700 (LWP 7334)):
#0  0x00007f5c66406960 in readv () from /lib64/libc.so.6
#1  0x00007f5c67d12cf5 in sys_readv (fd=<optimized out>, iov=<optimized out>, iovcnt=<optimized out>) at syscall.c:295
#2  0x00007f5c5f0d6b65 in fuse_thread_proc (data=0x557dc248c290) at fuse-bridge.c:5036
#3  0x00007f5c66b47dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5c6640fead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5c526fd700 (LWP 7335)):
#0  0x00007f5c66b4b965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5c5f0c19b3 in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4036
#2  0x00007f5c66b47dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5c6640fead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5c5e8b7700 (LWP 7324)):
#0  0x00007f5c66b4f361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000557dc10c195b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2167
#2  0x00007f5c66b47dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5c6640fead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5c681cc780 (LWP 7322)):
#0  0x00007f5c66b48f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f5c67d474b8 in event_dispatch_epoll (event_pool=0x557dc2484150) at event-epoll.c:746
#2  0x0000557dc10be3f2 in main (argc=8, argv=<optimized out>) at glusterfsd.c:2617

Thread 1 (Thread 0x7f5c5d0b4700 (LWP 7327)):
#0  dht_rmdir_lookup_cbk (frame=0x7f5c4c01b8d8, cookie=0x7f5c54021120, this=0x7f5c54022600, op_ret=-1, op_errno=2, inode=<optimized out>, stbuf=0x0, xattr=0x0, parent=0x0) at dht-common.c:9843
#1  0x00007f5c59f8828d in afr_lookup_done (frame=frame@entry=0x7f5c4c004648, this=this@entry=0x7f5c54021120) at afr-common.c:2466
#2  0x00007f5c59f89058 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f5c4c004648, this=this@entry=0x7f5c54021120) at afr-common.c:2771
#3  0x00007f5c59f893a3 in afr_lookup_selfheal_wrap (opaque=0x7f5c4c004648) at afr-common.c:2804
#4  0x00007f5c67d221d0 in synctask_wrap () at syncop.c:375
#5  0x00007f5c6635a010 in ?? () from /lib64/libc.so.6
#6  0x0000000000000000 in ?? ()
(gdb) 
A number of the following tracebacks are seen on the slave (the ENOTCONN/ECONNABORTED OSErrors are consistent with the slave's glusterfs mount process having crashed underneath gsyncd):
-----------------------------------------------------------------------

[2019-01-08 10:27:07.193919] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 643, in entry_ops
    st = lstat(slink)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 577, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 559, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/00442a86-1c3b-499a-80b4-40f46b94a5dc'


[2019-01-08 12:27:38.893733] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 602, in entry_ops
    er = entry_purge(op, entry, gfid, e, uid, gid)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 485, in entry_purge
    ENOTEMPTY], [EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 559, in errno_wrap
    return call(*arg)
OSError: [Errno 103] Software caused connection abort: '.gfid/8d4a59e9-8bf2-4457-ba4f-113b7f8670ab/level06'
From the Master:
=================
The corresponding errors appear on the master at the same time:
[2019-01-08 10:27:07.208045] E [repce(/bricks/brick1/master_brick3):209:__call__] RepceClient: call failed      call=21944:140042561623872:1546943203.24        method=entry_ops        error=OSError
[2019-01-08 10:27:07.209048] E [syncdutils(/bricks/brick1/master_brick3):349:log_raise_exception] <top>: Gluster Mount process exited   error=ENOTCONN
[2019-01-08 10:27:07.258683] I [syncdutils(/bricks/brick1/master_brick3):295:finalize] <top>: exiting.
[2019-01-08 10:27:07.270337] I [repce(/bricks/brick1/master_brick3):92:service_loop] RepceServer: terminating on reaching EOF.
[2019-01-08 10:27:07.272001] I [syncdutils(/bricks/brick1/master_brick3):295:finalize] <top>: exiting.
[2019-01-08 10:27:07.289017] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2019-01-08 10:27:17.958565] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2019-01-08 10:27:17.959784] I [monitor(monitor):199:monitor] Monitor: starting gsyncd worker   brick=/bricks/brick1/master_brick3      slave_node=ssh://root.42.43:gluster://localhost:slave
[2019-01-08 12:27:31.529632] I [master(/bricks/brick0/master_brick0):1474:crawl] _GMaster: slave's time stime=(1546950432, 0)
[2019-01-08 12:27:38.284218] I [master(/bricks/brick0/master_brick0):1387:process] _GMaster: Entry Time Taken   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=29  CRE=0   duration=4.8141 UNL=79
[2019-01-08 12:27:38.285001] I [master(/bricks/brick0/master_brick0):1397:process] _GMaster: Data/Metadata Time Taken   SETA=0  SETX=0  meta_duration=0.0000    data_duration=0.1122    DATA=0  XATT=0
[2019-01-08 12:27:38.286060] I [master(/bricks/brick0/master_brick0):1407:process] _GMaster: Batch Completed    changelog_end=1546950448        entry_stime=(1546950447, 0)     changelog_start=1546950448      stime=(1546950447, 0)   duration=6.7381 num_changelogs=1        mode=live_changelog
[2019-01-08 12:27:38.907108] E [repce(/bricks/brick1/master_brick3):209:__call__] RepceClient: call failed      call=22796:140700824278848:1546950449.35        method=entry_ops        error=OSError
[2019-01-08 12:27:38.907922] E [syncdutils(/bricks/brick1/master_brick3):349:log_raise_exception] <top>: Gluster Mount process exited   error=ECONNABORTED
[2019-01-08 12:27:38.959705] I [syncdutils(/bricks/brick1/master_brick3):295:finalize] <top>: exiting.
[2019-01-08 12:27:38.976322] I [repce(/bricks/brick1/master_brick3):92:service_loop] RepceServer: terminating on reaching EOF.
[2019-01-08 12:27:38.977650] I [syncdutils(/bricks/brick1/master_brick3):295:finalize] <top>: exiting.
[2019-01-08 12:27:39.787256] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2019-01-08 12:27:44.324415] I [master(/bricks/brick0/master_brick0):1474:crawl] _GMaster: slave's time stime=(1546950447, 0)
[2019-01-08 12:27:50.496303] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2019-01-08 12:27:50.497453] I [monitor(monitor):199:monitor] Monitor: starting gsyncd worker   brick=/bricks/brick1/master_brick3      slave_node=ssh://root.42.43:gluster://localhost:slave
Version-Release number of selected component (if applicable):
=============================================
glusterfs-3.12.2-36.el7rhgs.x86_64

How reproducible:
================
1/1

Steps to Reproduce:
==================
Ran geo-rep automation

Actual results:
===============
Multiple crashes seen on the slave


Expected results:
=================
There should be no crash


Additional info:
================
All the test cases passed, hence this is not being marked as a blocker for now.

Sosreports will be attached, including all cores seen on all three systems.

The worker did go Faulty but recovered:

[root@dhcp42-56 /]# gluster v geo-replication status
 
MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                       SLAVE NODE     STATUS     CRAWL STATUS       LAST_SYNCED                  
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.56    master        /bricks/brick0/master_brick0    root          ssh://10.70.42.50::slave    10.70.42.43    Active     Changelog Crawl    2019-01-08 23:36:11          
10.70.42.56    master        /bricks/brick1/master_brick3    root          ssh://10.70.42.50::slave    10.70.42.43    Passive    N/A                N/A                          
10.70.42.56    master        /bricks/brick2/master_brick6    root          ssh://10.70.42.50::slave    10.70.42.43    Active     Changelog Crawl    2019-01-08 23:36:16          
10.70.42.53    master        /bricks/brick0/master_brick2    root          ssh://10.70.42.50::slave    10.70.42.40    Passive    N/A                N/A                          
10.70.42.53    master        /bricks/brick1/master_brick5    root          ssh://10.70.42.50::slave    10.70.42.40    Active     Changelog Crawl    2019-01-08 23:36:11          
10.70.42.53    master        /bricks/brick2/master_brick8    root          ssh://10.70.42.50::slave    10.70.42.40    Passive    N/A                N/A                          
10.70.42.52    master        /bricks/brick0/master_brick1    root          ssh://10.70.42.50::slave    10.70.42.50    Passive    N/A                N/A                          
10.70.42.52    master        /bricks/brick1/master_brick4    root          ssh://10.70.42.50::slave    10.70.42.50    Passive    N/A                N/A                          
10.70.42.52    master        /bricks/brick2/master_brick7    root          ssh://10.70.42.50::slave    10.70.42.50    Passive    N/A                N/A

Comment 5 Amar Tumballi 2019-01-10 05:06:53 UTC
Looking at the crash backtrace, the fix is most likely https://review.gluster.org/22004
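
For reference, a minimal, self-contained sketch of the crash pattern and the guard described in the Doc Text; every name below is a hypothetical stand-in for the gluster internals (the real callback is dht_rmdir_lookup_cbk in dht-common.c, where all three cores show stbuf=0x0 whenever op_ret == -1):

#include <stdio.h>
#include <errno.h>

struct fake_iatt {
    unsigned int ia_type;          /* stand-in for struct iatt.ia_type */
};

/* Buggy shape: the failure path logs a field of the returned stat
 * structure, but a failed lookup returns no stat data (stbuf == NULL),
 * so the log call itself dereferences NULL and segfaults. */
static void lookup_cbk_buggy(int op_ret, int op_errno,
                             const struct fake_iatt *stbuf)
{
    if (op_ret != 0) {
        fprintf(stderr, "lookup failed (errno=%d, type=0%o)\n",
                op_errno, stbuf->ia_type);      /* NULL dereference */
        return;
    }
    fprintf(stderr, "lookup ok (type=0%o)\n", stbuf->ia_type);
}

/* Fixed shape: the failure path no longer touches stbuf at all. */
static void lookup_cbk_fixed(int op_ret, int op_errno,
                             const struct fake_iatt *stbuf)
{
    if (op_ret != 0) {
        fprintf(stderr, "lookup failed (errno=%d)\n", op_errno);
        return;
    }
    fprintf(stderr, "lookup ok (type=0%o)\n", stbuf->ia_type);
}

int main(void)
{
    /* A failed lookup delivers op_ret=-1, op_errno=ENOENT and a NULL
     * stat pointer, exactly as in the backtraces above. */
    lookup_cbk_fixed(-1, ENOENT, NULL);   /* logs a warning, no crash */
    /* lookup_cbk_buggy(-1, ENOENT, NULL);   would segfault like the cores */
    return 0;
}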

Comment 17 errata-xmlrpc 2019-02-04 07:41:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0263

Comment 18 Nithya Balachandran 2019-10-11 02:31:27 UTC
*** Bug 1760581 has been marked as a duplicate of this bug. ***