Bug 1238067
| Field | Value |
|---|---|
| Summary | Glusterd crashed while glusterd service was shutting down |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | Bhaskarakiran <byarlaga> |
| Component | glusterd |
| Assignee | Atin Mukherjee <amukherj> |
| Status | CLOSED WONTFIX |
| QA Contact | Bala Konda Reddy M <bmekala> |
| Severity | unspecified |
| Docs Contact | |
| Priority | unspecified |
| Version | rhgs-3.1 |
| CC | abhaumik, amukherj, byarlaga, mlawrenc, mzywusko, nbalacha, nchilaka, nlevinki, nsathyan, rmekala, sanandpa, sasundar, tdesala, vbellur, vdas |
| Target Milestone | --- |
| Keywords | ZStream |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | GlusterD |
| Fixed In Version | |
| Doc Type | Known Issue |
| Doc Text | In rare instances, glusterd may crash when it is stopped. The crash is caused by a race between the cleanup thread and a running thread and does not impact functionality: the cleanup thread releases URCU resources while a running thread is still trying to access them, which results in a crash. |
| Story Points | --- |
| Clone Of | |
| Clones | 1239156 (view as bug list) |
| Last Closed | 2016-01-08 08:58:25 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1216951, 1223636, 1239156, 1277939 |
| Attachments | |
Description
Bhaskarakiran 2015-07-01 06:43:52 UTC

Created attachment 1044920 [details]
core file

Time of crash: -rw-------. 1 root root 232M Jun 30 16:14 core.2913.1435661084.dump
Sosreport: rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1238067/sosreport-sysreg-prod-20150701140725.tar.xz

Upstream patch: http://review.gluster.org/#/c/11532/

*** Bug 1283139 has been marked as a duplicate of this bug. ***

This is one of the rare races in the cleanup path: the URCU resources had already been released by the cleanup thread while another thread was still accessing them. The current implementation does not synchronize the running threads with cleanup, which is why the crash is observed. Fixing this would require non-trivial changes in the sync-op framework. Since the race does not impact functionality and is one of the rarest to hit, we do not plan to chase down the bug and are closing it. Feel free to reopen, with proper justification, if you think otherwise.

*** Bug 1397669 has been marked as a duplicate of this bug. ***
*** Bug 1434047 has been marked as a duplicate of this bug. ***
*** Bug 1442928 has been marked as a duplicate of this bug. ***
*** Bug 1530936 has been marked as a duplicate of this bug. ***
*** Bug 1545045 has been marked as a duplicate of this bug. ***

I am seeing the crash below consistently on all nodes when upgrading from 3.8.4-54.8 to 3.12.2-9 (the crash occurs on `yum update glusterfs-server`). Atin, kindly confirm whether this is the same issue. The backtrace follows.

[root@dhcp37-41 ~]# file /core.17780
/core.17780: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'glusterd --xlator-option *.upgrade=on -N', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/sbin/glusterd', platform: 'x86_64'

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/3b/b87246fcddff47293950c06e763e44f866502e
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `glusterd --xlator-option *.upgrade=on -N'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f4f63944d8b in rcu_bp_register () from /lib64/liburcu-bp.so.1
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 device-mapper-event-libs-1.02.146-4.el7.x86_64 device-mapper-libs-1.02.146-4.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64 elfutils-libs-0.170-4.el7.x86_64 glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-52.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-12.el7.x86_64 libsepol-2.5-8.1.el7.x86_64 libuuid-2.23.2-52.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 lvm2-libs-2.02.177-4.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-57.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64

(gdb) bt
#0  0x00007f4f63944d8b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f4f639450ce in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f4f63ee087c in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x55d7715674a0, mydata=mydata@entry=0x55d771565e10, event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:6372
#3  0x00007f4f63ed6a5a in glusterd_big_locked_notify (rpc=0x55d7715674a0, mydata=0x55d771565e10, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f4f63ee0830 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:70
#4  0x00007f4f6f215594 in rpc_clnt_notify (trans=<optimized out>, mydata=0x55d7715674d0, event=<optimized out>, data=0x55d7715676d0) at rpc-clnt.c:1004
#5  0x00007f4f6f211393 in rpc_transport_notify (this=this@entry=0x55d7715676d0, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x55d7715676d0) at rpc-transport.c:538
#6  0x00007f4f6111e367 in socket_connect_finish (this=this@entry=0x55d7715676d0) at socket.c:2404
#7  0x00007f4f61122aa8 in socket_event_handler (fd=11, idx=2, gen=1, data=0x55d7715676d0, poll_in=0, poll_out=4, poll_err=0) at socket.c:2456
#8  0x00007f4f6f4aae34 in event_dispatch_epoll_handler (event=0x7f4f5f173e80, event_pool=0x55d7714a7210) at event-epoll.c:583
#9  event_dispatch_epoll_worker (data=0x55d771571f10) at event-epoll.c:659
#10 0x00007f4f6e2abdd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f4f6db74b3d in clone () from /lib64/libc.so.6

(gdb) t a a bt

Thread 8 (Thread 0x7f4f6624b700 (LWP 17782)):
#0  0x00007f4f613795b0 in _fini () from /lib64/libpcre.so.1
#1  0x00007f4f6f7321a8 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f4f6daafb69 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007f4f6daafbb7 in exit () from /lib64/libc.so.6
#4  0x000055d76fe4c4df in cleanup_and_exit (signum=15) at glusterfsd.c:1423
#5  0x000055d76fe4c5d5 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2145
#6  0x00007f4f6e2abdd5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f4f6db74b3d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f4f6f935780 (LWP 17780)):
#0  0x00007f4f6e2acf47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f4f6f4ab468 in event_dispatch_epoll (event_pool=0x55d7714a7210) at event-epoll.c:746
#2  0x000055d76fe492a7 in main (argc=4, argv=<optimized out>) at glusterfsd.c:2550

Thread 6 (Thread 0x7f4f65249700 (LWP 17784)):
#0  0x00007f4f6e2afcf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4f6f489008 in syncenv_task (proc=proc@entry=0x55d7714af0e0) at syncop.c:603
#2  0x00007f4f6f489ed0 in syncenv_processor (thdata=0x55d7714af0e0) at syncop.c:695
#3  0x00007f4f6e2abdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4f6db74b3d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f4f66a4c700 (LWP 17781)):
#0  0x00007f4f6e2b2eed in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f4f6f45b986 in gf_timer_proc (data=0x55d7714ae8c0) at timer.c:174
#2  0x00007f4f6e2abdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4f6db74b3d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f4f64a48700 (LWP 17785)):
#0  0x00007f4f6e2afcf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4f6f489008 in syncenv_task (proc=proc@entry=0x55d7714af4a0) at syncop.c:603
#2  0x00007f4f6f489ed0 in syncenv_processor (thdata=0x55d7714af4a0) at syncop.c:695
#3  0x00007f4f6e2abdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4f6db74b3d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f4f65a4a700 (LWP 17783)):
#0  0x00007f4f6db3b4fd in nanosleep () from /lib64/libc.so.6
#1  0x00007f4f6db3b394 in sleep () from /lib64/libc.so.6
#2  0x00007f4f6f4761bd in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
#3  0x00007f4f6e2abdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4f6db74b3d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f4f5f975700 (LWP 17786)):
#0  0x00007f4f6e2af945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4f63f9602b in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f4f6e2abdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4f6db74b3d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f4f5f174700 (LWP 17787)):
#0  0x00007f4f63944d8b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f4f639450ce in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f4f63ee087c in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x55d7715674a0, mydata=mydata@entry=0x55d771565e10, event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:6372
#3  0x00007f4f63ed6a5a in glusterd_big_locked_notify (rpc=0x55d7715674a0, mydata=0x55d771565e10, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f4f63ee0830 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:70
#4  0x00007f4f6f215594 in rpc_clnt_notify (trans=<optimized out>, mydata=0x55d7715674d0, event=<optimized out>, data=0x55d7715676d0) at rpc-clnt.c:1004
#5  0x00007f4f6f211393 in rpc_transport_notify (this=this@entry=0x55d7715676d0, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x55d7715676d0) at rpc-transport.c:538
#6  0x00007f4f6111e367 in socket_connect_finish (this=this@entry=0x55d7715676d0) at socket.c:2404
#7  0x00007f4f61122aa8 in socket_event_handler (fd=11, idx=2, gen=1, data=0x55d7715676d0, poll_in=0, poll_out=4, poll_err=0) at socket.c:2456
#8  0x00007f4f6f4aae34 in event_dispatch_epoll_handler (event=0x7f4f5f173e80, event_pool=0x55d7714a7210) at event-epoll.c:583
#9  event_dispatch_epoll_worker (data=0x55d771571f10) at event-epoll.c:659
#10 0x00007f4f6e2abdd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f4f6db74b3d in clone () from /lib64/libc.so.6

While doing the yum update, the following appears on the CLI:

/var/tmp/rpm-tmp.O031c7: line 26: 14148 Segmentation fault (core dumped) glusterd --xlator-option *.upgrade=on -N
  Verifying : glusterfs-client-xlators-3.12.2-9.el7rhgs.x86_64   1/14
  Verifying : glusterfs-3.12.2-9.el7rhgs.x86_64                  2/14
  Verifying : glusterfs-api-3.12.2-9.el7rhgs.x86_64

*** Bug 1622554 has been marked as a duplicate of this bug. ***