+++ This bug was initially created as a clone of Bug #1238067 +++

Description of problem:
=======================
Seen a glusterd crash. No restarts were done and there was IO running from the client. A peer probe to another server failed with error 107.

Backtrace:
==========
(gdb) bt
#0  _rcu_read_lock_bp () at urcu/static/urcu-bp.h:199
#1  rcu_read_lock_bp () at urcu-bp.c:271
#2  0x00007fb1d33cd256 in __glusterd_peer_rpc_notify (rpc=0x7fb1df49c8d0, mydata=<value optimized out>, event=RPC_CLNT_DISCONNECT, data=<value optimized out>) at glusterd-handler.c:4996
#3  0x00007fb1d33b0c50 in glusterd_big_locked_notify (rpc=0x7fb1df49c8d0, mydata=0x7fb1df49c250, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7fb1d33cd1f0 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007fb1de793953 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7fb1df49c900, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:861
#5  0x00007fb1de78ead8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543
#6  0x00007fb1d1a53df1 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x7fb1df49fa60, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1205
#7  socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7fb1df49fa60, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2410
#8  0x00007fb1dea27970 in event_dispatch_epoll_handler (data=0x7fb1df4edda0) at event-epoll.c:575
#9  event_dispatch_epoll_worker (data=0x7fb1df4edda0) at event-epoll.c:678
#10 0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fb1dd41896d in clone () from /lib64/libc.so.6

(gdb) t a a bt

Thread 7 (Thread 0x7fb1cd6a7700 (LWP 10080)):
#0  0x00007fb1ddab2a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb1dea0acab in syncenv_task (proc=0x7fb1df34bb00) at syncop.c:595
#2  0x00007fb1dea0fba0 in syncenv_processor (thdata=0x7fb1df34bb00) at syncop.c:687
#3  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7fb1d5f03700 (LWP 2914)):
#0  0x00007fb1ddab5fbd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007fb1de9e55ca in gf_timer_proc (ctx=0x7fb1df31d010) at timer.c:205
#2  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fb1d4100700 (LWP 2917)):
#0  0x00007fb1ddab2a0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb1dea0acab in syncenv_task (proc=0x7fb1df34afc0) at syncop.c:595
#2  0x00007fb1dea0fba0 in syncenv_processor (thdata=0x7fb1df34afc0) at syncop.c:687
#3  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fb1d02c3700 (LWP 3091)):
#0  0x00007fb1ddab263c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fb1d3465973 in hooks_worker (args=<value optimized out>) at glusterd-hooks.c:534
#2  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fb1dee72740 (LWP 2913)):
#0  0x00007fb1ddaaf2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fb1dea2741d in event_dispatch_epoll (event_pool=0x7fb1df33bc90) at event-epoll.c:762
#2  0x00007fb1dee8eef1 in main (argc=2, argv=0x7ffdfed58a08) at glusterfsd.c:2333

Thread 2 (Thread 0x7fb1d5502700 (LWP 2915)):
#0  0x00007fb1d2c1cf18 in _fini () from /usr/lib64/liburcu-cds.so.1.0.0
#1  0x00007fb1dec72c7c in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007fb1dd365b22 in exit () from /lib64/libc.so.6
#3  0x00007fb1dee8cc03 in cleanup_and_exit (signum=<value optimized out>) at glusterfsd.c:1276
#4  0x00007fb1dee8d075 in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1997
#5  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#6  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fb1cf8c2700 (LWP 3092)):
#0  _rcu_read_lock_bp () at urcu/static/urcu-bp.h:199
#1  rcu_read_lock_bp () at urcu-bp.c:271
#2  0x00007fb1d33cd256 in __glusterd_peer_rpc_notify (rpc=0x7fb1df49c8d0, mydata=<value optimized out>, event=RPC_CLNT_DISCONNECT, data=<value optimized out>) at glusterd-handler.c:4996
#3  0x00007fb1d33b0c50 in glusterd_big_locked_notify (rpc=0x7fb1df49c8d0, mydata=0x7fb1df49c250, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7fb1d33cd1f0 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007fb1de793953 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7fb1df49c900, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:861
#5  0x00007fb1de78ead8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543
#6  0x00007fb1d1a53df1 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x7fb1df49fa60, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1205
#7  socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7fb1df49fa60, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2410
#8  0x00007fb1dea27970 in event_dispatch_epoll_handler (data=0x7fb1df4edda0) at event-epoll.c:575
#9  event_dispatch_epoll_worker (data=0x7fb1df4edda0) at event-epoll.c:678
#10 0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fb1dd41896d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
=============================================================
[root@ninja core]# gluster --version
glusterfs 3.7.1 built on Jun 28 2015 11:01:17
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@ninja core]#

How reproducible:
=================
Seen once.

Actual results:

Expected results:

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-07-01 02:43:54 EDT ---

This bug is automatically being proposed for Red Hat Gluster Storage 3.1.0 by setting the release flag 'rhgs-3.1.0' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Bhaskarakiran on 2015-07-01 02:45:58 EDT ---

--- Additional comment from Bhaskarakiran on 2015-07-01 05:17:37 EDT ---

Copied the sosreports to the rhsqe-repo/sosreports/1238067 folder.

--- Additional comment from Bhaskarakiran on 2015-07-01 05:22:35 EDT ---

Time of crash:
-rw-------. 1 root root 232M Jun 30 16:14 core.2913.1435661084.dump

--- Additional comment from Bhaskarakiran on 2015-07-01 05:27:20 EDT ---

Sosreport: rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1238067/sosreport-sysreg-prod-20150701140725.tar.xz

--- Additional comment from Atin Mukherjee on 2015-07-01 23:48:18 EDT ---

The crash happened while the glusterd service was going down. It does not impact functionality; the crash is caused by a race between the cleanup thread and a running thread. The cleanup thread releases the URCU resources while one of the running threads still tries to access them, resulting in a crash. Hence this can be deferred to 3.1.z.

--- Additional comment from Rejy M Cyriac on 2015-07-03 01:54:02 EDT ---

Since this BZ is not a blocker for the RHGS 3.1 release, and the phase for fixing non-blocker bugs is over for the release, re-proposing this BZ for the RHGS 3.1 z-stream release.
REVIEW: http://review.gluster.org/11532 (glusterd/synctask: destroy all synctask and epoll threads in fini) posted (#1) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/11532 (glusterd/synctask: destroy all synctask and epoll threads in fini) posted (#2) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/11532 (glusterd/synctask: destroy all synctask and epoll threads in fini) posted (#3) for review on master by Anand Nekkunti (anekkunt)
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.
Fixed through commit 6b58e84:

commit 6b58e8426a36bc544c06a599311999bf89ad04f2
Author: Atin Mukherjee <amukherj>
Date:   Wed Oct 3 16:34:54 2018 +0530

    glusterd: ignore RPC events when glusterd is shutting down

    When glusterd receives a SIGTERM while RPC connect/disconnect/destroy
    events are being delivered, the notify thread can crash in
    rcu_read_lock () because the cleanup thread may have already freed the
    resources. This is more observable when glusterd comes up with
    upgrade mode = on during the upgrade process.

    The solution is to ignore these events if glusterd is already in the
    middle of cleanup_and_exit ().

    Fixes: bz#1635593
    Change-Id: I12831d31c2f689d4deb038b83b9421bd5cce26d9
    Signed-off-by: Atin Mukherjee <amukherj>