Description of problem:
While executing cleanup_and_exit, the shd daemon crashes. This is because a parallel graph-free thread can be executing another cleanup on the same graph at the same time.

Version-Release number of selected component (if applicable):

How reproducible:
Intermittent; hit by running the test below in a loop.

Steps to Reproduce:
1. Run ./tests/bugs/glusterd/reset-brick-and-daemons-follow-quorum.t in a loop.

Actual results:
The shd daemon crashes during cleanup_and_exit.

Expected results:
The shd daemon exits cleanly.

Additional info:
REVIEW: https://review.gluster.org/22709 (glusterfsd/cleanup: Protect graph object under a lock) posted (#1) for review on master by mohammed rafi kc
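The idea behind the patch above is that the graph object must not be freed by one thread while another thread is still walking or attaching it. Below is a minimal sketch of that locking pattern; ctx_t, cleanup_lock, active_graph, cleanup_thread and attach_thread are hypothetical stand-ins, not the actual glusterfsd structures or functions.

/* Minimal sketch of the race and the fix; names are made up. */
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    void *active_graph;            /* stands in for the active graph object */
    pthread_mutex_t cleanup_lock;  /* serializes graph teardown and use */
} ctx_t;

/* Thread A: signal/cleanup path (think cleanup_and_exit) */
static void *cleanup_thread(void *arg)
{
    ctx_t *ctx = arg;

    pthread_mutex_lock(&ctx->cleanup_lock);
    /* Without the lock, this free can run while thread B is still
     * using the same graph, which is the crash described above. */
    free(ctx->active_graph);
    ctx->active_graph = NULL;
    pthread_mutex_unlock(&ctx->cleanup_lock);
    return NULL;
}

/* Thread B: volfile processing path (think mgmt_process_volfile) */
static void *attach_thread(void *arg)
{
    ctx_t *ctx = arg;

    pthread_mutex_lock(&ctx->cleanup_lock);
    if (ctx->active_graph != NULL) {
        /* safe to dump/attach the graph only while holding the lock */
    }
    pthread_mutex_unlock(&ctx->cleanup_lock);
    return NULL;
}

int main(void)
{
    ctx_t ctx = { .active_graph = malloc(16) };
    pthread_mutex_init(&ctx.cleanup_lock, NULL);

    pthread_t a, b;
    pthread_create(&a, NULL, attach_thread, &ctx);
    pthread_create(&b, NULL, cleanup_thread, &ctx);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    pthread_mutex_destroy(&ctx.cleanup_lock);
    return 0;
}

In the stack traces from comment #3, thread 30877 (cleanup_and_exit) and thread 30883 (mgmt_process_volfile -> glusterfs_process_svc_attach_volfp) correspond to the two sides of this race.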
Rafi, could you share the bt of the core so that it is easier to understand why exactly it crashed? Pranith
Stack trace of thread 30877:
#0  0x0000000000406a07 cleanup_and_exit (glusterfsd)
#1  0x0000000000406b5d glusterfs_sigwaiter (glusterfsd)
#2  0x00007f51000cd58e start_thread (libpthread.so.0)
#3  0x00007f50ffd1d683 __clone (libc.so.6)

Stack trace of thread 30879:
#0  0x00007f51000d3a7a futex_abstimed_wait_cancelable (libpthread.so.0)
#1  0x00007f51003b8616 syncenv_task (libglusterfs.so.0)
#2  0x00007f51003b9240 syncenv_processor (libglusterfs.so.0)
#3  0x00007f51000cd58e start_thread (libpthread.so.0)
#4  0x00007f50ffd1d683 __clone (libc.so.6)

Stack trace of thread 30881:
#0  0x00007f50ffd14cdf __GI___select (libc.so.6)
#1  0x00007f51003ef1cd runner (libglusterfs.so.0)
#2  0x00007f51000cd58e start_thread (libpthread.so.0)
#3  0x00007f50ffd1d683 __clone (libc.so.6)

Stack trace of thread 30880:
#0  0x00007f51000d3a7a futex_abstimed_wait_cancelable (libpthread.so.0)
#1  0x00007f51003b8616 syncenv_task (libglusterfs.so.0)
#2  0x00007f51003b9240 syncenv_processor (libglusterfs.so.0)
#3  0x00007f51000cd58e start_thread (libpthread.so.0)
#4  0x00007f50ffd1d683 __clone (libc.so.6)

Stack trace of thread 30876:
#0  0x00007f51000d7500 __GI___nanosleep (libpthread.so.0)
#1  0x00007f510038a346 gf_timer_proc (libglusterfs.so.0)
#2  0x00007f51000cd58e start_thread (libpthread.so.0)
#3  0x00007f50ffd1d683 __clone (libc.so.6)

Stack trace of thread 30882:
#0  0x00007f50ffd1e06e epoll_ctl (libc.so.6)
#1  0x00007f51003d931e event_handled_epoll (libglusterfs.so.0)
#2  0x00007f50eed9a781 socket_event_poll_in (socket.so)
#3  0x00007f51003d8c9b event_dispatch_epoll_handler (libglusterfs.so.0)
#4  0x00007f51000cd58e start_thread (libpthread.so.0)
#5  0x00007f50ffd1d683 __clone (libc.so.6)

Stack trace of thread 30875:
#0  0x00007f51000cea6d __GI___pthread_timedjoin_ex (libpthread.so.0)
#1  0x00007f51003d8387 event_dispatch_epoll (libglusterfs.so.0)
#2  0x0000000000406592 main (glusterfsd)
#3  0x00007f50ffc44413 __libc_start_main (libc.so.6)
#4  0x00000000004067de _start (glusterfsd)

Stack trace of thread 30878:
#0  0x00007f50ffce97f8 __GI___nanosleep (libc.so.6)
#1  0x00007f50ffce96fe __sleep (libc.so.6)
#2  0x00007f51003a4f5a pool_sweeper (libglusterfs.so.0)
#3  0x00007f51000cd58e start_thread (libpthread.so.0)
#4  0x00007f50ffd1d683 __clone (libc.so.6)

Stack trace of thread 30883:
#0  0x00007f51000d6b8d __lll_lock_wait (libpthread.so.0)
#1  0x00007f51000cfda9 __GI___pthread_mutex_lock (libpthread.so.0)
#2  0x00007f510037cd1f _gf_msg_plain_internal (libglusterfs.so.0)
#3  0x00007f510037ceb3 _gf_msg_plain (libglusterfs.so.0)
#4  0x00007f5100382d43 gf_log_dump_graph (libglusterfs.so.0)
#5  0x00007f51003b514f glusterfs_process_svc_attach_volfp (libglusterfs.so.0)
#6  0x000000000040b16d mgmt_process_volfile (glusterfsd)
#7  0x0000000000410792 mgmt_getspec_cbk (glusterfsd)
#8  0x00007f51003256b1 rpc_clnt_handle_reply (libgfrpc.so.0)
#9  0x00007f5100325a53 rpc_clnt_notify (libgfrpc.so.0)
#10 0x00007f5100322973 rpc_transport_notify (libgfrpc.so.0)
#11 0x00007f50eed9a45c socket_event_poll_in (socket.so)
#12 0x00007f51003d8c9b event_dispatch_epoll_handler (libglusterfs.so.0)
#13 0x00007f51000cd58e start_thread (libpthread.so.0)
#14 0x00007f50ffd1d683 __clone (libc.so.6)
(In reply to Mohammed Rafi KC from comment #3)

Was graph->active NULL? What led to the crash?
REVIEW: https://review.gluster.org/22743 (afr/frame: Destroy frame after afr_selfheal_entry_granular) posted (#1) for review on master by mohammed rafi kc
REVIEW: https://review.gluster.org/22743 (afr/frame: Destroy frame after afr_selfheal_entry_granular) merged (#3) on master by Pranith Kumar Karampuri
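The rule behind this second patch is that a call frame created for the granular entry self-heal has to be destroyed once that heal finishes, rather than being left behind to leak or be touched after cleanup. Below is a minimal sketch of that lifetime pattern; frame_t, frame_create, frame_destroy and do_granular_entry_heal are made-up stand-ins, not the real AFR frame API.

/* Sketch of the frame-lifetime rule, not the actual AFR code. */
#include <stdlib.h>

typedef struct frame { void *local; } frame_t;

static frame_t *frame_create(void)
{
    return calloc(1, sizeof(frame_t));
}

static void frame_destroy(frame_t *frame)
{
    free(frame->local);
    free(frame);
}

static int do_granular_entry_heal(frame_t *heal_frame)
{
    /* ... heal work that borrows heal_frame ... */
    (void)heal_frame;
    return 0;
}

int entry_heal(void)
{
    frame_t *heal_frame = frame_create();
    if (!heal_frame)
        return -1;

    int ret = do_granular_entry_heal(heal_frame);

    /* Destroy the frame as soon as the granular heal is done; leaving
     * it around means it can leak or be used after the graph it
     * belongs to has already been cleaned up. */
    frame_destroy(heal_frame);
    return ret;
}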
REVIEW: https://review.gluster.org/22709 (glusterfsd/cleanup: Protect graph object under a lock) merged (#10) on master by Amar Tumballi