Description of problem:
=======================
I see a few glusterd cores in my 6-node cluster. This was probably triggered by the upgrade from build 3.8.4-3 to 3.8.4-5.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
Missing separate debuginfos, use: debuginfo-install glusterfs-server-3.8.4-5.el7rhgs.x86_64
(gdb) bt
#0 0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1 0x00007fc0f3587c8e in __glusterd_peer_rpc_notify () from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2 0x00007fc0f357e18c in glusterd_big_locked_notify () from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#3 0x00007fc0febd38ed in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#4 0x00007fc0febcf883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#5 0x00007fc0f09f02f2 in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#6 0x00007fc0fee63340 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#7 0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fc0fd5af73d in clone () from /lib64/libc.so.6
(gdb) quit
(gdb) t a a bt

Thread 7 (Thread 0x7fc0ff2e1780 (LWP 25686)):
#0 0x00007fc0fdc6bef7 in pthread_join () from /lib64/libpthread.so.0
#1 0x00007fc0fee637c8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2 0x00007fc0ff2fbae2 in main ()

Thread 6 (Thread 0x7fc0f50d5700 (LWP 25688)):
#0 0x00007fc0f2a9b300 in __do_global_dtors_aux () from /lib64/liblvm2app.so.2.2
#1 0x00007fc0ff0e285a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2 0x00007fc0fd4f0a49 in __run_exit_handlers () from /lib64/libc.so.6
#3 0x00007fc0fd4f0a95 in exit () from /lib64/libc.so.6
#4 0x00007fc0ff2feb5e in cleanup_and_exit ()
#5 0x00007fc0ff2fec45 in glusterfs_sigwaiter ()
#6 0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007fc0fd5af73d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fc0ef7e0700 (LWP 25891)):
#0 0x00007fc0fdc6e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fc0f362e2c3 in glusterd_hooks_stub_init () from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2 0x6c40ade6a5c67fce in ?? ()
#3 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fc0f40d3700 (LWP 25690)):
#0 0x00007fc0fdc6ea82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fc0fee41df8 in syncenv_task () from /lib64/libglusterfs.so.0
#2 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fc0f48d4700 (LWP 25689)):
#0 0x00007fc0fdc6ea82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fc0fee41df8 in syncenv_task () from /lib64/libglusterfs.so.0
#2 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fc0f58d6700 (LWP 25687)):
#0 0x00007fc0fdc71bdd in nanosleep () from /lib64/libpthread.so.0
#1 0x00007fc0fee16c16 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2 0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007fc0fd5af73d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fc0eefdf700 (LWP 25892)):
#0 0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1 0x00007fc0f3587c8e in __glusterd_peer_rpc_notify () from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2 0x00007fc0f357e18c in glusterd_big_locked_notify () from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#3 0x00007fc0febd38ed in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#4 0x00007fc0febcf883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#5 0x00007fc0f09f02f2 in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#6 0x00007fc0fee63340 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#7 0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fc0fd5af73d in clone () from /lib64/libc.so.6
Please refer to the RCA: https://bugzilla.redhat.com/show_bug.cgi?id=1238067#c13

In short, Thread 6 entered cleanup_and_exit (), triggering a glusterd shutdown that freed all resources, including the urcu state. When __glusterd_peer_rpc_notify () subsequently tried to acquire the rcu read lock, that state was no longer accessible, causing the segfault.

Please note this is already documented in the Known Issues chapter.

*** This bug has been marked as a duplicate of bug 1238067 ***