Description of problem:
=======================
Had a 4-node cluster with a few volumes on the 3.8.4-23 build. Upgraded it to 3.8.4-24, set up Samba, enabled nl-cache, and continued with testing. Found 2 cores present on 2 of the peers. I am not completely sure what triggered the crash, but the day it happened was a test day for the negative-lookup cache. In my testing that day, I had also touched upon the bitrot, eventing, and Nagios features.

Version-Release number of selected component (if applicable):
============================================================
3.8.4-23/24

How reproducible:
=================
1:1

Additional info:
================
(gdb) t a a bt

Thread 8 (Thread 0x7f9f2a66e700 (LWP 29349)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f9f35027898 in syncenv_task (proc=proc@entry=0x7f9f36398bd0) at syncop.c:603
#2  0x00007f9f350286e0 in syncenv_processor (thdata=0x7f9f36398bd0) at syncop.c:695
#3  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2a66e700) at pthread_create.c:308
#4  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 7 (Thread 0x7f9f2b670700 (LWP 29347)):
#0  0x00007f9f3375a66d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f9f3375a504 in __sleep (seconds=0, seconds@entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f9f3501582d in pool_sweeper (arg=<optimized out>) at mem-pool.c:464
#3  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2b670700) at pthread_create.c:308
#4  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 6 (Thread 0x7f9f255d0700 (LWP 29550)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f9f29bc7c43 in hooks_worker (args=<optimized out>) at glusterd-hooks.c:531
#2  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f255d0700) at pthread_create.c:308
#3  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f9f2be71700 (LWP 29346)):
#0  0x00007f9f255d2838 in sss_cli_close_socket () at src/sss_client/common.c:74
#1  0x00007f9f352c985a in _dl_fini () at dl-fini.c:253
#2  0x00007f9f336d4a49 in __run_exit_handlers (status=status@entry=15, listp=0x7f9f33a566c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#3  0x00007f9f336d4a95 in __GI_exit (status=status@entry=15) at exit.c:99
#4  0x00007f9f354e5e16 in cleanup_and_exit (signum=15) at glusterfsd.c:1342
#5  0x00007f9f354e5f05 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2063
#6  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2be71700) at pthread_create.c:308
#7  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7f9f2ae6f700 (LWP 29348)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f9f35027898 in syncenv_task (proc=proc@entry=0x7f9f36398810) at syncop.c:603
#2  0x00007f9f350286e0 in syncenv_processor (thdata=0x7f9f36398810) at syncop.c:695
#3  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2ae6f700) at pthread_create.c:308
#4  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f9f2c672700 (LWP 29345)):
#0  0x00007f9f33e55bdd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f9f34ffc306 in gf_timer_proc (data=0x7f9f36397fc0) at timer.c:176
#2  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2c672700) at pthread_create.c:308
#3  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f9f354c7780 (LWP 29344)):
#0  0x00007f9f33e4fef7 in pthread_join (threadid=140321494988544, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x00007f9f350492e0 in event_dispatch_epoll (event_pool=0x7f9f36390730) at event-epoll.c:759
#2  0x00007f9f354e2d95 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2464

Thread 1 (Thread 0x7f9f24dcf700 (LWP 29551)):
#0  _rcu_read_lock_bp () at urcu/static/urcu-bp.h:199
#1  rcu_read_lock_bp () at urcu-bp.c:271
#2  0x00007f9f29b1d90e in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x7f9f3647fb90, mydata=mydata@entry=0x7f9f3647ec50, event=event@entry=RPC_CLNT_DISCONNECT, data=data@entry=0x0) at glusterd-handler.c:5807
#3  0x00007f9f29b13c3c in glusterd_big_locked_notify (rpc=0x7f9f3647fb90, mydata=0x7f9f3647ec50, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7f9f29b1d8c0 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:69
#4  0x00007f9f34db8a13 in rpc_clnt_handle_disconnect (conn=0x7f9f3647fbc0, clnt=0x7f9f3647fb90) at rpc-clnt.c:892
#5  rpc_clnt_notify (trans=<optimized out>, mydata=0x7f9f3647fbc0, event=<optimized out>, data=0x7f9f3647fd90) at rpc-clnt.c:955
#6  0x00007f9f34db49f3 in rpc_transport_notify (this=this@entry=0x7f9f3647fd90, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f9f3647fd90) at rpc-transport.c:538
#7  0x00007f9f26f83822 in socket_event_poll_err (this=0x7f9f3647fd90) at socket.c:1184
#8  socket_event_handler (fd=<optimized out>, idx=4, data=0x7f9f3647fd90, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2418
#9  0x00007f9f35048e50 in event_dispatch_epoll_handler (event=0x7f9f24dcee80, event_pool=0x7f9f36390730) at event-epoll.c:572
#10 event_dispatch_epoll_worker (data=0x7f9f3639c9b0) at event-epoll.c:675
#11 0x00007f9f33e4edc5 in start_thread (arg=0x7f9f24dcf700) at pthread_create.c:308
#12 0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
[qe@rhsqe-repo 1454610]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1454610]$ pwd
/home/repo/sosreports/1454610
[qe@rhsqe-repo 1454610]$ ll
total 301232
-rwxr-xr-x. 1 qe qe 156364800 May 23 12:54 core.19974
-rwxr-xr-x. 1 qe qe 152088576 May 23 12:54 core.29344
[qe@rhsqe-repo 1454610]$