Description of problem:
=======================
Had a 4-node cluster with a few volumes on the 3.8.4-23 build. Upgraded it to 3.8.4-24, set up Samba, enabled nl-cache, and continued with testing. Found 2 cores present on 2 of the peers. I am not completely sure what triggered the crash, but the day it happened was a test day for the negative-lookup cache. In my testing that day, I had also touched upon the bitrot, eventing, and Nagios features.

Version-Release number of selected component (if applicable):
============================================================
3.8.4-23/24

How reproducible:
=================
1:1

Additional info:
================
(gdb) t a a bt

Thread 8 (Thread 0x7f9f2a66e700 (LWP 29349)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f9f35027898 in syncenv_task (proc=proc@entry=0x7f9f36398bd0) at syncop.c:603
#2  0x00007f9f350286e0 in syncenv_processor (thdata=0x7f9f36398bd0) at syncop.c:695
#3  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2a66e700) at pthread_create.c:308
#4  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 7 (Thread 0x7f9f2b670700 (LWP 29347)):
#0  0x00007f9f3375a66d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f9f3375a504 in __sleep (seconds=0, seconds@entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f9f3501582d in pool_sweeper (arg=<optimized out>) at mem-pool.c:464
#3  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2b670700) at pthread_create.c:308
#4  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 6 (Thread 0x7f9f255d0700 (LWP 29550)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f9f29bc7c43 in hooks_worker (args=<optimized out>) at glusterd-hooks.c:531
#2  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f255d0700) at pthread_create.c:308
#3  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f9f2be71700 (LWP 29346)):
#0  0x00007f9f255d2838 in sss_cli_close_socket () at src/sss_client/common.c:74
#1  0x00007f9f352c985a in _dl_fini () at dl-fini.c:253
#2  0x00007f9f336d4a49 in __run_exit_handlers (status=status@entry=15, listp=0x7f9f33a566c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#3  0x00007f9f336d4a95 in __GI_exit (status=status@entry=15) at exit.c:99
#4  0x00007f9f354e5e16 in cleanup_and_exit (signum=15) at glusterfsd.c:1342
#5  0x00007f9f354e5f05 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2063
#6  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2be71700) at pthread_create.c:308
#7  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7f9f2ae6f700 (LWP 29348)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f9f35027898 in syncenv_task (proc=proc@entry=0x7f9f36398810) at syncop.c:603
#2  0x00007f9f350286e0 in syncenv_processor (thdata=0x7f9f36398810) at syncop.c:695
#3  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2ae6f700) at pthread_create.c:308
#4  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f9f2c672700 (LWP 29345)):
#0  0x00007f9f33e55bdd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f9f34ffc306 in gf_timer_proc (data=0x7f9f36397fc0) at timer.c:176
#2  0x00007f9f33e4edc5 in start_thread (arg=0x7f9f2c672700) at pthread_create.c:308
#3  0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f9f354c7780 (LWP 29344)):
#0  0x00007f9f33e4fef7 in pthread_join (threadid=140321494988544, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x00007f9f350492e0 in event_dispatch_epoll (event_pool=0x7f9f36390730) at event-epoll.c:759
#2  0x00007f9f354e2d95 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2464

Thread 1 (Thread 0x7f9f24dcf700 (LWP 29551)):
#0  _rcu_read_lock_bp () at urcu/static/urcu-bp.h:199
#1  rcu_read_lock_bp () at urcu-bp.c:271
#2  0x00007f9f29b1d90e in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x7f9f3647fb90, mydata=mydata@entry=0x7f9f3647ec50, event=event@entry=RPC_CLNT_DISCONNECT, data=data@entry=0x0) at glusterd-handler.c:5807
#3  0x00007f9f29b13c3c in glusterd_big_locked_notify (rpc=0x7f9f3647fb90, mydata=0x7f9f3647ec50, event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7f9f29b1d8c0 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:69
#4  0x00007f9f34db8a13 in rpc_clnt_handle_disconnect (conn=0x7f9f3647fbc0, clnt=0x7f9f3647fb90) at rpc-clnt.c:892
#5  rpc_clnt_notify (trans=<optimized out>, mydata=0x7f9f3647fbc0, event=<optimized out>, data=0x7f9f3647fd90) at rpc-clnt.c:955
#6  0x00007f9f34db49f3 in rpc_transport_notify (this=this@entry=0x7f9f3647fd90, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f9f3647fd90) at rpc-transport.c:538
#7  0x00007f9f26f83822 in socket_event_poll_err (this=0x7f9f3647fd90) at socket.c:1184
#8  socket_event_handler (fd=<optimized out>, idx=4, data=0x7f9f3647fd90, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2418
#9  0x00007f9f35048e50 in event_dispatch_epoll_handler (event=0x7f9f24dcee80, event_pool=0x7f9f36390730) at event-epoll.c:572
#10 event_dispatch_epoll_worker (data=0x7f9f3639c9b0) at event-epoll.c:675
#11 0x00007f9f33e4edc5 in start_thread (arg=0x7f9f24dcf700) at pthread_create.c:308
#12 0x00007f9f3379373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
[qe@rhsqe-repo 1454610]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1454610]$ pwd
/home/repo/sosreports/1454610
[qe@rhsqe-repo 1454610]$ ll
total 301232
-rwxr-xr-x. 1 qe qe 156364800 May 23 12:54 core.19974
-rwxr-xr-x. 1 qe qe 152088576 May 23 12:54 core.29344
[qe@rhsqe-repo 1454610]$