Bug 1310354 - Ceph Monitor crashed with Caught signal (Aborted) in pthread_join
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Hardware: x86_64 Linux
Priority: high  Severity: high
Target Milestone: rc
Target Release: 1.3.3
Assigned To: Greg Farnum
Depends On:
Blocks: 1319075
Reported: 2016-02-20 16:22 EST by Mike Hackett
Modified: 2017-07-30 11:07 EDT (History)
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-02-22 15:02:11 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
mon log from scrash (15.88 MB, application/x-gzip)
2016-02-20 16:22 EST, Mike Hackett

External Trackers
Tracker ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 13421 None None None 2016-02-20 16:49 EST

Description Mike Hackett 2016-02-20 16:22:32 EST
Created attachment 1128864 [details]
mon log from scrash

Description of problem:
On 2/12, Cisco noticed a Monitor crash in their Tokyo production cluster. The stack trace is below:

2016-02-12 23:23:28.169430 7f88a84f97c0 -1 *** Caught signal (Aborted) **
 in thread 7f88a84f97c0

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: /usr/bin/ceph-mon() [0x903c52]
 2: (()+0xf130) [0x7f88a76ba130]
 3: (pthread_join()+0xa7) [0x7f88a76b3f27]
 4: (Thread::join(void**)+0x12) [0x7a1b22]
 5: (DispatchQueue::wait()+0x12) [0x8a39a2]
 6: (SimpleMessenger::wait()+0x9a6) [0x797e66]
 7: (main()+0x267c) [0x54e32c]
 8: (__libc_start_main()+0xf5) [0x7f88a60c0af5]
 9: /usr/bin/ceph-mon() [0x55d9c9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

I spoke with badone about this crash; it looks like it's possibly some sort of thread accounting problem, and the actual crash is in glibc's pthread_join.

Only a single monitor in this cluster encountered this issue; it occurred once, on 2/12, and has not recurred since.

Version-Release number of selected component (if applicable):

How reproducible:
Has not been reproduced

Additional info:

SOS report is located:
Comment 2 Mike Hackett 2016-02-20 16:48:50 EST
This looks to have been seen upstream on an OSD node in Giant:

Comment 4 Brad Hubbard 2016-02-21 23:01:54 EST
As the note suggests, it is not too hard to find out more information about this issue.

"NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this."

We set up the copy like so:

Install and register a RHEL 7.1 system.

$ subscription-manager repos --enable=rhel-7-server-rpms --enable=rhel-7-server-rhceph-1.3-calamari-rpms --enable=rhel-7-server-rhceph-1.3-installer-rpms --enable=rhel-7-server-rhceph-1.3-mon-rpms --enable=rhel-7-server-rhceph-1.3-osd-rpms --enable=rhel-7-server-rhceph-1.3-tools-rpms

$ yum install ceph-0.94.1-19.el7cp.x86_64 ceph-common-0.94.1-19.el7cp.x86_64 ceph-mon-0.94.1-19.el7cp.x86_64

Make sure you have the right glibc installed (based on the sosreport).

$ debuginfo-install -y glibc ceph

$ gdb `which ceph-mon`
(gdb) b main
Breakpoint 1 at 0x54bcb0: file ceph_mon.cc, line 205.
(gdb) r
Starting program: /usr/bin/ceph-mon 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, main (argc=1, argv=0x7fffffffe228) at ceph_mon.cc:205
205     {
Missing separate debuginfos, use: debuginfo-install gperftools-libs-2.4-7.el7.x86_64 leveldb-1.12.0-5.el7cp.x86_64 libunwind-1.1-5.el7_2.2.x86_64 nss-util- snappy-1.1.0-3.el7.x86_64
(gdb) set pag off

Note that we want to find pthread_join()+0xa7 and that 0xa7 in decimal (gdb default) is 167.

(gdb) disass /m pthread_join,pthread_join+167
Dump of assembler code from 0x7ffff719de80 to 0x7ffff719df27:
88          /* There is already somebody waiting for the thread.  */
89          result = EINVAL;
90        else
91          /* Wait for the child.  */
92          lll_wait_tid (pd->tid);
   0x00007ffff719df06 <pthread_join+134>:       mov    0x2d0(%rbx),%ecx
   0x00007ffff719df0c <pthread_join+140>:       test   %ecx,%ecx
   0x00007ffff719df0e <pthread_join+142>:       mov    %ecx,%edx
   0x00007ffff719df10 <pthread_join+144>:       je     0x7ffff719df2c <pthread_join+172>
   0x00007ffff719df12 <pthread_join+146>:       lea    0x2d0(%rbx),%rdi
   0x00007ffff719df19 <pthread_join+153>:       xor    %esi,%esi
   0x00007ffff719df1b <pthread_join+155>:       xor    %r10,%r10
   0x00007ffff719df1e <pthread_join+158>:       mov    $0xca,%rax
   0x00007ffff719df25 <pthread_join+165>:       syscall 
   0x00007ffff719df27 <pthread_join+167>:       cmpl   $0x0,(%rdi)  <---HERE
   0x00007ffff719df2a <pthread_join+170>:       jne    0x7ffff719df1e <pthread_join+158>

So we crashed when we tried to dereference pd->tid.

"pd" is constructed from the passed in threadid.

39      int
40      pthread_join (threadid, thread_return)
41           pthread_t threadid;
42           void **thread_return;
43      {
44        struct pthread *pd = (struct pthread *) threadid;

The key to this may be the following statement from the pthread_join(3) man page:

"If multiple threads simultaneously try to join with the same thread, the results are undefined."

Here's where we attempt the join.

133     int Thread::join(void **prval)
134     {
135       if (thread_id == 0) {
136         assert("join on thread that was never started" == 0);
137         return -EINVAL;
138       }
140       int status = pthread_join(thread_id, prval); <----------HERE
141       assert(status == 0);
142       thread_id = 0;
143       return status;
144     }

(gdb) l DispatchQueue::wait
220       dispatch_thread.create();
221       local_delivery_thread.create();
222     }
224     void DispatchQueue::wait()
225     {
226       local_delivery_thread.join();
227       dispatch_thread.join(); <----------HERE
228     }

An invalid thread_id alone shouldn't cause this, as pthread_join should return immediately (typically with ESRCH) under those circumstances.

6: (SimpleMessenger::wait()+0x9a6) [0x797e66]

(gdb) disass /m 0x797e66
537         dispatch_queue.wait();                     <--- HERE
   0x0000000000797e5a <+2458>:  lea    0x1c8(%rbx),%rdi
   0x0000000000797e61 <+2465>:  callq  0x8a3990 <DispatchQueue::wait()>

538         ldout(cct,10) << "wait: dispatch queue is stopped" << dendl;
   0x0000000000797e66 <+2470>:  mov    0xd0(%rbx),%rcx

That's this code.

    526 void SimpleMessenger::wait()
    527 {
    528   lock.Lock();
    529   if (!started) {
    530     lock.Unlock();
    531     return;
    532   }
    533   lock.Unlock();
    535   if(dispatch_queue.is_started()) {
    536     ldout(cct,10) << "wait: waiting for dispatch queue" << dendl;
    537     dispatch_queue.wait();
    538     ldout(cct,10) << "wait: dispatch queue is stopped" << dendl;
    539   }

Sam, any chance we are calling pthread_join from two threads simultaneously with the same thread_id?

The other possibility here is that we are working with a thread_id from a *failed* call to pthread_create. From that function's man page:

"On success, pthread_create() returns 0; on error, it returns an error number, and the contents of *thread are undefined."

Maybe we don't check the return code of pthread_create somewhere? AFAICT we check it in src/common/Thread.cc so perhaps the concurrency issue is more likely?
Comment 5 Greg Farnum 2016-02-22 14:09:33 EST
Okay, so the SimpleMessenger::wait() and DispatchQueue::wait() calls are pretty boring — in general, we start waiting as soon as we finish starting up. If we are in fact on the second thread join, that means we've already initiated shutdown (thanks for flagging that, Sam!), but I'm not seeing how we reached that conclusion.

My best guess at the moment is that something happened which caused the benign parent thread to get blamed for the failure — I believe we've seen that a couple of times before. If we've got a core dump we could try and dig through all the threads, but with this there's not much to go on and I think it's a false trail.
Comment 8 Greg Farnum 2016-02-22 15:02:58 EST
Looks like this was done deliberately by sending SIGABRT to the process, so this backtrace is exactly what we'd expect to see.
Comment 9 Brad Hubbard 2016-02-22 16:28:37 EST
(In reply to Greg Farnum from comment #8)
> Looks like this was done deliberately by sending SIGABRT to the process; so
> this backtrace is exactly what we'd expect to see.

Agreed, exactly what we would expect to see.
