Bug 1475482
Summary: | libqb imposed crash in ringbuffer: division by zero | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jan Pokorný [poki] <jpokorny>
Component: | corosync | Assignee: | Jan Friesse <jfriesse>
Status: | CLOSED DUPLICATE | QA Contact: | cluster-qe <cluster-qe>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 7.4 | CC: | ccaulfie, cfeist, cluster-maint, ctowsley, dkaylor, jfriesse, jpokorny, kgaillot, rpeterso, sbradley, stanislav.moravec
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-01-31 09:18:22 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Jan Pokorný [poki]
2017-07-26 18:51:07 UTC
There may also be high sophistication involved: https://github.com/ClusterLabs/libqb/commit/9b3be0b4 but hopefully it will turn out to be something dead simple. Hmm, just noticed this wrt. the atomic synchronization primitive: https://github.com/ClusterLabs/libqb/wiki/FAQ (although this wouldn't be directly related).

Got another report, this time with libqb-1.0-1.el7_3.1.x86_64, the location in the code matching the original report in [comment 0] (line 574 is the offending one):

```
 573         old_read_pt = rb->shared_hdr->read_pt;
 574         chunk_magic = QB_RB_CHUNK_MAGIC_GET(rb, old_read_pt);
 575         if (chunk_magic != QB_RB_CHUNK_MAGIC) {
 576                 return -EINVAL;
 577         }
```

Fortunately, this time the report was accompanied by a backtrace (a bit cryptic: offset + file references as generated by abrt; the coredump was not directly usable), from which I deduced a probable control flow, top to bottom:

```
-- 1: corosync-2.4.0/exec/main.c:1405

 1402         /*
 1403          * Start main processing loop
 1404          */
>1405         qb_loop_run (corosync_poll_handle);
 1406

-- 2: libqb-1.0/lib/loop.c:43

   31 static void
   32 qb_loop_run_level(struct qb_loop_level *level)
   33 {
   34         struct qb_loop_item *job;
   35         int32_t processed = 0;
   36
   37 Ill_have_another:
   38
   39         if (!qb_list_empty(&level->job_head)) {
   40                 job = qb_list_first_entry(&level->job_head, struct qb_loop_item, list);
   41                 qb_list_del(&job->list);
   42                 qb_list_init(&job->list);
>  43                 job->source->dispatch_and_take_back(job, level->priority);
   44                 level->todo--;
   45                 processed++;
   46                 if (level->l->stop_requested) {
   47                         return;
   48                 }
   49                 if (processed < level->to_process) {
   50                         goto Ill_have_another;
   51                 }
   52         }
   53 }

-- 3: libqb-1.0/lib/loop_poll.c:109

   92 static void
   93 _poll_dispatch_and_take_back_(struct qb_loop_item *item,
   94                               enum qb_loop_priority p)
   95 {
   96         struct qb_poll_entry *pe = (struct qb_poll_entry *)item;
   97         int32_t res;
   98 #ifdef DEBUG_DISPATCH_TIME
   99         uint64_t start;
  100         uint64_t stop;
  101         int32_t log_warn = QB_FALSE;
  102
  103         start = qb_util_nano_current_get();
  104 #endif /* DEBUG_DISPATCH_TIME */
  105
  106         assert(pe->state == QB_POLL_ENTRY_JOBLIST);
  107         assert(pe->item.type == QB_LOOP_FD);
  108
> 109         res = pe->poll_dispatch_fn(pe->ufd.fd,
  110                                    pe->ufd.revents,
  111                                    pe->item.user_data);
  112         if (res < 0) {
  113                 _poll_entry_mark_deleted_(pe);
  114         } else if (pe->state != QB_POLL_ENTRY_DELETED) {
  115                 pe->state = QB_POLL_ENTRY_ACTIVE;
  116                 pe->ufd.revents = 0;
  117         }

-- 4: corosync-2.4.0/exec/totemudp.c:522

  519         /*
  520          * Handle incoming message
  521          */
> 522         instance->totemudp_deliver_fn (
  523                 instance->context,
  524                 iovec->iov_base,
  525                 iovec->iov_len);
  526
  527         iovec->iov_len = FRAME_SIZE_MAX;
  528         return (0);

-- 5: corosync-2.4.0/exec/totemrrp.c:1941 or 1952

 1937         if (token_is) {
 1938                 /*
 1939                  * Deliver to the token receiver for this rrp algorithm
 1940                  */
>1941                 rrp_instance->rrp_algo->token_recv (
 1942                         rrp_instance,
 1943                         deliver_fn_context->iface_no,
 1944                         deliver_fn_context->context,
 1945                         msg,
 1946                         msg_len,
 1947                         token_seqid);
 1948         } else {
 1949                 /*
 1950                  * Deliver to the mcast receiver for this rrp algorithm
 1951                  */
>1952                 rrp_instance->rrp_algo->mcast_recv (
 1953                         rrp_instance,
 1954                         deliver_fn_context->iface_no,
 1955                         deliver_fn_context->context,
 1956                         msg,
 1957                         msg_len);
 1958         }

-- 6: corosync-2.4.0/exec/totemsrp.c:4087

>4087         log_printf (instance->totemsrp_log_level_trace,
 4088                 "Received ringid(%s:%lld) seq %x",
 4089                 totemip_print (&mcast_header.ring_id.rep),
 4090                 mcast_header.ring_id.seq,
 4091                 mcast_header.seq);

-- 7: corosync-2.4.0/exec/main.c:957

  947 static void
  948 _logsys_log_printf(int level, int subsys,
  949                    const char *function_name,
  950                    const char *file_name,
  951                    int file_line,
  952                    const char *format, ...)
  953 {
  954         va_list ap;
  955
  956         va_start(ap, format);
> 957         qb_log_from_external_source_va(function_name, corosync_basename(file_name),
  958                                        format, level, file_line,
  959                                        subsys, ap);
  960         va_end(ap);
  961 }

-- 8: libqb-1.0/lib/log.c:255

  253         } else if (t->vlogger) {
  254                 va_copy(ap_copy, ap);
> 255                 t->vlogger(t->pos, cs, tv.tv_sec, ap_copy);
  256                 va_end(ap_copy);

-- 9: libqb-1.0/lib/log_blackbox.c:74

>  74         chunk = qb_rb_chunk_alloc(t->instance, max_size);
   75
   76         if (chunk == NULL) {
   77                 /* something bad has happened. abort blackbox logging */
   78                 qb_util_perror(LOG_ERR, "Blackbox allocation error, aborting blackbox log %s", t->filename);
   79                 qb_rb_close(t->instance);
   80                 t->instance = NULL;
   81                 return;
   82         }

-- 10: libqb-1.0/lib/ringbuffer.c:454

  449         /*
  450          * Reclaim data if we are over writing and we need space
  451          */
  452         if (rb->flags & QB_RB_FLAG_OVERWRITE) {
  453                 while (qb_rb_space_free(rb) < (len + QB_RB_CHUNK_MARGIN)) {
> 454                         int rc = _rb_chunk_reclaim(rb);
  455                         if (rc != 0) {
  456                                 errno = rc;
  457                                 return NULL;
  458                         }
  459                 }

-- 11: libqb-1.0/lib/ringbuffer.c:574

  564 static int
  565 _rb_chunk_reclaim(struct qb_ringbuffer_s * rb)
  566 {
  567         uint32_t old_read_pt;
  568         uint32_t new_read_pt;
  569         uint32_t old_chunk_size;
  570         uint32_t chunk_magic;
  571         int rc = 0;
  572
  573         old_read_pt = rb->shared_hdr->read_pt;
> 574         chunk_magic = QB_RB_CHUNK_MAGIC_GET(rb, old_read_pt);
  575         if (chunk_magic != QB_RB_CHUNK_MAGIC) {
  576                 return -EINVAL;
  577         }
```
Several notes:

- blackbox is enabled before corosync-2.4.0/exec/main.c:main() is run (so the expectation is that it's ready to serve others by then)

- from the memory maps available in the coredump, the blackbox was indeed "visibly working":

  > Start Addr      End Addr        Size      Offset  objfile
  > 0x7f21b95e6000  0x7f21b9de7000  0x801000  0x0     /dev/shm/qb-corosync-29276-blackbox-data
  > 0x7f21b9de7000  0x7f21ba5e8000  0x801000  0x0     /dev/shm/qb-corosync-29276-blackbox-data
  > 0x7f21bcaa4000  0x7f21bcaa7000  0x3000    0x0     /dev/shm/qb-corosync-29276-blackbox-header

- on the other hand, I don't think I've ever seen duplicate-mapped files with libqb's IPC ringbuffers like this (unless it's a matter of some implicit/system logic behind splitting large memory-mapped blocks into [contiguous?] chunks):

  > Start Addr      End Addr        Size      Offset  objfile
  > 0x7f21abf80000  0x7f21ac081000  0x101000  0x0     /dev/shm/qb-cpg-event-29277-10216-41-data
  > 0x7f21ac081000  0x7f21ac182000  0x101000  0x0     /dev/shm/qb-cpg-event-29277-10216-41-data
  > 0x7f21ac182000  0x7f21ac283000  0x101000  0x0     /dev/shm/qb-cpg-response-29277-10216-41-data
  > 0x7f21ac283000  0x7f21ac384000  0x101000  0x0     /dev/shm/qb-cpg-response-29277-10216-41-data
  > 0x7f21ac384000  0x7f21ac485000  0x101000  0x0     /dev/shm/qb-cpg-request-29277-10216-41-data
  > 0x7f21ac485000  0x7f21ac586000  0x101000  0x0     /dev/shm/qb-cpg-request-29277-10216-41-data

  this may be a matter of a larger IPC buffer, though (I've been told this behaviour might be related to using PCMK_ipc_buffer=16777216 on the pacemaker side)

- from the configure script output:

  > checking whether GCC supports builtin sync intrinsics... yes
  > checking whether GCC supports builtin atomic intrinsics... yes
  > checking whether atomics need memory barrier... yes

  meaning that qb_atomic_int_get_ex(X, QB_ATOMIC_ACQUIRE) is turned into __atomic_load_n(X, __ATOMIC_ACQUIRE) (see the sketch right after this list), so either the atomic model could be changed (no knowledge here at this point) or the behaviour can be related to some bugs, e.g.: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
  (should not be relevant here, but for posterity, there are some indications of __sync_synchronize not working properly prior to GCC 4.4, at least(?): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793)
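For illustration, a minimal sketch of that mapping; the wrapper name below is illustrative, not libqb's, while __atomic_load_n/__ATOMIC_ACQUIRE are the real GCC builtins referenced above:

```c
/* Sketch of the acquire-load mapping noted above: on a toolchain where
 * the configure checks detect builtin atomic intrinsics, an acquire get
 * such as qb_atomic_int_get_ex(X, QB_ATOMIC_ACQUIRE) boils down to the
 * GCC builtin below. */
#include <stdint.h>

static int32_t
acquire_int_get(const volatile int32_t *atomic)
{
	return __atomic_load_n(atomic, __ATOMIC_ACQUIRE);
}
```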
Note that per the coredump, there was a parallel thread running, but as mentioned, a lot of information is missing (basically just the per-frame instruction pointers (?) are known, and only for some frames):

```
-- 1: glibc-2.17-c758a686/nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 (futex_abstimed_wait)

   32 /* Wrapper for lll_futex_wait with absolute timeout and error checking.
   33    TODO Remove when cleaning up the futex API throughout glibc.  */
   34 static __always_inline int
   35 futex_abstimed_wait (unsigned int* futex, unsigned int expected,
   36                      const struct timespec* abstime, int private, bool cancel)
   37 {
   38   int err, oldtype;
   39   if (abstime == NULL)
   40     {
   41       if (cancel)
   42         oldtype = __pthread_enable_asynccancel ();
>  43       err = lll_futex_wait (futex, expected, private);
   44       if (cancel)
   45         __pthread_disable_asynccancel (oldtype);
   46     }

-- 2: ??
-- 3: ??
-- 4: ??

-- 5: /usr/src/debug/glibc-2.17-c758a686/nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:292 (__new_sem_wait_slow)

  289   /* If there is no token available, sleep until there is.  */
  290   if ((d & SEM_VALUE_MASK) == 0)
  291     {
> 292       err = do_futex_wait (sem, abstime);
```

This looks like a logging thread (definitely not the main one, which was detailed in [comment 4], and which also matches the PID encoded in the IPC shared memory files previewed there as well), with the (for some reason missing) bottom-most frame:

```
-- X: libqb-1.0/lib/log_thread.c:71 or 254

   58 static void *
   59 qb_logt_worker_thread(void *data)
   60 {
   61         struct qb_log_record *rec;
   62         int dropped = 0;
   63         int res;
   64
   65         /*
   66          * Signal wthread_create that the initialization process may continue
   67          */
   68         sem_post(&logt_thread_start);
   69         for (;;) {
   70 retry_sem_wait:
>  71                 res = sem_wait(&logt_print_finished);

or

  238 void
  239 qb_log_thread_stop(void)
  240 {
  241         int res;
  242         int value;
  243         struct qb_log_record *rec;
  244
  245         if (wthread_active == QB_FALSE && logt_wthread_lock == NULL) {
  246                 return;
  247         }
  248         if (wthread_active == QB_FALSE) {
  249                 for (;;) {
  250                         res = sem_getvalue(&logt_print_finished, &value);
  251                         if (res != 0 || value == 0) {
  252                                 break;
  253                         }
> 254                         sem_wait(&logt_print_finished);
```

Note that blackbox logging is always synchronous/part of the main control flow (i.e. not deferred to the logging thread), which limits the problem space somewhat (good).
(On the other hand, libqb currently seems very fragile as far as threaded logging vs. in-main-thread configuration changes possibly imposing reloads go: https://github.com/ClusterLabs/libqb/issues/268)

It is a logging thread for sure, because corosync does not have any other threads. There are simply just two of them: the main thread and the logging thread. Maybe we should consider disabling this thread in corosync. The main reason to enable this functionality in corosync was concern about blocked logging, but AFAIK pacemaker does not use the logging thread and I'm not aware of any complaints related to blocked logging.
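If disabling the thread is pursued: libqb exposes threaded logging as a per-target conf flag. A minimal sketch follows; qb_log_ctl() and QB_LOG_CONF_THREADED are the real libqb API, but how exactly corosync's logsys wrapper wires this up is not shown here:

```c
/* Sketch: keep a libqb log target synchronous (records written in the
 * calling thread) instead of handing them to the worker thread. */
#include <qb/qblog.h>

static void
keep_logging_synchronous(void)
{
	/* QB_LOG_SYSLOG is one of libqb's built-in targets; the same
	 * call applies to any other target id. */
	qb_log_ctl(QB_LOG_SYSLOG, QB_LOG_CONF_THREADED, QB_FALSE);
}
```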
For whenever this issue is reproduced again: this was (against all recommendations) a case with no power fencing configured, hence the /dev/shm content is preserved:

```
# ls -l /dev/shm/qb-corosync-*-blackbox-*
-rw-------. 1 root root 8392704 Aug 25 10:42 /dev/shm/qb-corosync-8671-blackbox-data
-rw-------. 1 root root    8248 Aug 25 10:42 /dev/shm/qb-corosync-8671-blackbox-header
```

Have a look whether the timestamps match or not. I've observed unrelated cases where these were different, and I am not sure whether the event that got the two out of sync (the data file being newer than the header file) isn't related to reproducing the bug.

How is this issue reproducible? Would it make sense to make a scratch build of corosync without threaded logging activated? Is the customer willing to try it?

(In reply to Jan Friesse from comment #9)
> How is this issue reproducible? Would it make sense to make scratch build of
> corosync without activate threaded logging? Is customer willing to try it?

It's a production env, so unlikely to be receptive to any testing (but we asked). It happened once only, so the effectiveness of the test is questionable too.

(In reply to Stan Moravec from comment #10)
> It's a production env, so unlikely to be receptive to any testing (but we
> asked). It happened once only, so the effectiveness of the test is
> questionable too.

Ok, thanks for the info. I got the feeling from the BZ that we have at least 2 cores, so it is probably reproducible. But if it happened only once for the customer, then it's probably not a very effective way.

re [comment 11]: 2 independent (I believe) reports, only one contained the core dump. It's actually not expected to have core dumps available even when there are reports, simply because of how fencing works against the goal of having the state as of the crash captured.

---

Looking further at the registers captured in the core dump, I can deduce that:

1. rb->shared_hdr->read_pt at the point of the crash was 0 (equally, the dividend in the crashed division was 1 [edx:eax]) -- this makes it more likely that some corner case might have been assisting the reproducibility, e.g.:
   - for some reason the blackbox logging was started anew (I see no apparent way to trigger this)
   - there was some kind of integer wraparound back to 0 (again, not sure if such a situation is reachable at all)

2. timestamp of the message being logged: 1502206258 [r14] (which matches the timestamp of the sosreport)

(3. the size of the chunk to be allocated in qb_rb_chunk_alloc is 559 (571 [r12] - 12, QB_RB_CHUNK_MARGIN = 3 * 4 = 12), i.e., the actual size of the structured data (w/o message) to be logged is 559 - 512 = 47 [r13] -- although I cannot reach 22 bytes incl. the null terminator, per [comment 4, --6], for the function being logged from (but that value is present as r15))

re [comment 12]:
> 1. rb->shared_hdr->read_pt at the point for crash was 0
> (equally, the divident in the crashed division was 1 [edx:eax])

Actually and foremost, it supports the hypothesis that the memory page (at least the first one out of usually 3, with a page size of 4096 B) was plain zeroed. Again, not sure which conditions (incl. possibly a bug in the kernel or a lucky segfault-free inner memory corruption, amongst others) could lead to that state (hopefully not a deliberate /dev/shm/qb-corosync-*-blackbox-header modification).
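To make the zeroed-page hypothesis concrete, a minimal sketch of the suspected failure mode; the assumption (based on libqb 1.0's ringbuffer internals) is that ring indices are wrapped modulo the word_size field kept in the shared header, so a zeroed header page would turn the very first wrap into the reported division by zero:

```c
/* Sketch only: the struct mirrors the role of libqb's shared ringbuffer
 * header, it is not verbatim libqb code. */
#include <stdint.h>

struct shared_hdr {
	uint32_t word_size;	/* ring capacity in 32-bit words; 0 after a zeroed page */
	uint32_t write_pt;
	uint32_t read_pt;
};

static uint32_t
wrap_index(const struct shared_hdr *hdr, uint32_t pt)
{
	/* with word_size == 0 this modulo raises SIGFPE (reported as
	 * "division by zero"), matching the crash site at
	 * QB_RB_CHUNK_MAGIC_GET() in _rb_chunk_reclaim() */
	return pt % hdr->word_size;
}
```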
FWIW, the customer offered that they can possibly run a single-threaded corosync on their test env (where they did not see the problem, however). As I understand your notes, it may well be an unrelated memory corruption (one would expect more things to be affected, but the system went down, so maybe they just were not observed yet), or some sort of rare race that resulted in process memory being unexpectedly manipulated/zeroed out, with us seeing just the unrelated consequences (for which the single-threaded corosync may still be applicable).

re [comment 4]:

```
-- 10: libqb-1.0/lib/ringbuffer.c:454

  449         /*
  450          * Reclaim data if we are over writing and we need space
  451          */
  452         if (rb->flags & QB_RB_FLAG_OVERWRITE) {
  453                 while (qb_rb_space_free(rb) < (len + QB_RB_CHUNK_MARGIN)) {
> 454                         int rc = _rb_chunk_reclaim(rb);
  455                         if (rc != 0) {
  456                                 errno = rc;
  457                                 return NULL;
  458                         }
  459                 }
```

Coincidentally, based on a recent review of compiler warnings, I sense a code smell in

> while (qb_rb_space_free(rb) < (len + QB_RB_CHUNK_MARGIN))

where ssize_t (also used for error signalling) is being compared to size_t, which feasibly (I believe) might be the mysterious culprit here.

This was allegedly caused by an interaction between corosync and libqb (and indirectly made more probable because of how pcs keeps reprobing the managed systems). Hence closing as a duplicate of [bug 1536219] (hence reassigned to corosync), with the fix on the libqb side being a new provision to allow handling after-fork cases gracefully [bug 1539936] and the pcs-side fix being a mitigation of needlessly running external executables with elevated privileges [bug 1537019].

*** This bug has been marked as a duplicate of bug 1536219 ***
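(For posterity, a minimal illustration of the signed/unsigned comparison pitfall called out above; the values are made up, only the type interplay matters:)

```c
/* A negative ssize_t error return, compared against a size_t, is
 * converted to a huge unsigned value, so a "need more space" loop
 * condition silently evaluates to false and the error goes unnoticed. */
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
	ssize_t space_free = -22;	/* e.g. -EINVAL from qb_rb_space_free() */
	size_t needed = 571;		/* len + QB_RB_CHUNK_MARGIN */

	if (space_free < needed)	/* space_free is promoted to size_t here */
		puts("would keep reclaiming");
	else
		puts("error masked: loop exits as if enough space were free");
	return 0;
}
```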