Bug 1475482
Summary: | libqb imposed crash in ringbuffer: division by zero | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jan Pokorný [poki] <jpokorny>
Component: | corosync | Assignee: | Jan Friesse <jfriesse>
Status: | CLOSED DUPLICATE | QA Contact: | cluster-qe <cluster-qe>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 7.4 | CC: | ccaulfie, cfeist, cluster-maint, ctowsley, dkaylor, jfriesse, jpokorny, kgaillot, rpeterso, sbradley, stanislav.moravec
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-01-31 09:18:22 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Jan Pokorný [poki]
2017-07-26 18:51:07 UTC
There may also be high sophistication involved: https://github.com/ClusterLabs/libqb/commit/9b3be0b4 but hopefully it will turn out to be something dead simple. Hmm, just noticed this wrt. the atomic synchronization primitive: https://github.com/ClusterLabs/libqb/wiki/FAQ (although this wouldn't be directly related).

Got another report, this time with libqb-1.0-1.el7_3.1.x86_64, the location in the code matching the original report in [comment 0] (line 574 is the offending one):

```
 573         old_read_pt = rb->shared_hdr->read_pt;
 574         chunk_magic = QB_RB_CHUNK_MAGIC_GET(rb, old_read_pt);
 575         if (chunk_magic != QB_RB_CHUNK_MAGIC) {
 576                 return -EINVAL;
 577         }
```

Fortunately, this time the report was accompanied by a backtrace (a bit cryptic: offset + file references as generated by abrt; the coredump was not directly usable), from which I deduced a probable control flow, top to bottom:

```
-- 1: corosync-2.4.0/exec/main.c:1405

 1402         /*
 1403          * Start main processing loop
 1404          */
>1405         qb_loop_run (corosync_poll_handle);
 1406

-- 2: libqb-1.0/lib/loop.c:43

   31 static void
   32 qb_loop_run_level(struct qb_loop_level *level)
   33 {
   34         struct qb_loop_item *job;
   35         int32_t processed = 0;
   36
   37 Ill_have_another:
   38
   39         if (!qb_list_empty(&level->job_head)) {
   40                 job = qb_list_first_entry(&level->job_head, struct qb_loop_item, list);
   41                 qb_list_del(&job->list);
   42                 qb_list_init(&job->list);
>  43                 job->source->dispatch_and_take_back(job, level->priority);
   44                 level->todo--;
   45                 processed++;
   46                 if (level->l->stop_requested) {
   47                         return;
   48                 }
   49                 if (processed < level->to_process) {
   50                         goto Ill_have_another;
   51                 }
   52         }
   53 }

-- 3: libqb-1.0/lib/loop_poll.c:109

   92 static void
   93 _poll_dispatch_and_take_back_(struct qb_loop_item *item,
   94                               enum qb_loop_priority p)
   95 {
   96         struct qb_poll_entry *pe = (struct qb_poll_entry *)item;
   97         int32_t res;
   98 #ifdef DEBUG_DISPATCH_TIME
   99         uint64_t start;
  100         uint64_t stop;
  101         int32_t log_warn = QB_FALSE;
  102
  103         start = qb_util_nano_current_get();
  104 #endif /* DEBUG_DISPATCH_TIME */
  105
  106         assert(pe->state == QB_POLL_ENTRY_JOBLIST);
  107         assert(pe->item.type == QB_LOOP_FD);
  108
> 109         res = pe->poll_dispatch_fn(pe->ufd.fd,
  110                                    pe->ufd.revents,
  111                                    pe->item.user_data);
  112         if (res < 0) {
  113                 _poll_entry_mark_deleted_(pe);
  114         } else if (pe->state != QB_POLL_ENTRY_DELETED) {
  115                 pe->state = QB_POLL_ENTRY_ACTIVE;
  116                 pe->ufd.revents = 0;
  117         }

-- 4: corosync-2.4.0/exec/totemudp.c:522

  519         /*
  520          * Handle incoming message
  521          */
> 522         instance->totemudp_deliver_fn (
  523                 instance->context,
  524                 iovec->iov_base,
  525                 iovec->iov_len);
  526
  527         iovec->iov_len = FRAME_SIZE_MAX;
  528         return (0);

-- 5: corosync-2.4.0/exec/totemrrp.c:1941 or 1952

 1937         if (token_is) {
 1938                 /*
 1939                  * Deliver to the token receiver for this rrp algorithm
 1940                  */
>1941                 rrp_instance->rrp_algo->token_recv (
 1942                         rrp_instance,
 1943                         deliver_fn_context->iface_no,
 1944                         deliver_fn_context->context,
 1945                         msg,
 1946                         msg_len,
 1947                         token_seqid);
 1948         } else {
 1949                 /*
 1950                  * Deliver to the mcast receiver for this rrp algorithm
 1951                  */
>1952                 rrp_instance->rrp_algo->mcast_recv (
 1953                         rrp_instance,
 1954                         deliver_fn_context->iface_no,
 1955                         deliver_fn_context->context,
 1956                         msg,
 1957                         msg_len);
 1958         }

-- 6: corosync-2.4.0/exec/totemsrp.c:4087

>4087         log_printf (instance->totemsrp_log_level_trace,
 4088                 "Received ringid(%s:%lld) seq %x",
 4089                 totemip_print (&mcast_header.ring_id.rep),
 4090                 mcast_header.ring_id.seq,
 4091                 mcast_header.seq);

-- 7: corosync-2.4.0/exec/main.c:957

  947 static void
  948 _logsys_log_printf(int level, int subsys,
  949                    const char *function_name,
  950                    const char *file_name,
  951                    int file_line,
  952                    const char *format, ...)
  953 {
  954         va_list ap;
  955
  956         va_start(ap, format);
> 957         qb_log_from_external_source_va(function_name, corosync_basename(file_name),
  958                                        format, level, file_line,
  959                                        subsys, ap);
  960         va_end(ap);
  961 }

-- 8: libqb-1.0/lib/log.c:255

  253         } else if (t->vlogger) {
  254                 va_copy(ap_copy, ap);
> 255                 t->vlogger(t->pos, cs, tv.tv_sec, ap_copy);
  256                 va_end(ap_copy);

-- 9: libqb-1.0/lib/log_blackbox.c:74

>  74         chunk = qb_rb_chunk_alloc(t->instance, max_size);
   75
   76         if (chunk == NULL) {
   77                 /* something bad has happened. abort blackbox logging */
   78                 qb_util_perror(LOG_ERR, "Blackbox allocation error, aborting blackbox log %s", t->filename);
   79                 qb_rb_close(t->instance);
   80                 t->instance = NULL;
   81                 return;
   82         }

-- 10: libqb-1.0/lib/ringbuffer.c:454

  449         /*
  450          * Reclaim data if we are over writing and we need space
  451          */
  452         if (rb->flags & QB_RB_FLAG_OVERWRITE) {
  453                 while (qb_rb_space_free(rb) < (len + QB_RB_CHUNK_MARGIN)) {
> 454                         int rc = _rb_chunk_reclaim(rb);
  455                         if (rc != 0) {
  456                                 errno = rc;
  457                                 return NULL;
  458                         }
  459                 }

-- 11: libqb-1.0/lib/ringbuffer.c:574

  564 static int
  565 _rb_chunk_reclaim(struct qb_ringbuffer_s * rb)
  566 {
  567         uint32_t old_read_pt;
  568         uint32_t new_read_pt;
  569         uint32_t old_chunk_size;
  570         uint32_t chunk_magic;
  571         int rc = 0;
  572
  573         old_read_pt = rb->shared_hdr->read_pt;
> 574         chunk_magic = QB_RB_CHUNK_MAGIC_GET(rb, old_read_pt);
  575         if (chunk_magic != QB_RB_CHUNK_MAGIC) {
  576                 return -EINVAL;
  577         }
```
Several notes:

- blackbox is enabled before corosync-2.4.0/exec/main.c:main() is run (so the expectation is that it's ready to serve others by then)

- from the memory maps available in the coredump, the blackbox was indeed "visibly working":

  > Start Addr      End Addr        Size      Offset  objfile
  > 0x7f21b95e6000  0x7f21b9de7000  0x801000  0x0     /dev/shm/qb-corosync-29276-blackbox-data
  > 0x7f21b9de7000  0x7f21ba5e8000  0x801000  0x0     /dev/shm/qb-corosync-29276-blackbox-data
  > 0x7f21bcaa4000  0x7f21bcaa7000  0x3000    0x0     /dev/shm/qb-corosync-29276-blackbox-header

- on the other hand, I don't think I've ever seen duplicate-mapped files with libqb's IPC ringbuffers like this (unless it's a matter of some implicit/system logic behind splitting large memory-mapped blocks into [contiguous?] chunks):

  > Start Addr      End Addr        Size      Offset  objfile
  > 0x7f21abf80000  0x7f21ac081000  0x101000  0x0     /dev/shm/qb-cpg-event-29277-10216-41-data
  > 0x7f21ac081000  0x7f21ac182000  0x101000  0x0     /dev/shm/qb-cpg-event-29277-10216-41-data
  > 0x7f21ac182000  0x7f21ac283000  0x101000  0x0     /dev/shm/qb-cpg-response-29277-10216-41-data
  > 0x7f21ac283000  0x7f21ac384000  0x101000  0x0     /dev/shm/qb-cpg-response-29277-10216-41-data
  > 0x7f21ac384000  0x7f21ac485000  0x101000  0x0     /dev/shm/qb-cpg-request-29277-10216-41-data
  > 0x7f21ac485000  0x7f21ac586000  0x101000  0x0     /dev/shm/qb-cpg-request-29277-10216-41-data

  this may be a matter of a larger IPC buffer, though (I've been told this behaviour might be related to using PCMK_ipc_buffer=16777216 on the pacemaker side)

- from the configure script output:

  > checking whether GCC supports builtin sync intrinsics... yes
  > checking whether GCC supports builtin atomic intrinsics... yes
  > checking whether atomics need memory barrier... yes

  meaning that qb_atomic_int_get_ex(X, QB_ATOMIC_ACQUIRE) is turned into __atomic_load_n(X, __ATOMIC_ACQUIRE) (see the sketch right after this list), so either the atomic model could be changed (no knowledge here at this point) or the behaviour can be related to some bugs, e.g.: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
  (should not be relevant here, but for posterity, there are some indications of __sync_synchronize not working properly prior to GCC 4.4, at least(?): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793)
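For illustration, a minimal sketch of that mapping; the wrapper name below is illustrative, not libqb's, while __atomic_load_n/__ATOMIC_ACQUIRE are the real GCC builtins referenced above:

```c
/* Sketch of the acquire-load mapping noted above: on a toolchain where
 * the configure checks detect builtin atomic intrinsics, an acquire get
 * such as qb_atomic_int_get_ex(X, QB_ATOMIC_ACQUIRE) boils down to the
 * GCC builtin below. */
#include <stdint.h>

static int32_t
acquire_int_get(const volatile int32_t *atomic)
{
	return __atomic_load_n(atomic, __ATOMIC_ACQUIRE);
}
```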
Note that per the coredump, there was a parallel thread running, but as mentioned, a lot of information is missing (basically just the per-frame instruction pointers (?) are known, and only for some frames):

```
-- 1: glibc-2.17-c758a686/nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43 (futex_abstimed_wait)

   32 /* Wrapper for lll_futex_wait with absolute timeout and error checking.
   33    TODO Remove when cleaning up the futex API throughout glibc.  */
   34 static __always_inline int
   35 futex_abstimed_wait (unsigned int* futex, unsigned int expected,
   36                      const struct timespec* abstime, int private, bool cancel)
   37 {
   38   int err, oldtype;
   39   if (abstime == NULL)
   40     {
   41       if (cancel)
   42         oldtype = __pthread_enable_asynccancel ();
>  43       err = lll_futex_wait (futex, expected, private);
   44       if (cancel)
   45         __pthread_disable_asynccancel (oldtype);
   46     }

-- 2: ??
-- 3: ??
-- 4: ??

-- 5: /usr/src/debug/glibc-2.17-c758a686/nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:292 (__new_sem_wait_slow)

  289   /* If there is no token available, sleep until there is.  */
  290   if ((d & SEM_VALUE_MASK) == 0)
  291     {
> 292       err = do_futex_wait (sem, abstime);
```

This looks like a logging thread (definitely not the main one, which was detailed in [comment 4], and which also matches the PID encoded in the IPC shared memory files previewed there as well), with the (for some reason missing) bottom-most frame:

```
-- X: libqb-1.0/lib/log_thread.c:71 or 254

   58 static void *
   59 qb_logt_worker_thread(void *data)
   60 {
   61         struct qb_log_record *rec;
   62         int dropped = 0;
   63         int res;
   64
   65         /*
   66          * Signal wthread_create that the initialization process may continue
   67          */
   68         sem_post(&logt_thread_start);
   69         for (;;) {
   70 retry_sem_wait:
>  71                 res = sem_wait(&logt_print_finished);

or

  238 void
  239 qb_log_thread_stop(void)
  240 {
  241         int res;
  242         int value;
  243         struct qb_log_record *rec;
  244
  245         if (wthread_active == QB_FALSE && logt_wthread_lock == NULL) {
  246                 return;
  247         }
  248         if (wthread_active == QB_FALSE) {
  249                 for (;;) {
  250                         res = sem_getvalue(&logt_print_finished, &value);
  251                         if (res != 0 || value == 0) {
  252                                 break;
  253                         }
> 254                         sem_wait(&logt_print_finished);
```

Note that blackbox logging is always synchronous/part of the main control flow (i.e. not deferred to the logging thread), which limits the problem space somewhat (good).
(On the other hand, libqb currently seems very fragile as far as threaded logging vs. in-main-thread configuration changes possibly imposing reloads go: https://github.com/ClusterLabs/libqb/issues/268)

It is a logging thread for sure, because corosync does not have any other threads. There are simply just two of them: the main thread and the logging thread. Maybe we should consider disabling this thread in corosync. The main reason to enable this functionality in corosync was concern about blocked logging, but AFAIK pacemaker does not use the logging thread and I'm not aware of any complaints related to blocked logging.
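If disabling the thread is pursued: libqb exposes threaded logging as a per-target conf flag. A minimal sketch follows; qb_log_ctl() and QB_LOG_CONF_THREADED are the real libqb API, but how exactly corosync's logsys wrapper wires this up is not shown here:

```c
/* Sketch: keep a libqb log target synchronous (records written in the
 * calling thread) instead of handing them to the worker thread. */
#include <qb/qblog.h>

static void
keep_logging_synchronous(void)
{
	/* QB_LOG_SYSLOG is one of libqb's built-in targets; the same
	 * call applies to any other target id. */
	qb_log_ctl(QB_LOG_SYSLOG, QB_LOG_CONF_THREADED, QB_FALSE);
}
```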
For whenever this issue is reproduced again: this was (against all recommendations) a case with no power fencing configured, hence the /dev/shm content is preserved:

```
# ls -l /dev/shm/qb-corosync-*-blackbox-*
-rw-------. 1 root root 8392704 Aug 25 10:42 /dev/shm/qb-corosync-8671-blackbox-data
-rw-------. 1 root root    8248 Aug 25 10:42 /dev/shm/qb-corosync-8671-blackbox-header
```

Have a look whether the timestamps match or not. I've observed unrelated cases where these were different, and I am not sure whether the event that got the two out of sync (the data file being newer than the header file) isn't related to reproducing the bug.

How is this issue reproducible? Would it make sense to make a scratch build of corosync without threaded logging activated? Is the customer willing to try it?

(In reply to Jan Friesse from comment #9)
> How is this issue reproducible? Would it make sense to make scratch build of
> corosync without activate threaded logging? Is customer willing to try it?

It's a production env, so unlikely to be receptive to any testing (but we asked). It happened once only, so the effectiveness of the test is questionable too.

(In reply to Stan Moravec from comment #10)
> It's a production env, so unlikely to be receptive to any testing (but we
> asked). It happened once only, so the effectiveness of the test is
> questionable too.

Ok, thanks for the info. I got the feeling from the BZ that we have at least 2 cores, so it is probably reproducible. But if it happened only once for the customer, then it's probably not a very effective way.

re [comment 11]: 2 independent (I believe) reports, only one contained the core dump. It's actually not expected to have core dumps available even when there are reports, simply because of how fencing works against the goal of having the state as of the crash captured.

---

Looking further at the registers captured in the core dump, I can deduce that:

1. rb->shared_hdr->read_pt at the point of the crash was 0 (equally, the dividend in the crashed division was 1 [edx:eax]) -- this makes it more likely that some corner case might have been assisting the reproducibility, e.g.:
   - for some reason the blackbox logging was started anew (I see no apparent way to trigger this)
   - there was some kind of integer wraparound back to 0 (again, not sure if such a situation is reachable at all)

2. timestamp of the message being logged: 1502206258 [r14] (which matches the timestamp of the sosreport)

(3. the size of the chunk to be allocated in qb_rb_chunk_alloc is 559 (571 [r12] - 12, QB_RB_CHUNK_MARGIN = 3 * 4 = 12), i.e., the actual size of the structured data (w/o message) to be logged is 559 - 512 = 47 [r13] -- although I cannot reach 22 bytes incl. the null terminator, per [comment 4, --6], for the function being logged from (but that value is present as r15))

re [comment 12]:
> 1. rb->shared_hdr->read_pt at the point for crash was 0
> (equally, the divident in the crashed division was 1 [edx:eax])

Actually and foremost, it supports the hypothesis that the memory page (at least the first one out of usually 3, with a page size of 4096 B) was plain zeroed. Again, not sure which conditions (incl. possibly a bug in the kernel or a lucky segfault-free inner memory corruption, amongst others) could lead to that state (hopefully not a deliberate /dev/shm/qb-corosync-*-blackbox-header modification).
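To make the zeroed-page hypothesis concrete, a minimal sketch of the suspected failure mode; the assumption (based on libqb 1.0's ringbuffer internals) is that ring indices are wrapped modulo the word_size field kept in the shared header, so a zeroed header page would turn the very first wrap into the reported division by zero:

```c
/* Sketch only: the struct mirrors the role of libqb's shared ringbuffer
 * header, it is not verbatim libqb code. */
#include <stdint.h>

struct shared_hdr {
	uint32_t word_size;	/* ring capacity in 32-bit words; 0 after a zeroed page */
	uint32_t write_pt;
	uint32_t read_pt;
};

static uint32_t
wrap_index(const struct shared_hdr *hdr, uint32_t pt)
{
	/* with word_size == 0 this modulo raises SIGFPE (reported as
	 * "division by zero"), matching the crash site at
	 * QB_RB_CHUNK_MAGIC_GET() in _rb_chunk_reclaim() */
	return pt % hdr->word_size;
}
```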
FWIW, the customer offered that they can possibly run a single-threaded corosync on their test env (where they did not see the problem, however). As I understand your notes, it may well be an unrelated memory corruption (one would expect more things to be affected, but the system went down, so maybe they just were not observed yet), or some sort of rare race that resulted in process memory being unexpectedly manipulated/zeroed out, with us seeing just the unrelated consequences (for which the single-threaded corosync may still be applicable).

re [comment 4]:

```
-- 10: libqb-1.0/lib/ringbuffer.c:454

  449         /*
  450          * Reclaim data if we are over writing and we need space
  451          */
  452         if (rb->flags & QB_RB_FLAG_OVERWRITE) {
  453                 while (qb_rb_space_free(rb) < (len + QB_RB_CHUNK_MARGIN)) {
> 454                         int rc = _rb_chunk_reclaim(rb);
  455                         if (rc != 0) {
  456                                 errno = rc;
  457                                 return NULL;
  458                         }
  459                 }
```

Coincidentally, based on a recent review of compiler warnings, I sense a code smell in

> while (qb_rb_space_free(rb) < (len + QB_RB_CHUNK_MARGIN))

where ssize_t (also used for error signalling) is being compared to size_t, which feasibly (I believe) might be the mysterious culprit here.

This was allegedly caused by an interaction between corosync and libqb (and indirectly made more probable because of how pcs keeps reprobing the managed systems). Hence closing as a duplicate of [bug 1536219] (hence reassigned to corosync), with the fix on the libqb side being a new provision to allow handling after-fork cases gracefully [bug 1539936] and the pcs-side fix being a mitigation of needlessly running external executables with elevated privileges [bug 1537019].

*** This bug has been marked as a duplicate of bug 1536219 ***
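(For posterity, a minimal illustration of the signed/unsigned comparison pitfall called out above; the values are made up, only the type interplay matters:)

```c
/* A negative ssize_t error return, compared against a size_t, is
 * converted to a huge unsigned value, so a "need more space" loop
 * condition silently evaluates to false and the error goes unnoticed. */
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
	ssize_t space_free = -22;	/* e.g. -EINVAL from qb_rb_space_free() */
	size_t needed = 571;		/* len + QB_RB_CHUNK_MARGIN */

	if (space_free < needed)	/* space_free is promoted to size_t here */
		puts("would keep reclaiming");
	else
		puts("error masked: loop exits as if enough space were free");
	return 0;
}
```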