Red Hat Bugzilla – Bug 950403
Pengine assert in qb_log_from_external_source()
Last modified: 2013-11-21 06:53:03 EST
Description of problem:
I had a crash last night in pengine:

Apr 10 03:01:20 clu4 crmd[4057]: error: crm_ipc_read: Connection to pengine failed
Apr 10 03:01:20 clu4 pacemakerd[4043]: notice: pcmk_child_exit: Child process pengine terminated with signal 6 (pid=4056, core=128)
Apr 10 03:01:20 clu4 pacemakerd[4043]: notice: pcmk_process_exit: Respawning failed child process: pengine
Apr 10 03:01:20 clu4 crmd[4057]: error: mainloop_gio_callback: Connection to pengine[0x17bcff0] closed (I/O condition=25)

The following core dump and backtrace was deposited in pacemaker/cores directory:

#0 0x0000003e2a0328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003e2a034085 in abort () at abort.c:92
#2 0x0000003e2a02ba1e in __assert_fail_base (fmt=<value optimized out>, assertion=0x3e30819fbb "rc == 0", file=0x3e30819fb1 "log_dcs.c", line=<value optimized out>, function=<value optimized out>) at assert.c:96
#3 0x0000003e2a02bae0 in __assert_fail (assertion=0x3e30819fbb "rc == 0", file=0x3e30819fb1 "log_dcs.c", line=70, function=0x3e3081a030 "_log_dcs_new_cs") at assert.c:105
#4 0x0000003e308143eb in _log_dcs_new_cs (function=0x3e32c46310 "native_color", filename=0x3e32c442bf "native.c", format=0x3e32428d00 "%s: %s allocation score on %s: %s", priority=<value optimized out>, lineno=472, tags=0) at log_dcs.c:70
#5 0x0000003e308145e5 in qb_log_dcs_get (newly_created=0x7fff43a5715c, function=<value optimized out>, filename=<value optimized out>, format=<value optimized out>, priority=8 '\b', lineno=<value optimized out>, tags=0) at log_dcs.c:146
#6 0x0000003e30812ba9 in qb_log_callsite_get (function=<value optimized out>, filename=<value optimized out>, format=<value optimized out>, priority=<value optimized out>, lineno=<value optimized out>, tags=0) at log.c:256
#7 0x0000003e308130ab in qb_log_from_external_source (function=<value optimized out>, filename=<value optimized out>, format=<value optimized out>, priority=<value optimized out>, lineno=<value optimized out>, tags=<value optimized out>) at log.c:331
#8 0x0000003e32417315 in dump_node_scores_worker (level=9, file=0x3e32c442bf "native.c", function=0x3e32c46310 "native_color", line=472, rsc=0xa03a40, comment=0x3e32c44548 "Post-coloc", nodes=0xc17800) at utils.c:189
#9 0x0000003e32c21135 in native_color (rsc=0xa03a40, prefer=0xc492c0, data_set=0x7fff43a57710) at native.c:472
#10 0x0000003e32c314fc in color_instance (rsc=0xa03a40, prefer=0xc492c0, all_coloc=<value optimized out>, data_set=0x7fff43a57710) at clone.c:430
#11 0x0000003e32c35549 in clone_color (rsc=0xb5c960, prefer=<value optimized out>, data_set=0x7fff43a57710) at clone.c:578
#12 0x0000003e32c21000 in native_color (rsc=0xd77150, prefer=0x0, data_set=0x7fff43a57710) at native.c:459
#13 0x0000003e32c12c0f in stage5 (data_set=0x7fff43a57710) at allocate.c:1130
#14 0x0000003e32c09b1d in do_calculations (data_set=0x7fff43a57710, xml_input=<value optimized out>, now=<value optimized out>) at pengine.c:247
#15 0x0000003e32c0a702 in process_pe_message (msg=0x115c8a0, xml_data=0x1106890, sender=0x9910c0) at pengine.c:126
#16 0x00000000004012be in pe_ipc_dispatch (c=0x9910c0, data=<value optimized out>, size=<value optimized out>) at main.c:74
#17 0x0000003e3080ebb4 in _process_request_ (c=0x9910c0, ms_timeout=10) at ipcs.c:647
#18 0x0000003e3080ef04 in qb_ipcs_dispatch_connection_request (fd=<value optimized out>, revents=<value optimized out>, data=0x9910c0) at ipcs.c:755
#19 0x0000003e31425240 in gio_read_socket (gio=<value optimized out>, condition=G_IO_IN, data=0x990a80) at mainloop.c:372
#20 0x0000003e2bc38f0e in g_main_dispatch (context=0x98e760) at gmain.c:1960
#21 IA__g_main_context_dispatch (context=0x98e760) at gmain.c:2513
#22 0x0000003e2bc3c938 in g_main_context_iterate (context=0x98e760, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2591
#23 0x0000003e2bc3cd55 in IA__g_main_loop_run (loop=0x98cdb0) at gmain.c:2799
#24 0x00000000004014c8 in main (argc=1, argv=0x7fff43a57cf8) at main.c:174

This is apparently a bug in libqb and it's discussed here:
http://comments.gmane.org/gmane.linux.highavailability.pacemaker/15504

A patch for libqb 0.14.4 is available at:
https://github.com/asalkeld/libqb/commit/30a7871646c1f5bbb602e0a01f5550a4516b36f8

But that does not apply cleanly to 0.14.2 (which Red Hat ships).

How reproducible:
Not reproducible

Steps to Reproduce:
1. Use Pacemaker

Actual results:
Pengine crashes.

Expected results:
Pengine doesn't crash.
Yep, that's a libqb bug. Reassigning.
Got hit by another one of these crashes last night.
I think libqb will need a rebase in 6.5. Until then, you can borrow an updated rpm from: http://clusterlabs.org/rpm-test-next/rhel-6/

Also, for QE, the reproducer is time: libqb keeps creating duplicates and eventually uses up all the free memory, at which point the pengine process crashes.
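To illustrate the failure mode described in the comment above, here is a minimal sketch in Python. It is not libqb's code: the Callsite, DedupRegistry, LeakyRegistry and simulate names are hypothetical, and the only details taken from this report are the argument names and values visible in the backtrace (function, filename, format, priority, lineno, tags). It contrasts a registry that reuses an existing entry for a repeated log location with one that stores a duplicate on every call, which is the kind of growth that would only show up over time.

```python
from typing import Dict, List, NamedTuple


class Callsite(NamedTuple):
    """Hypothetical stand-in for the arguments seen in the backtrace."""
    function: str
    filename: str
    format: str
    priority: int
    lineno: int
    tags: int


class DedupRegistry:
    """Returns the existing entry when the same callsite is seen again."""

    def __init__(self) -> None:
        self._entries: Dict[Callsite, Callsite] = {}

    def get(self, cs: Callsite) -> Callsite:
        # setdefault stores cs only if no equal key is already present.
        return self._entries.setdefault(cs, cs)

    def __len__(self) -> int:
        return len(self._entries)


class LeakyRegistry:
    """Never matches an existing entry, so every call stores a new one."""

    def __init__(self) -> None:
        self._entries: List[Callsite] = []

    def get(self, cs: Callsite) -> Callsite:
        self._entries.append(cs)  # duplicate stored on every call
        return cs

    def __len__(self) -> int:
        return len(self._entries)


def simulate(registry, calls: int) -> int:
    """Log from one source location `calls` times; return the entry count."""
    # Values copied from frames #4 and #5 of the backtrace, for illustration.
    cs = Callsite("native_color", "native.c",
                  "%s: %s allocation score on %s: %s", 8, 472, 0)
    for _ in range(calls):
        registry.get(cs)
    return len(registry)
```

Calling simulate(DedupRegistry(), n) keeps a single entry regardless of n, while simulate(LeakyRegistry(), n) grows linearly with n. A process that logs from the same few locations on every run would then see its memory use climb steadily with the leaky variant.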
Is the updated RPM libqb-0.14.4-7.38.07c9.dirty.el6.x86_64.rpm? Seems kind of old.
It's new enough to have the fix.
Dropping TechPreview keyword due to bug #987355
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1634.html