Bug 950403 - Pengine assert in qb_log_from_external_source()
Summary: Pengine assert in qb_log_from_external_source()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libqb
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: David Vossel
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 987355 1001491
Reported: 2013-04-10 08:17 UTC by Taneli Leppä
Modified: 2013-11-21 11:53 UTC (History)

Fixed In Version: libqb-0.16.0-2.el6
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-21 11:53:03 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2013:1634 | 0 | normal | SHIPPED_LIVE | libqb bug fix and enhancement update | 2013-11-20 21:53:46 UTC

Description Taneli Leppä 2013-04-10 08:17:53 UTC
Description of problem:

I had a crash last night in pengine:

Apr 10 03:01:20 clu4 crmd[4057]:    error: crm_ipc_read: Connection to pengine failed
Apr 10 03:01:20 clu4 pacemakerd[4043]:   notice: pcmk_child_exit: Child process pengine terminated with signal 6 (pid=4056, core=128)
Apr 10 03:01:20 clu4 pacemakerd[4043]:   notice: pcmk_process_exit: Respawning failed child process: pengine
Apr 10 03:01:20 clu4 crmd[4057]:    error: mainloop_gio_callback: Connection to pengine[0x17bcff0] closed (I/O condition=25)

The following core dump and backtrace were deposited in the pacemaker/cores directory:


#0  0x0000003e2a0328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003e2a034085 in abort () at abort.c:92
#2  0x0000003e2a02ba1e in __assert_fail_base (fmt=<value optimized out>, assertion=0x3e30819fbb "rc == 0", file=0x3e30819fb1 "log_dcs.c", line=<value optimized out>,
    function=<value optimized out>) at assert.c:96
#3  0x0000003e2a02bae0 in __assert_fail (assertion=0x3e30819fbb "rc == 0", file=0x3e30819fb1 "log_dcs.c", line=70, function=0x3e3081a030 "_log_dcs_new_cs") at assert.c:105
#4  0x0000003e308143eb in _log_dcs_new_cs (function=0x3e32c46310 "native_color", filename=0x3e32c442bf "native.c", format=0x3e32428d00 "%s: %s allocation score on %s: %s",
    priority=<value optimized out>, lineno=472, tags=0) at log_dcs.c:70
#5  0x0000003e308145e5 in qb_log_dcs_get (newly_created=0x7fff43a5715c, function=<value optimized out>, filename=<value optimized out>, format=<value optimized out>,
    priority=8 '\b', lineno=<value optimized out>, tags=0) at log_dcs.c:146
#6  0x0000003e30812ba9 in qb_log_callsite_get (function=<value optimized out>, filename=<value optimized out>, format=<value optimized out>, priority=<value optimized out>,
    lineno=<value optimized out>, tags=0) at log.c:256
#7  0x0000003e308130ab in qb_log_from_external_source (function=<value optimized out>, filename=<value optimized out>, format=<value optimized out>,
    priority=<value optimized out>, lineno=<value optimized out>, tags=<value optimized out>) at log.c:331
#8  0x0000003e32417315 in dump_node_scores_worker (level=9, file=0x3e32c442bf "native.c", function=0x3e32c46310 "native_color", line=472, rsc=0xa03a40,
    comment=0x3e32c44548 "Post-coloc", nodes=0xc17800) at utils.c:189
#9  0x0000003e32c21135 in native_color (rsc=0xa03a40, prefer=0xc492c0, data_set=0x7fff43a57710) at native.c:472
#10 0x0000003e32c314fc in color_instance (rsc=0xa03a40, prefer=0xc492c0, all_coloc=<value optimized out>, data_set=0x7fff43a57710) at clone.c:430
#11 0x0000003e32c35549 in clone_color (rsc=0xb5c960, prefer=<value optimized out>, data_set=0x7fff43a57710) at clone.c:578
#12 0x0000003e32c21000 in native_color (rsc=0xd77150, prefer=0x0, data_set=0x7fff43a57710) at native.c:459
#13 0x0000003e32c12c0f in stage5 (data_set=0x7fff43a57710) at allocate.c:1130
#14 0x0000003e32c09b1d in do_calculations (data_set=0x7fff43a57710, xml_input=<value optimized out>, now=<value optimized out>) at pengine.c:247
#15 0x0000003e32c0a702 in process_pe_message (msg=0x115c8a0, xml_data=0x1106890, sender=0x9910c0) at pengine.c:126
#16 0x00000000004012be in pe_ipc_dispatch (c=0x9910c0, data=<value optimized out>, size=<value optimized out>) at main.c:74
#17 0x0000003e3080ebb4 in _process_request_ (c=0x9910c0, ms_timeout=10) at ipcs.c:647
#18 0x0000003e3080ef04 in qb_ipcs_dispatch_connection_request (fd=<value optimized out>, revents=<value optimized out>, data=0x9910c0) at ipcs.c:755
#19 0x0000003e31425240 in gio_read_socket (gio=<value optimized out>, condition=G_IO_IN, data=0x990a80) at mainloop.c:372
#20 0x0000003e2bc38f0e in g_main_dispatch (context=0x98e760) at gmain.c:1960
#21 IA__g_main_context_dispatch (context=0x98e760) at gmain.c:2513
#22 0x0000003e2bc3c938 in g_main_context_iterate (context=0x98e760, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2591
#23 0x0000003e2bc3cd55 in IA__g_main_loop_run (loop=0x98cdb0) at gmain.c:2799
#24 0x00000000004014c8 in main (argc=1, argv=0x7fff43a57cf8) at main.c:174

This is apparently a bug in libqb and it's discussed here:

http://comments.gmane.org/gmane.linux.highavailability.pacemaker/15504

A patch for libqb 0.14.4 is available at:

https://github.com/asalkeld/libqb/commit/30a7871646c1f5bbb602e0a01f5550a4516b36f8

But that does not apply cleanly to 0.14.2 (which Red Hat ships).


How reproducible:
Not reproducible

Steps to Reproduce:
1. Use Pacemaker
  
Actual results:
Pengine crashes.

Expected results:
Pengine doesn't crash.

Comment 1 Andrew Beekhof 2013-04-11 22:54:39 UTC
Yep, that's a libqb bug. Reassigning.

Comment 2 Taneli Leppä 2013-04-22 14:06:20 UTC
Got hit by another one of these crashes last night.

Comment 5 Andrew Beekhof 2013-06-03 01:58:13 UTC
I think libqb will need a rebase in 6.5.

Until then, you can borrow an updated rpm from:
   http://clusterlabs.org/rpm-test-next/rhel-6/


Also, for QE, the reproducer is time.

libqb is creating duplicates and eventually uses up all the free memory until the pengine process crashes.
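
The backtrace above fails inside _log_dcs_new_cs, and this comment attributes the crash to duplicates piling up over time. As a rough illustration only, and assuming the duplicates are per-callsite records (the thread does not say so explicitly), a registry that looks up an existing record before allocating a new one might be sketched as follows. Every name here (struct callsite, callsite_get, callsite_count) is hypothetical and is not libqb's API or its actual fix:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callsite record; NOT libqb's real data structure. */
struct callsite {
    const char *function;
    const char *filename;
    const char *format;
    uint8_t priority;
    uint32_t lineno;
    uint32_t tags;
    struct callsite *next;
};

static struct callsite *registry = NULL; /* singly linked list of records */
static size_t registry_len = 0;

/* Return nonzero when the record describes exactly this callsite. */
static int callsite_matches(const struct callsite *cs, const char *function,
                            const char *filename, const char *format,
                            uint8_t priority, uint32_t lineno, uint32_t tags)
{
    return cs->lineno == lineno && cs->priority == priority &&
           cs->tags == tags &&
           strcmp(cs->function, function) == 0 &&
           strcmp(cs->filename, filename) == 0 &&
           strcmp(cs->format, format) == 0;
}

/*
 * Look up a callsite and allocate a record only if none exists yet.
 * Assumption: the strings outlive the registry (e.g. string literals),
 * so only the pointers are stored. Returns NULL on allocation failure.
 */
struct callsite *callsite_get(const char *function, const char *filename,
                              const char *format, uint8_t priority,
                              uint32_t lineno, uint32_t tags)
{
    struct callsite *cs;

    assert(function != NULL && filename != NULL && format != NULL);

    for (cs = registry; cs != NULL; cs = cs->next) {
        if (callsite_matches(cs, function, filename, format,
                             priority, lineno, tags)) {
            return cs; /* reuse: repeated log calls do not grow memory */
        }
    }

    cs = malloc(sizeof(*cs));
    if (cs == NULL) {
        return NULL;
    }
    cs->function = function;
    cs->filename = filename;
    cs->format = format;
    cs->priority = priority;
    cs->lineno = lineno;
    cs->tags = tags;
    cs->next = registry;
    registry = cs;
    registry_len++;
    return cs;
}

/* Number of distinct callsites recorded so far. */
size_t callsite_count(void)
{
    return registry_len;
}
```

In this sketch, calling callsite_get repeatedly with the same arguments returns the same pointer and leaves callsite_count unchanged. If the lookup step never matched, every log call would allocate a fresh record, which is the kind of unbounded growth described here. The linear scan is only for brevity; a real implementation would more likely use a hash table.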

Comment 6 Taneli Leppä 2013-06-05 06:35:12 UTC
Is the updated RPM libqb-0.14.4-7.38.07c9.dirty.el6.x86_64.rpm? Seems kind of old.

Comment 7 Andrew Beekhof 2013-06-07 07:00:01 UTC
It's new enough to have the fix.

Comment 16 Andrew Beekhof 2013-07-26 01:52:38 UTC
Dropping TechPreview keyword due to bug #987355

Comment 20 errata-xmlrpc 2013-11-21 11:53:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1634.html

