Bug 2039675

Summary: pacemaker coredump with ocf:heartbeat:mysql resource
Product: Red Hat Enterprise Linux 8 Reporter: Martin Juricek <mjuricek>
Component: pacemakerAssignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: high Docs Contact:
Priority: urgent    
Version: 8.6CC: cluster-maint, msmazova, nwahl, sbradley
Target Milestone: rcKeywords: Regression, Triaged
Target Release: 8.6Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: pacemaker-2.1.2-3.el8 Doc Type: No Doc Update
Doc Text:
This bug is in an intermediate build not yet released in RHEL
Story Points: ---
Clone Of:
: 2040443 (view as bug list) Environment:
Last Closed: 2022-05-10 14:10:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2040443    

Comment 2 Reid Wahl 2022-01-12 11:21:39 UTC
Notes in case I don't make it back to this:

Issue seems to disappear if I remove notify=true.

I didn't set up mysql, so my resource fails differently, but valgrind shows a double free of xml.

==71171== Invalid free() / delete / delete[] / realloc()
==71171==    at 0x4C3610C: free (vg_replace_malloc.c:538)
==71171==    by 0x84AD447: xmlFreeNodeList__internal_alias (tree.c:3703)
==71171==    by 0x84AD447: xmlFreeNodeList (tree.c:3665)
==71171==    by 0x84AD52A: xmlFreeProp (tree.c:2097)
==71171==    by 0x84AD6A8: xmlFreePropList__internal_alias (tree.c:2072)
==71171==    by 0x84AD6A8: xmlFreePropList (tree.c:2067)
==71171==    by 0x84AD36A: xmlFreeNodeList__internal_alias (tree.c:3697)
==71171==    by 0x84AD36A: xmlFreeNodeList (tree.c:3665)
==71171==    by 0x84AD119: xmlFreeDoc (tree.c:1253)
==71171==    by 0x5DC770A: free_xml_with_position (xml.c:775)
==71171==    by 0x574CAA2: lrmd_ipc_dispatch (lrmd_client.c:341)
==71171==    by 0x574CAA2: lrmd_ipc_dispatch (lrmd_client.c:332)
==71171==    by 0x5DA7753: mainloop_gio_callback (mainloop.c:770)
==71171==    by 0x818A95C: g_main_dispatch (gmain.c:3193)
==71171==    by 0x818A95C: g_main_context_dispatch (gmain.c:3873)
==71171==    by 0x818AD17: g_main_context_iterate.isra.21 (gmain.c:3946)
==71171==    by 0x818B041: g_main_loop_run (gmain.c:4142)
==71171==    by 0x110DE2: crmd_init (pacemaker-controld.c:160)
==71171==    by 0x110BFD: main (pacemaker-controld.c:130)
==71171==  Address 0x12446e50 is 0 bytes inside a block of size 56 free'd
==71171==    at 0x4C3610C: free (vg_replace_malloc.c:538)
==71171==    by 0x57506E0: lrmd__set_result (lrmd_client.c:2374)
==71171==    by 0x50AEC61: pcmk__create_history_xml (pcmk_sched_utils.c:616)
==71171==    by 0x119DF9: build_operation_update.part.13 (controld_execd.c:697)
==71171==    by 0x11C012: build_operation_update (controld_execd.c:2127)
==71171==    by 0x11C012: controld_ack_event_directly (controld_execd.c:2126)
==71171==    by 0x11D3CC: process_lrm_event (controld_execd.c:2877)
==71171==    by 0x574B524: lrmd_dispatch_internal (lrmd_client.c:323)
==71171==    by 0x574CA9A: lrmd_ipc_dispatch (lrmd_client.c:340)
==71171==    by 0x574CA9A: lrmd_ipc_dispatch (lrmd_client.c:332)
==71171==    by 0x5DA7753: mainloop_gio_callback (mainloop.c:770)
==71171==    by 0x818A95C: g_main_dispatch (gmain.c:3193)
==71171==    by 0x818A95C: g_main_context_dispatch (gmain.c:3873)
==71171==    by 0x818AD17: g_main_context_iterate.isra.21 (gmain.c:3946)
==71171==    by 0x818B041: g_main_loop_run (gmain.c:4142)
==71171==    by 0x110DE2: crmd_init (pacemaker-controld.c:160)
==71171==    by 0x110BFD: main (pacemaker-controld.c:130)
==71171==  Block was alloc'd at
==71171==    at 0x4C34F0B: malloc (vg_replace_malloc.c:307)
==71171==    by 0x850495B: xmlBufResize (buf.c:827)
==71171==    by 0x850495B: xmlBufResize (buf.c:745)
==71171==    by 0x8504A1B: xmlBufAdd (buf.c:902)
==71171==    by 0x8504A1B: xmlBufAdd (buf.c:868)
==71171==    by 0x84B07D5: xmlStringLenGetNodeList (tree.c:1452)
==71171==    by 0x85562AB: xmlSAX2AttributeNs (SAX2.c:2051)
==71171==    by 0x8556995: xmlSAX2StartElementNs (SAX2.c:2416)
==71171==    by 0x849FA0D: xmlParseStartTag2 (parser.c:9637)
==71171==    by 0x84A370E: xmlParseElement (parser.c:9982)
==71171==    by 0x84A3D29: xmlParseDocument (parser.c:10765)
==71171==    by 0x84A4136: xmlDoRead (parser.c:15237)
==71171==    by 0x5DCABEA: string2xml (xml.c:887)
==71171==    by 0x574CA8C: lrmd_ipc_dispatch (lrmd_client.c:338)
==71171==    by 0x574CA8C: lrmd_ipc_dispatch (lrmd_client.c:332)
==71171==    by 0x5DA7753: mainloop_gio_callback (mainloop.c:770)
==71171==    by 0x818A95C: g_main_dispatch (gmain.c:3193)
==71171==    by 0x818A95C: g_main_context_dispatch (gmain.c:3873)
==71171==    by 0x818AD17: g_main_context_iterate.isra.21 (gmain.c:3946)
==71171==    by 0x818B041: g_main_loop_run (gmain.c:4142)
==71171==    by 0x110DE2: crmd_init (pacemaker-controld.c:160)
==71171==    by 0x110BFD: main (pacemaker-controld.c:130)
==71171== 
{
   <insert_a_suppression_name_here>
   Memcheck:Free
   fun:free
   fun:xmlFreeNodeList__internal_alias
   fun:xmlFreeNodeList
   fun:xmlFreeProp
   fun:xmlFreePropList__internal_alias
   fun:xmlFreePropList
   fun:xmlFreeNodeList__internal_alias
   fun:xmlFreeNodeList
   fun:xmlFreeDoc
   fun:free_xml_with_position
   fun:lrmd_ipc_dispatch
   fun:lrmd_ipc_dispatch
   fun:mainloop_gio_callback
   fun:g_main_dispatch
   fun:g_main_context_dispatch
   fun:g_main_context_iterate.isra.21
   fun:g_main_loop_run
   fun:crmd_init
   fun:main
}
==71171== 
==71171== HEAP SUMMARY:
==71171==     in use at exit: 23,621 bytes in 70 blocks
==71171==   total heap usage: 211,342 allocs, 211,276 frees, 43,953,549 bytes allocated
==71171== 
==71171== LEAK SUMMARY:
==71171==    definitely lost: 0 bytes in 0 blocks
==71171==    indirectly lost: 0 bytes in 0 blocks
==71171==      possibly lost: 0 bytes in 0 blocks
==71171==    still reachable: 23,069 bytes in 62 blocks
==71171==         suppressed: 552 bytes in 8 blocks
==71171== Reachable blocks (those to which a pointer was found) are not shown.
==71171== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==71171== 
==71171== For lists of detected and suppressed errors, rerun with: -s
==71171== ERROR SUMMARY: 4 errors from 1 contexts (suppressed: 0 from 0)

Comment 3 Reid Wahl 2022-01-13 02:45:15 UTC
The coredumps start with https://github.com/ClusterLabs/pacemaker/pull/2506/commits/31c7fa8a.

Comment 4 Reid Wahl 2022-01-13 03:27:52 UTC
To sum up:
  - An event string gets converted to an XML node in lrmd_ipc_dispatch.
  - lrmd_dispatch_internal uses the value of the XML node's F_LRMD_RSC_EXIT_REASON attribute directly when building an lrmd_event_data_t object. The string is not copied.
  - Via a callback, this event makes its way down to pcmk__create_history_xml (call trace in comment 2). 
  - pcmk__create_history_xml frees the exit-reason for a notify operation as part of its way of ignoring the result.

    if (pcmk__str_eq(task, CRMD_ACTION_NOTIFY, pcmk__str_none)) {
        const char *n_type = crm_meta_value(op->params, "notify_type");
        const char *n_task = crm_meta_value(op->params, "notify_operation");

        CRM_LOG_ASSERT(n_type != NULL);
        CRM_LOG_ASSERT(n_task != NULL);
        op_id = pcmk__notify_key(op->rsc_id, n_type, n_task);

        if (op->op_status != PCMK_EXEC_PENDING) {
            /* Ignore notify errors.
             *
             * @TODO It might be better to keep the correct result here, and
             * ignore it in process_graph_event().
             */
            lrmd__set_result(op, 0, PCMK_EXEC_DONE, NULL);
        }

  - lrmd_ipc_dispatch later frees the whole XML node that it created. That XML node contains the exit-reason string that's already been freed.

Comment 7 Ken Gaillot 2022-01-13 17:19:48 UTC
Fixed upstream as of commit ed8b2c86a thanks to nwahl

Comment 14 errata-xmlrpc 2022-05-10 14:10:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1885