Bug 2168249

Summary: SIGABRT crash in pacemaker-fenced: (crm_glib_handler) crit: GLib: Source ID 192 was not found when attempting to remove it
Product: Red Hat Enterprise Linux 8 Reporter: Ken Gaillot <kgaillot>
Component: pacemaker Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact:
Priority: urgent    
Version: 8.7 CC: cluster-maint, cluster-qe, msmazova, nwahl, phagara
Target Milestone: rc Keywords: Triaged
Target Release: 8.8   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: pacemaker-2.1.5-6.el8 Doc Type: Bug Fix
Doc Text:
Cause: Pacemaker did not zero out a timer after releasing it. Consequence: The fencer could crash if a fencing request was in flight at shutdown. Fix: The timer is now zeroed out after being freed. Result: Shutdown proceeds cleanly.
Story Points: ---
Clone Of: 2166967 Environment:
Last Closed: 2023-05-16 08:35:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version: 2.1.6
Embargoed:
Bug Depends On: 2166967    
Bug Blocks:    

Description Ken Gaillot 2023-02-08 15:08:43 UTC
+++ This bug was initially created as a clone of Bug #2166967 +++

Description of problem:


> # gdb /usr/libexec/pacemaker/pacemaker-fenced core.pacemaker-fence.0.ecf9cd80eb44451184e5e0004db002c2.58712.1675430955000000
> Reading symbols from /usr/libexec/pacemaker/pacemaker-fenced...
> Reading symbols from /usr/lib/debug/usr/libexec/pacemaker/pacemaker-fenced-2.1.5-5.el9.x86_64.debug...
> 
> warning: Can't open file /dev/shm/qb-57704-57725-34-1SCL7r/qb-event-cpg-data during file-backed mapping note processing
> 
> warning: Can't open file /dev/shm/qb-57704-57725-34-1SCL7r/qb-event-cpg-header during file-backed mapping note processing
> 
> warning: Can't open file /dev/shm/qb-57704-57725-34-1SCL7r/qb-response-cpg-data during file-backed mapping note processing
> 
> warning: Can't open file /dev/shm/qb-57704-57725-34-1SCL7r/qb-request-cpg-data during file-backed mapping note processing
> 
> warning: Can't open file /dev/shm/qb-57704-57725-34-1SCL7r/qb-response-cpg-header during file-backed mapping note processing
> 
> warning: Can't open file /dev/shm/qb-57704-57725-34-1SCL7r/qb-request-cpg-header during file-backed mapping note processing
> [New LWP 58712]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/libexec/pacemaker/pacemaker-fenced'.
> Program terminated with signal SIGABRT, Aborted.
> #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
> 44	      return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
> (gdb) bt
> #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
> #1  0x00007fe7e56a15b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
> #2  0x00007fe7e5654d46 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3  0x00007fe7e56287f3 in __GI_abort () at abort.c:79
> #4  0x00007fe7e5b08289 in fail_assert_as (assert_condition=<optimized out>, line=<optimized out>, function=<optimized out>, file=<optimized out>) at /usr/src/debug/pacemaker-2.1.5-5.el9.x86_64/lib/common/utils.c:371
> #5  crm_abort (file=<optimized out>, function=<optimized out>, line=<optimized out>, assert_condition=<optimized out>, do_core=<optimized out>, do_fork=<optimized out>) at /usr/src/debug/pacemaker-2.1.5-5.el9.x86_64/lib/common/utils.c:403
> #6  0x00007fe7e5b1dad8 in crm_glib_handler (log_domain=log_domain@entry=0x7fe7e58c4071 "GLib", flags=flags@entry=G_LOG_LEVEL_CRITICAL, message=message@entry=0x564470af8ae0 "Source ID 190 was not found when attempting to remove it", user_data=user_data@entry=0x0)
>     at /usr/src/debug/pacemaker-2.1.5-5.el9.x86_64/lib/common/logging.c:74
> #7  0x00007fe7e587157a in g_logv (log_domain=0x7fe7e58c4071 "GLib", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=<optimized out>) at ../glib/gmessages.c:1382
> #8  0x00007fe7e5871863 in g_log (log_domain=<optimized out>, log_level=<optimized out>, format=<optimized out>) at ../glib/gmessages.c:1451
> #9  0x00007fe7e586b025 in g_source_remove (tag=190) at ../glib/gmain.c:2499
> #10 0x0000564467c9fd70 in clear_remote_op_timers (op=0x564470af49e0) at /usr/src/debug/pacemaker-2.1.5-5.el9.x86_64/daemons/fenced/fenced_remote.c:232
> #11 free_remote_op (data=0x564470af49e0) at /usr/src/debug/pacemaker-2.1.5-5.el9.x86_64/daemons/fenced/fenced_remote.c:248
> #12 0x00007fe7e58563a2 in g_hash_table_remove_all_nodes (hash_table=0x564469aea180, notify=<optimized out>, destruction=<optimized out>) at ../glib/ghash.c:706
> #13 0x00007fe7e5857023 in g_hash_table_remove_all_nodes (destruction=0, notify=1, hash_table=0x564469aea180) at ../glib/ghash.c:628
> #14 g_hash_table_remove_all (hash_table=0x564469aea180) at ../glib/ghash.c:1883
> #15 0x00007fe7e585b0d2 in g_hash_table_destroy (hash_table=0x564469aea180) at ../glib/ghash.c:1486
> #16 0x0000564467c92c8f in free_stonith_remote_op_list () at /usr/src/debug/pacemaker-2.1.5-5.el9.x86_64/daemons/fenced/fenced_remote.c:113
> #17 stonith_cleanup () at /usr/src/debug/pacemaker-2.1.5-5.el9.x86_64/daemons/fenced/pacemaker-fenced.c:1267
> #18 0x0000564467c90ec1 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/pacemaker-2.1.5-5.el9.x86_64/daemons/fenced/pacemaker-fenced.c:1733

Version-Release number of selected component (if applicable):
pacemaker-2.1.5-5.el9.x86_64

How reproducible:
rarely


--- Additional comment from Ken Gaillot on 2023-02-06 16:52:45 UTC ---

Fixed in upstream main branch by commit db109219

The crash may be reproducible by having a fencing request pending when the cluster is shut down on the node executing the fencing, but this is not guaranteed to trigger it reliably. If it cannot be reproduced, sanity checking should be sufficient, since the fix is a one-liner that is easily verified by code inspection.
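
For illustration, the failure mode in the backtrace (g_source_remove() called with an ID that is no longer registered) can be sketched without GLib. This is a minimal simulation, not the Pacemaker code: the registry, the mock_* functions, struct mock_op, and its op_timer field are all hypothetical stand-ins. The only behavior taken from this report is that removing an unknown source ID produces the critical "Source ID ... was not found" message, which crm_glib_handler turns into an abort, and that the fix zeroes the timer after releasing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_SOURCES 8

/* Hypothetical stand-in for GLib's source registry: slot i is true while
 * source ID i+1 is registered. ID 0 means "no source". */
static bool registered[MAX_SOURCES];

/* Register a new source and return its nonzero ID (0 if the table is full). */
static unsigned int mock_source_add(void)
{
    for (unsigned int i = 0; i < MAX_SOURCES; i++) {
        if (!registered[i]) {
            registered[i] = true;
            return i + 1;
        }
    }
    return 0;
}

/* Remove a source by ID. Returns false when the ID is not registered; in the
 * real daemon this is the point where GLib logs a critical message and the
 * log handler aborts. */
static bool mock_source_remove(unsigned int id)
{
    if (id == 0 || id > MAX_SOURCES || !registered[id - 1]) {
        fprintf(stderr, "Source ID %u was not found when attempting to remove it\n", id);
        return false;
    }
    registered[id - 1] = false;
    return true;
}

/* Hypothetical operation object holding one timer source ID. */
struct mock_op {
    unsigned int op_timer;
};

/* Buggy pattern: the source is removed but the stale ID stays in the struct. */
static bool clear_timer_buggy(struct mock_op *op)
{
    bool ok = true;

    if (op->op_timer != 0) {
        ok = mock_source_remove(op->op_timer);
        /* op->op_timer still holds the now-invalid ID */
    }
    return ok;
}

/* Fixed pattern: zero the ID after removal so a later cleanup is a no-op. */
static bool clear_timer_fixed(struct mock_op *op)
{
    bool ok = true;

    if (op->op_timer != 0) {
        ok = mock_source_remove(op->op_timer);
        op->op_timer = 0;
    }
    return ok;
}

/* Clear each operation's timer twice, as could happen if a timer is released
 * once during normal processing and again at shutdown cleanup. */
void stale_timer_demo(void)
{
    struct mock_op buggy = { mock_source_add() };
    struct mock_op fixed = { mock_source_add() };

    assert(clear_timer_buggy(&buggy));   /* first removal succeeds */
    assert(!clear_timer_buggy(&buggy));  /* second removal hits the stale ID */

    assert(clear_timer_fixed(&fixed));   /* first removal succeeds */
    assert(clear_timer_fixed(&fixed));   /* second call skips the zeroed ID */
}
```

Calling stale_timer_demo() from a main() exercises both patterns. Only the buggy variant prints the "was not found" message, on its second call, which corresponds to frame #9 of the backtrace above.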

Comment 6 Ken Gaillot 2023-02-20 17:16:28 UTC
The fix in the Description was incomplete, and is completed by commit 0291db47 in the upstream main branch. A new build will be forthcoming.

Comment 9 Markéta Smazová 2023-02-27 19:33:01 UTC
Verified as SanityOnly in pacemaker-2.1.5-8.el8

Comment 11 errata-xmlrpc 2023-05-16 08:35:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2818