
Bug 1133952

Summary: [abrt] pacemaker: crm_abort(): stonithd killed by SIGABRT
Product: Red Hat Enterprise Linux 7
Reporter: Nate Straz <nstraz>
Component: pacemaker
Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: high
Docs Contact:
Priority: urgent
Version: 7.0
CC: abeekhof, cluster-maint, cluster-qe, dvossel, jkortus, mnovacek
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard: abrt_hash:81e7c231edd24298e0f49715a05cee428242e23c
Fixed In Version: pacemaker-1.1.12-4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-05 10:00:19 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
File: backtrace          none
File: cgroup             none
File: core_backtrace     none
File: environ            none
File: limits             none
File: maps               none
File: open_fds           none
File: proc_pid_status    none
File: sosreport.log      none
File: tmpIFHSo2          none

Description Nate Straz 2014-08-26 14:06:43 UTC
Description of problem:


Version-Release number of selected component:
pacemaker-1.1.10-32.el7_0

Additional info:
reporter:       libreport-2.1.11
backtrace_rating: 4
cmdline:        /usr/libexec/pacemaker/stonithd
crash_function: crm_abort
executable:     /usr/libexec/pacemaker/stonithd
kernel:         3.10.0-131.el7.x86_64
runlevel:       N 3
tmpBQyGN7:      2014-08-26 08:48:58,453 INFO: could not run 'tree /var/lib': command not found
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (7 frames)
 #2 crm_abort at /lib64/libcrmcommon.so.3
 #3 crm_glib_handler at /lib64/libcrmcommon.so.3
 #6 g_source_remove at gmain.c:2194
 #7 stonith_action_clear_tracking_data at /lib64/libstonithd.so.2
 #8 stonith_action_destroy at /lib64/libstonithd.so.2
 #9 child_death_dispatch at /lib64/libcrmcommon.so.3
 #10 crm_signal_dispatch at /lib64/libcrmcommon.so.3

Comment 1 Nate Straz 2014-08-26 14:06:45 UTC
Created attachment 931005 [details]
File: backtrace

Comment 2 Nate Straz 2014-08-26 14:06:45 UTC
Created attachment 931006 [details]
File: cgroup

Comment 3 Nate Straz 2014-08-26 14:06:46 UTC
Created attachment 931007 [details]
File: core_backtrace

Comment 4 Nate Straz 2014-08-26 14:06:47 UTC
Created attachment 931008 [details]
File: environ

Comment 5 Nate Straz 2014-08-26 14:06:48 UTC
Created attachment 931009 [details]
File: limits

Comment 6 Nate Straz 2014-08-26 14:06:49 UTC
Created attachment 931010 [details]
File: maps

Comment 7 Nate Straz 2014-08-26 14:06:50 UTC
Created attachment 931011 [details]
File: open_fds

Comment 8 Nate Straz 2014-08-26 14:06:50 UTC
Created attachment 931012 [details]
File: proc_pid_status

Comment 9 Nate Straz 2014-08-26 14:06:51 UTC
Created attachment 931013 [details]
File: sosreport.log

Comment 10 Nate Straz 2014-08-26 14:06:52 UTC
Created attachment 931014 [details]
File: tmpIFHSo2

Comment 12 Nate Straz 2014-08-26 18:03:01 UTC
*** Bug 1133943 has been marked as a duplicate of this bug. ***

Comment 13 Nate Straz 2014-08-26 18:07:20 UTC
This looks like a major issue that's causing other problems.  I've had a cluster up for 24 hours and I already have 1000+ cores from stonithd.

[root@host-077 cores]# pwd
/var/lib/pacemaker/cores
[root@host-077 cores]# ls | wc -l ; du -sh .
1298
3.1G    .


The log is filling with messages like these:
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3078 to record non-fatal assert at logging.c:73 : Source ID 25 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 25 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3079 to record non-fatal assert at logging.c:73 : Source ID 26 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 26 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3147 to record non-fatal assert at logging.c:73 : Source ID 27 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 27 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3148 to record non-fatal assert at logging.c:73 : Source ID 28 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 28 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3223 to record non-fatal assert at logging.c:73 : Source ID 29 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 29 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3224 to record non-fatal assert at logging.c:73 : Source ID 30 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 30 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3293 to record non-fatal assert at logging.c:73 : Source ID 31 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 31 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3294 to record non-fatal assert at logging.c:73 : Source ID 32 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 32 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3362 to record non-fatal assert at logging.c:73 : Source ID 33 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 33 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3363 to record non-fatal assert at logging.c:73 : Source ID 34 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 34 was not found when attempting to remove it
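
As a rough aid for gauging how quickly these messages accumulate, the sketch below tallies the "Forked child" asserts per minute and collects the source IDs they mention. The regular expression is an assumption derived only from the lines quoted above, and `summarize_asserts` is a hypothetical helper, not part of Pacemaker; other versions may format these messages differently.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# "<Mon> <day> <HH:MM:SS> <host> stonith-ng[<pid>]: error: crm_abort: ... Source ID <n> was not found ..."
ASSERT_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+stonith-ng\[\d+\]:\s+"
    r"error: crm_abort: crm_glib_handler: Forked child \d+ .*"
    r"Source ID (?P<source_id>\d+) was not found"
)


def summarize_asserts(lines):
    """Return (Counter of asserts per minute, sorted list of source IDs)."""
    per_minute = Counter()
    source_ids = []
    for line in lines:
        match = ASSERT_RE.search(line)
        if match is None:
            continue  # skip the paired "GLib:" line and anything unrelated
        per_minute[match.group("minute")] += 1
        source_ids.append(int(match.group("source_id")))
    return per_minute, sorted(source_ids)


# Four lines copied from the excerpt above.
sample = [
    "Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3078 to record non-fatal assert at logging.c:73 : Source ID 25 was not found when attempting to remove it",
    "Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 25 was not found when attempting to remove it",
    "Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3079 to record non-fatal assert at logging.c:73 : Source ID 26 was not found when attempting to remove it",
    "Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3147 to record non-fatal assert at logging.c:73 : Source ID 27 was not found when attempting to remove it",
]

per_minute, ids = summarize_asserts(sample)
print(dict(per_minute))
print(ids)
```

On a real system the same function could be fed the lines of /var/log/messages; in the excerpt above it would show two forked children per minute.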

Comment 14 Andrew Beekhof 2014-08-26 22:01:15 UTC
I would expect this to be a dup of bug #1127289.  Can you confirm if your version of glib has been released to customers?

Comment 15 Nate Straz 2014-08-27 14:08:48 UTC
I'm running glibc-2.17-55.el7.x86_64

Comment 16 Nate Straz 2014-08-27 14:10:01 UTC
Sorry, wrong library.  I'm running glib2-2.40.0-2.el7.x86_64

Comment 17 Nate Straz 2014-08-27 14:50:53 UTC
It looks like the same issue to me.  There are a couple of failure scenarios that are causing a lot of trouble, so getting the fix in would be much appreciated.

1. When stonithd dumps core, abrtd runs os-prober.  If a mkfs.gfs2 was running at the time, this causes a mount attempt and a segfault in mkfs.

2. If I disable abrtd, cores from stonithd fill /var/lib/pacemaker/cores and cause a file-system-full condition.  More chaos follows.
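
To keep an eye on the second scenario, something like the following could report how many core files have piled up and how much space they take. This is only a sketch: `core_dir_usage` is a hypothetical helper, and it sums apparent file sizes, so the total can differ from what `du -sh` reports. The demonstration runs against a temporary directory standing in for /var/lib/pacemaker/cores.

```python
import os
import tempfile


def core_dir_usage(path):
    """Return (file_count, total_bytes) for regular files directly inside path."""
    count = 0
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                count += 1
                total += entry.stat(follow_symlinks=False).st_size
    return count, total


# Demonstration on a throwaway directory with three 1 KiB placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for index in range(3):
        with open(os.path.join(demo_dir, f"core.{index}"), "wb") as handle:
            handle.write(b"\0" * 1024)
    demo_count, demo_bytes = core_dir_usage(demo_dir)

print(demo_count, demo_bytes)
```

Pointing `core_dir_usage` at the real cores directory from a periodic job would give an early warning before the file system fills up.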

Comment 18 Andrew Beekhof 2014-08-27 23:04:17 UTC
(In reply to Nate Straz from comment #16)
> Sorry, wrong library.  I'm running glib2-2.40.0-2.el7.x86_64

Is that what customers are using or something from brew?
In bug #1127289 it turned out not to be live, so we didn't prioritise an update.

Comment 19 Andrew Beekhof 2014-08-27 23:10:07 UTC
(In reply to Nate Straz from comment #17)
> It looks like the same issue to me.  There are a couple of failure scenarios
> that are causing a lot of trouble, so getting the fix in would be much
> appreciated.

I have to do a rebase before that deadline expires anyway, so it won't be far away.

The good news is that in addition to the fix, crm_glib_handler() has previously been updated to not produce core files by default anymore.  This will also get picked up by the rebase.

> 
> 1. When stonithd dumps core, abrtd runs os-prober.  If a mkfs.gfs2 was
> running at the time, this causes a mount attempt and a segfault in
> mkfs.

ouch

> 
> 2. If I disable abrtd, cores from stonithd fill /var/lib/pacemaker/cores
> and cause a file-system-full condition.  More chaos follows.

I bet :-(
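
For readers unfamiliar with the GLib message in the logs: the backtrace shows g_source_remove() being reached from stonith_action_clear_tracking_data(), and one common way to trigger "Source ID N was not found" is to remove a one-shot source whose callback has already run and returned FALSE. This report does not say whether that is exactly what happened in stonithd, so the sketch below only illustrates the general pattern. `ToyMainLoop` and `Action` are hypothetical stand-ins written for this example; they are not the GLib or Pacemaker APIs.

```python
class ToyMainLoop:
    """Hypothetical model of a main loop that hands out numeric source IDs."""

    def __init__(self):
        self._next_id = 1
        self._sources = {}
        self.warnings = []

    def timeout_add(self, callback):
        source_id = self._next_id
        self._next_id += 1
        self._sources[source_id] = callback
        return source_id

    def source_remove(self, source_id):
        if source_id not in self._sources:
            # Mirrors the wording of the message seen in the logs.
            self.warnings.append(
                f"Source ID {source_id} was not found when attempting to remove it"
            )
            return False
        del self._sources[source_id]
        return True

    def fire(self, source_id):
        # A callback that returns False is treated as one-shot and dropped.
        if not self._sources[source_id]():
            del self._sources[source_id]


class Action:
    """Hypothetical object that owns one timer source."""

    def __init__(self, loop, reset_on_fire):
        self.loop = loop
        self.reset_on_fire = reset_on_fire
        self.timer_id = loop.timeout_add(self._on_timeout)

    def _on_timeout(self):
        if self.reset_on_fire:
            self.timer_id = 0  # forget the ID, the loop is about to drop it
        return False

    def destroy(self):
        if self.timer_id:  # 0 means "no live source"
            self.loop.source_remove(self.timer_id)
            self.timer_id = 0


loop = ToyMainLoop()

stale = Action(loop, reset_on_fire=False)
loop.fire(stale.timer_id)
stale.destroy()  # removes an ID the loop no longer knows about

careful = Action(loop, reset_on_fire=True)
loop.fire(careful.timer_id)
careful.destroy()  # nothing to remove, no warning

print(loop.warnings)
```

Only the first object produces a warning; clearing the stored ID when the callback finishes is the usual defensive habit for this kind of bookkeeping.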

Comment 24 Nate Straz 2014-11-19 14:35:15 UTC
Recent clusters are showing no core dumps in /var/lib/pacemaker/cores and no errors from stonith-ng.

[root@host-008 ~]# ls /var/lib/pacemaker/cores/
[root@host-008 ~]# rpm -q pacemaker
pacemaker-1.1.12-10.el7.x86_64
[root@host-008 ~]# grep stonith-ng /var/log/messages | grep error
[root@host-008 ~]#

Comment 26 errata-xmlrpc 2015-03-05 10:00:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0440.html