Bug 1133952
| Summary: | [abrt] pacemaker: crm_abort(): stonithd killed by SIGABRT | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nate Straz <nstraz> |
| Component: | pacemaker | Assignee: | Andrew Beekhof <abeekhof> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.0 | CC: | abeekhof, cluster-maint, cluster-qe, dvossel, jkortus, mnovacek |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | abrt_hash:81e7c231edd24298e0f49715a05cee428242e23c | | |
| Fixed In Version: | pacemaker-1.1.12-4 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-05 10:00:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Nate Straz
2014-08-26 14:06:43 UTC
Created attachment 931005
File: backtrace
Created attachment 931006
File: cgroup
Created attachment 931007
File: core_backtrace
Created attachment 931008
File: environ
Created attachment 931009
File: limits
Created attachment 931010
File: maps
Created attachment 931011
File: open_fds
Created attachment 931012
File: proc_pid_status
Created attachment 931013
File: sosreport.log
Created attachment 931014
File: tmpIFHSo2
*** Bug 1133943 has been marked as a duplicate of this bug. ***

This looks like a major issue that's causing other problems. I've had a cluster up for 24 hours and I already have 1000+ cores from stonithd.

[root@host-077 cores]# pwd
/var/lib/pacemaker/cores
[root@host-077 cores]# ls | wc -l ; du -sh .
1298
3.1G .

The log is filling with messages like these:

Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3078 to record non-fatal assert at logging.c:73 : Source ID 25 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 25 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3079 to record non-fatal assert at logging.c:73 : Source ID 26 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 26 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3147 to record non-fatal assert at logging.c:73 : Source ID 27 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 27 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3148 to record non-fatal assert at logging.c:73 : Source ID 28 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 28 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3223 to record non-fatal assert at logging.c:73 : Source ID 29 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 29 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3224 to record non-fatal assert at logging.c:73 : Source ID 30 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 30 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3293 to record non-fatal assert at logging.c:73 : Source ID 31 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 31 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3294 to record non-fatal assert at logging.c:73 : Source ID 32 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 32 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3362 to record non-fatal assert at logging.c:73 : Source ID 33 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 33 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3363 to record non-fatal assert at logging.c:73 : Source ID 34 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 34 was not found when attempting to remove it

I would expect this to be a dup of bug #1127289. Can you confirm whether your version of glib has been released to customers?

I'm running glibc-2.17-55.el7.x86_64

Sorry, wrong library. I'm running glib2-2.40.0-2.el7.x86_64

It looks like the same issue to me.
There are a couple of failure scenarios that are causing a lot of trouble, so getting the fix in would be much appreciated.

1. When stonithd causes a core dump, abrtd runs os-prober. If there was a mkfs.gfs2 running at the time, this will cause a mount attempt and a segfault in mkfs.
2. If I disable abrtd, cores from stonithd fill /var/lib/pacemaker/cores and cause a file system full condition. More chaos happens after that.

(In reply to Nate Straz from comment #16)
> Sorry, wrong library. I'm running glib2-2.40.0-2.el7.x86_64

Is that what customers are using, or something from brew? In bug #1127289 it turned out not to be live, so we didn't prioritise an update.

(In reply to Nate Straz from comment #17)
> It looks like the same issue to me. There are a couple of failure scenarios
> that are causing a lot of trouble, so getting the fix in would be much
> appreciated.

I have to do a rebase before that deadline expires anyway, so it won't be far away.

The good news is that, in addition to the fix, crm_glib_handler() was previously updated to no longer produce core files by default. This will also get picked up by the rebase.

> 1. When stonithd causes a core dump, abrtd runs os-prober. If there was a
> mkfs.gfs2 running at the time, this will cause a mount attempt and a
> segfault in mkfs.

ouch

> 2. If I disable abrtd, cores from stonithd fill /var/lib/pacemaker/cores
> and cause a file system full condition. More chaos happens after that.

i bet :-(

Recent clusters are showing no core dumps in /var/lib/pacemaker/cores and no errors from stonith-ng.

[root@host-008 ~]# ls /var/lib/pacemaker/cores/
[root@host-008 ~]# rpm -q pacemaker
pacemaker-1.1.12-10.el7.x86_64
[root@host-008 ~]# grep stonith-ng /var/log/messages | grep error
[root@host-008 ~]#

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0440.html
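The stonith-ng log excerpts in this report show why cores piled up: each GLib critical ("Source ID N was not found when attempting to remove it") reached crm_glib_handler, which called crm_abort, which forked a child "to record non-fatal assert at logging.c:73". A forked child that calls abort() dies of SIGABRT and may leave a core file while the parent daemon keeps running, which is consistent with the abrt summary ("stonithd killed by SIGABRT") and with 1298 files in /var/lib/pacemaker/cores. The C code that follows is a hypothetical sketch of that fork-and-abort pattern under those assumptions, not pacemaker's actual crm_abort(). The function name record_nonfatal_assert() and its dump_core flag are invented for illustration; the flag stands in for the "no core files by default" behaviour mentioned in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>   /* SIGABRT, for callers inspecting the wait status */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch (not pacemaker's code): record a non-fatal assertion.
 * With dump_core == false, only log; nothing is written to disk.
 * With dump_core == true, fork a child that calls abort(): the child dies of
 * SIGABRT (leaving a core file if RLIMIT_CORE and core_pattern allow it),
 * while the parent logs the event and carries on.
 * Returns the child's pid, 0 when no core was requested, or -1 on error.
 * *status_out receives the child's wait status (0 when no child was forked).
 */
static pid_t record_nonfatal_assert(const char *file, int line, const char *msg,
                                    bool dump_core, int *status_out)
{
    assert(status_out != NULL);
    *status_out = 0;

    if (!dump_core) {
        fprintf(stderr, "non-fatal assert at %s:%d : %s\n", file, line, msg);
        return 0;
    }

    /* Flush stdio so the child does not inherit pending buffered output. */
    fflush(NULL);

    pid_t pid = fork();
    if (pid < 0) {
        fprintf(stderr, "Cannot fork to record non-fatal assert at %s:%d : %s\n",
                file, line, msg);
        return -1;
    }
    if (pid == 0) {
        abort(); /* child: terminate with SIGABRT so the kernel can dump core */
    }

    fprintf(stderr, "Forked child %d to record non-fatal assert at %s:%d : %s\n",
            (int)pid, file, line, msg);

    /* Reap the child so it does not linger as a zombie; retry if interrupted. */
    while (waitpid(pid, status_out, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return pid;
}
```

With dump_core set to false the sketch only logs. With it set to true, every call costs one fork and potentially one core file, so a GLib critical that fires twice a minute, as in the excerpts above, produces core files at that same rate.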