Description of problem:

Version-Release number of selected component:
pacemaker-1.1.10-32.el7_0

Additional info:
reporter:        libreport-2.1.11
backtrace_rating: 4
cmdline:         /usr/libexec/pacemaker/stonithd
crash_function:  crm_abort
executable:      /usr/libexec/pacemaker/stonithd
kernel:          3.10.0-131.el7.x86_64
runlevel:        N 3
tmpBQyGN7:       2014-08-26 08:48:58,453 INFO: could not run 'tree /var/lib': command not found
type:            CCpp
uid:             0

Truncated backtrace:
Thread no. 1 (7 frames)
 #2 crm_abort at /lib64/libcrmcommon.so.3
 #3 crm_glib_handler at /lib64/libcrmcommon.so.3
 #6 g_source_remove at gmain.c:2194
 #7 stonith_action_clear_tracking_data at /lib64/libstonithd.so.2
 #8 stonith_action_destroy at /lib64/libstonithd.so.2
 #9 child_death_dispatch at /lib64/libcrmcommon.so.3
 #10 crm_signal_dispatch at /lib64/libcrmcommon.so.3
Created attachment 931005 [details] File: backtrace
Created attachment 931006 [details] File: cgroup
Created attachment 931007 [details] File: core_backtrace
Created attachment 931008 [details] File: environ
Created attachment 931009 [details] File: limits
Created attachment 931010 [details] File: maps
Created attachment 931011 [details] File: open_fds
Created attachment 931012 [details] File: proc_pid_status
Created attachment 931013 [details] File: sosreport.log
Created attachment 931014 [details] File: tmpIFHSo2
*** Bug 1133943 has been marked as a duplicate of this bug. ***
This looks like a major issue that's causing other problems. I've had a cluster up for 24 hours and I already have 1000+ cores from stonithd.

[root@host-077 cores]# pwd
/var/lib/pacemaker/cores
[root@host-077 cores]# ls | wc -l ; du -sh .
1298
3.1G .

The log is filling with messages like these:

Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3078 to record non-fatal assert at logging.c:73 : Source ID 25 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 25 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3079 to record non-fatal assert at logging.c:73 : Source ID 26 was not found when attempting to remove it
Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 26 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3147 to record non-fatal assert at logging.c:73 : Source ID 27 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 27 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3148 to record non-fatal assert at logging.c:73 : Source ID 28 was not found when attempting to remove it
Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 28 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3223 to record non-fatal assert at logging.c:73 : Source ID 29 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 29 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3224 to record non-fatal assert at logging.c:73 : Source ID 30 was not found when attempting to remove it
Aug 26 13:04:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 30 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3293 to record non-fatal assert at logging.c:73 : Source ID 31 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 31 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3294 to record non-fatal assert at logging.c:73 : Source ID 32 was not found when attempting to remove it
Aug 26 13:05:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 32 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3362 to record non-fatal assert at logging.c:73 : Source ID 33 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 33 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_abort: crm_glib_handler: Forked child 3363 to record non-fatal assert at logging.c:73 : Source ID 34 was not found when attempting to remove it
Aug 26 13:06:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 34 was not found when attempting to remove it
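To gauge how quickly these warnings accumulate, the log excerpt above can be tallied per minute. The following is only a rough sketch: the regular expression is written against the exact message layout shown in this comment, and the function name `count_per_minute` is made up for illustration. It counts the `crm_glib_handler: GLib:` lines only, so each warning is counted once rather than twice.

```python
import re
from collections import Counter

# Matches the GLib warning line as it appears in the excerpt above.
# The layout (syslog timestamp, host, stonith-ng[pid]) is an assumption
# taken from this one log sample.
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+stonith-ng\[\d+\]:\s+"
    r"error: crm_glib_handler: GLib: Source ID (?P<source_id>\d+) was not found"
)


def count_per_minute(lines):
    """Return a Counter mapping 'Mon DD HH:MM' to the number of GLib warnings."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group("stamp")] += 1
    return counts


sample = [
    "Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 25 was not found when attempting to remove it",
    "Aug 26 13:02:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 26 was not found when attempting to remove it",
    "Aug 26 13:03:15 host-077 stonith-ng[2409]: error: crm_glib_handler: GLib: Source ID 27 was not found when attempting to remove it",
]

for stamp, count in sorted(count_per_minute(sample).items()):
    print(stamp, count)
```

On a real system the lines could be read from /var/log/messages instead of the inline sample.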
I would expect this to be a dup of bug #1127289. Can you confirm if your version of glib has been released to customers?
I'm running glibc-2.17-55.el7.x86_64
Sorry, wrong library. I'm running glib2-2.40.0-2.el7.x86_64
It looks like the same issue to me. A couple of failure scenarios are causing a lot of trouble, so getting the fix in would be much appreciated.

1. When stonithd dumps core, abrtd runs os-prober. If a mkfs.gfs2 was running at the time, this causes a mount attempt and a segfault in mkfs.

2. If I disable abrtd, cores from stonithd fill /var/lib/pacemaker/cores and cause a file-system-full condition. More chaos follows.
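Until a fixed package is installed, scenario 2 can at least be watched for. The sketch below is a hypothetical helper, not part of Pacemaker or abrt: it counts the regular files in a directory and sums their sizes, roughly what `ls | wc -l ; du -sh .` reports. The function names and the default thresholds are assumptions chosen for illustration. Note that it sums apparent file sizes, which can differ from the on-disk usage that `du` prints.

```python
import os


def core_dir_usage(path):
    """Return (file_count, total_bytes) for regular files directly in path."""
    count = 0
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only count real files.
            if entry.is_file(follow_symlinks=False):
                count += 1
                total += entry.stat(follow_symlinks=False).st_size
    return count, total


def over_limit(path, max_files=100, max_bytes=1 << 30):
    """Return True if the directory exceeds either (illustrative) threshold."""
    count, total = core_dir_usage(path)
    return count > max_files or total > max_bytes
```

For example, `over_limit("/var/lib/pacemaker/cores")` could be called from a periodic job to raise an alert before the file system fills.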
(In reply to Nate Straz from comment #16)
> Sorry, wrong library. I'm running glib2-2.40.0-2.el7.x86_64

Is that what customers are using or something from brew? In bug #1127289 it turned out not to be live, so we didn't prioritise an update.
(In reply to Nate Straz from comment #17)
> It looks like the same issue to me. There are a couple failure scenarios
> that is causing a lot of trouble so getting the fix in would be much
> appreciated.

I have to do a rebase before that deadline expires anyway, so it won't be far away.
The good news is that, in addition to the fix, crm_glib_handler() has already been updated to no longer produce core files by default. This will also get picked up by the rebase.

> 1. when stonithd causes a core dump abrtd runs os-prober. If there was a
> mkfs.gfs2 running at the time, this will cause a mount attempt and a
> segfault in mkfs.

Ouch.

> 2. If I disable abrtd, cores from stonithd fills /var/lib/pacemaker/cores
> and causes a file system full condition. More chaos after that happens.

I bet :-(
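For readers unfamiliar with the warning itself: the backtrace shows g_source_remove() being called from stonith_action_clear_tracking_data(), and the message says the source ID no longer exists. One common way to end up there is to remove a timer ID after the timer has already fired and been dropped by the main loop. This thread does not confirm that this is the exact cause in stonithd, so the toy model below is only an illustration of the general pattern. `SourceRegistry` and `Action` are hypothetical names; this is neither GLib nor Pacemaker code.

```python
class SourceRegistry:
    """Toy stand-in for a main loop's table of timer sources (not GLib itself)."""

    def __init__(self):
        self._next_id = 1
        self._active = set()
        self.warnings = []

    def add_timer(self):
        source_id = self._next_id
        self._next_id += 1
        self._active.add(source_id)
        return source_id

    def fire_once(self, source_id):
        """Simulate a one-shot timer firing: the loop drops the source itself."""
        self._active.discard(source_id)

    def remove(self, source_id):
        if source_id not in self._active:
            # Mirrors the wording of the warning seen in the logs.
            self.warnings.append(
                f"Source ID {source_id} was not found when attempting to remove it"
            )
            return False
        self._active.remove(source_id)
        return True


class Action:
    """Hypothetical owner of one timer; names are illustrative only."""

    def __init__(self, registry):
        self.registry = registry
        self.timer_id = registry.add_timer()

    def on_timer_fired(self, clear_id):
        self.registry.fire_once(self.timer_id)
        if clear_id:
            self.timer_id = 0  # 0 marks "no timer pending"

    def destroy(self):
        # Only remove a timer that is still recorded as pending.
        if self.timer_id:
            self.registry.remove(self.timer_id)
            self.timer_id = 0


registry = SourceRegistry()

stale = Action(registry)
stale.on_timer_fired(clear_id=False)  # ID kept after the timer is gone
stale.destroy()                       # triggers the warning

clean = Action(registry)
clean.on_timer_fired(clear_id=True)   # ID reset when the timer fires
clean.destroy()                       # nothing left to remove, no warning

print(registry.warnings)
```

In this model, resetting the stored ID to 0 when the timer fires is what keeps the later cleanup from touching a stale ID.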
Recent clusters are showing no core dumps in /var/lib/pacemaker/cores and no errors from stonith-ng.

[root@host-008 ~]# ls /var/lib/pacemaker/cores/
[root@host-008 ~]# rpm -q pacemaker
pacemaker-1.1.12-10.el7.x86_64
[root@host-008 ~]# grep stonith-ng /var/log/messages | grep error
[root@host-008 ~]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0440.html