Description of problem: After I created 300 resources in a cluster and then deleted them via cibadmin --replace quote from manpage for <resources/> the cluster started to be really busy. As it's on relatively weak virts, it was hogging up CPU quite heavily and that lead to some timeouts. In the end the cluster recovered well, but the interim messages filled up the syslog (and caused journald to sweat heavily too). While systemd-journald filtered these messages out (like "suppressed 50000 messages"), the file /var/log/pacemaker.log did not have such luck and filled up the drive completely with messages like: Jan 14 20:27:27 [21120] virt-084.cluster-qe.lab.eng.brq.redhat.com lrmd: error: gio_poll_dispatch_update: Adaptor for descriptor 8 is not in-use [root@virt-084 log]# grep "Adaptor for descriptor 8 is not in-use" pacemaker.log | wc -l 33327376 These log messages should be limited (or log size in general, or something that will prevent the disk filling up). Version-Release number of selected component (if applicable): pacemaker-1.1.12-15.el7.x86_64 How reproducible: did it once Steps to Reproduce: 1. for i in `seq -w 1 300`; do echo "Creating $i"; pcs resource create dummy$i ocf:heartbeat:Dummy; done 2. cibadmin --replace --scope resources --xml-text '<resources/>' 3. Actual results: logs get spammed as described above. The weaker the HW running it on, the more it will probably be Expected results: Log messages limited if they are directed to other targets than syslog (where it does the suppression automatically) Additional info:
David: I've never seen this in any of my CTS runs... has something changed in libqb recently? Sounds like you're asking for log compression too, which would be best implemented as a feature for libqb.
(In reply to Andrew Beekhof from comment #2) > David: I've never seen this in any of my CTS runs... has something changed > in libqb recently? I have not seen this. It must correlate to load. > Sounds like you're asking for log compression too, which would be best > implemented as a feature for libqb.
(In reply to David Vossel from comment #3) > (In reply to Andrew Beekhof from comment #2) > > David: I've never seen this in any of my CTS runs... has something changed > > in libqb recently? > > I have not seen this. It must correlate to load. > > > Sounds like you're asking for log compression too, which would be best > > implemented as a feature for libqb. I actually know what causes this now. Under load, it's possible that libqb will have to start queuing events to be sent out later because internal send buffers are full. When this occurs, libqb uses the poll modify callback to start watching for the POLLOUT events in order to know when the fd is available for writes again. Now, here's the problem. Pacemaker's usage of glib's mainloop has a race condition in the poll's modify function that causes the fd to be marked as "unused" even though it is actually still in use. This patch to pacemaker should resolve this issue. https://github.com/davidvossel/pacemaker/commit/0178380549fbff35b3de919db898d4b2d27e23d2 -- David
Retested with pacemaker-1.1.13-6.el7.x86_64, the messages are gone.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-2383.html