Bug 1182614

Summary: Logs full of: error: gio_poll_dispatch_update: Adaptor for descriptor 8 is not in-use
Product: Red Hat Enterprise Linux 7 Reporter: Jaroslav Kortus <jkortus>
Component: pacemakerAssignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.2CC: cluster-maint, fdinitto, mnovacek
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pacemaker-1.1.13-3.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-19 12:12:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jaroslav Kortus 2015-01-15 14:54:57 UTC
Description of problem:
After I created 300 resources in a cluster and then deleted them via cibadmin --replace quote from manpage for <resources/> the cluster started to be really busy. As it's on relatively weak virts, it was hogging up CPU quite heavily and that lead to some timeouts. In the end the cluster recovered well, but the interim messages filled up the syslog (and caused journald to sweat heavily too).

While systemd-journald filtered these messages out (like "suppressed 50000 messages"), the file /var/log/pacemaker.log did not have such luck and filled up the drive completely with messages like:

Jan 14 20:27:27 [21120] virt-084.cluster-qe.lab.eng.brq.redhat.com       lrmd:    error: gio_poll_dispatch_update:      Adaptor for descriptor 8 is not in-use

[root@virt-084 log]# grep "Adaptor for descriptor 8 is not in-use" pacemaker.log | wc -l
33327376

These log messages should be limited (or log size in general, or something that will prevent the disk filling up).

Version-Release number of selected component (if applicable):
pacemaker-1.1.12-15.el7.x86_64

How reproducible:
did it once

Steps to Reproduce:
1. for i in `seq -w 1 300`; do echo "Creating $i"; pcs resource create dummy$i ocf:heartbeat:Dummy; done
2. cibadmin --replace --scope resources --xml-text '<resources/>'
3.

Actual results:
logs get spammed as described above. The weaker the HW running it on, the more it will probably be

Expected results:
Log messages limited if they are directed to other targets than syslog (where it does the suppression automatically)

Additional info:

Comment 2 Andrew Beekhof 2015-01-16 03:09:55 UTC
David: I've never seen this in any of my CTS runs... has something changed in libqb recently?

Sounds like you're asking for log compression too, which would be best implemented as a  feature for libqb.

Comment 3 David Vossel 2015-01-16 17:13:33 UTC
(In reply to Andrew Beekhof from comment #2)
> David: I've never seen this in any of my CTS runs... has something changed
> in libqb recently?

I have not seen this. It must correlate to load.

> Sounds like you're asking for log compression too, which would be best
> implemented as a  feature for libqb.

Comment 4 David Vossel 2015-04-30 17:15:49 UTC
(In reply to David Vossel from comment #3)
> (In reply to Andrew Beekhof from comment #2)
> > David: I've never seen this in any of my CTS runs... has something changed
> > in libqb recently?
> 
> I have not seen this. It must correlate to load.
> 
> > Sounds like you're asking for log compression too, which would be best
> > implemented as a  feature for libqb.

I actually know what causes this now. Under load, it's possible that libqb will have to start queuing events to be sent out later because internal send buffers are full. When this occurs, libqb uses the poll modify callback to start watching for the POLLOUT events in order to know when the fd is available for writes again.

Now, here's the problem. Pacemaker's usage of glib's mainloop has a race condition in the poll's modify function that causes the fd to be marked as "unused" even though it is actually still in use.

This patch to pacemaker should resolve this issue.
https://github.com/davidvossel/pacemaker/commit/0178380549fbff35b3de919db898d4b2d27e23d2

-- David

Comment 6 Jaroslav Kortus 2015-09-14 19:32:35 UTC
Retested with pacemaker-1.1.13-6.el7.x86_64, the messages are gone.

Comment 7 errata-xmlrpc 2015-11-19 12:12:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2383.html