Bug 1785281 - Machine config pool reports Degraded when audit is enabled
Summary: Machine config pool reports Degraded when audit is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.4.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1785279
TreeView+ depends on / blocked
 
Reported: 2019-12-19 15:16 UTC by Borja Aranda
Modified: 2023-09-07 21:18 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1785219
Environment:
Last Closed: 2020-05-04 11:20:43 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1351 0 None closed Bug 1785281: Discard audit messages from journald 2020-05-09 01:33:06 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:21:18 UTC

Description Borja Aranda 2019-12-19 15:16:20 UTC
+++ This bug was initially created as a clone of Bug #1785219 +++

Description of problem:
MCO reports worker pools as Degraded if the cluster contains machines with:
- audit enabled
- logger binary doesn't support --journald flag

This is because if the journald flag is not available in logger, we default to a simple grep to get the pending state:
~~~
cmdLiteral := "journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
~~~

When enabled, audit logs the following in journald:
~~~
node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
~~~

This is processed by dn.processJournalOutput(journalOutput) and an error is returned:
~~~
worker: 'pool is degraded because nodes fail with "X nodes are reporting degraded
        status on sync": "Node XXX is reporting:
        \"getting pending state from journal: invalid character ''o'' in literal null
~~~

Workaround: Disable audit.

Version-Release number of selected component (if applicable):
All 4.X versions are affected

How reproducible:
Always

Steps to Reproduce:
1. Install 4.X with RHEL nodes with audit enabled and a logger binary which doesn't support --journald flag

--- Additional comment from Borja on 2019-12-19 12:31:01 UTC ---

Proposed patch: https://github.com/openshift/machine-config-operator/pull/1350

This is a fast hack, but considering this is a corner case and logger in RHEL 7.7 supports --journald it will be short-lived.

Comment 1 Kirsten Garrison 2019-12-19 17:56:00 UTC
@Borja

Please supply a must gather from this cluster. Thanks.

Comment 11 errata-xmlrpc 2020-05-04 11:20:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581


Note You need to log in before you can comment on or make changes to this bug.