
Bug 1785219

Summary: Machine config pool reports Degraded when audit is enabled
Product: OpenShift Container Platform
Reporter: Borja Aranda <farandac>
Component: Machine Config Operator
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Docs Contact:
Priority: high
Version: 4.2.z
CC: eparis, kgarriso, ltitov
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1785279 1785281
Environment:
Last Closed: 2020-05-13 11:07:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1785279
Bug Blocks:

Description Borja Aranda 2019-12-19 12:19:06 UTC
Description of problem:
MCO reports worker pools as Degraded if the cluster contains machines with:
- audit enabled
- a logger binary that doesn't support the --journald flag

This is because, if the --journald flag is not available in logger, we fall back to a simple grep over the journal to get the pending state:
~~~
cmdLiteral := "journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
~~~

When enabled, audit logs the following in journald:
~~~
node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
~~~

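The grep matches this audit record because it contains the marker string. The record is then processed by dn.processJournalOutput(journalOutput), which fails to parse it and returns an error: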
~~~
worker: 'pool is degraded because nodes fail with "X nodes are reporting degraded
        status on sync": "Node XXX is reporting:
        \"getting pending state from journal: invalid character ''o'' in literal null
~~~

Workaround: Disable audit.

Version-Release number of selected component (if applicable):
All 4.X versions are affected

How reproducible:
Always

Steps to Reproduce:
1. Install 4.X on RHEL nodes that have audit enabled and a logger binary that doesn't support the --journald flag

Comment 1 Borja Aranda 2019-12-19 12:31:01 UTC
Proposed patch: https://github.com/openshift/machine-config-operator/pull/1350

This is a quick hack, but since this is a corner case and logger in RHEL 7.7 supports --journald, it should be short-lived.
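
For illustration only, here is one way the grep output could be filtered so that audit records are ignored. This is a sketch under assumptions, not necessarily what the linked PR does: lastJSONEntry is a hypothetical helper, the JSON entry in main is made-up sample data, and the approach assumes legitimate pending-state entries are valid JSON while audit records are not.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// marker is the string the fallback grep searches for (from the report).
const marker = "OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"

// lastJSONEntry is a hypothetical helper. It returns the last line of the
// journal output that contains the marker and is syntactically valid JSON,
// skipping lines such as audit EXECVE records.
func lastJSONEntry(journalOutput string) (string, bool) {
	lines := strings.Split(strings.TrimSpace(journalOutput), "\n")
	for i := len(lines) - 1; i >= 0; i-- {
		line := strings.TrimSpace(lines[i])
		if strings.Contains(line, marker) && json.Valid([]byte(line)) {
			return line, true
		}
	}
	return "", false
}

func main() {
	// Made-up journal output: one JSON entry followed by the audit record
	// of the grep itself. The JSON field names are illustrative only.
	out := `{"MESSAGE":"pending","OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK":"1"}
node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"`

	line, ok := lastJSONEntry(out)
	fmt.Println(ok, line)
}
```

Scanning from the end keeps the "latest entry wins" behaviour while tolerating any number of trailing audit records.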

Comment 6 Antonio Murdaca 2020-01-30 11:32:35 UTC
This bug should stay in NEW until the 4.3 BZ is fully verified (and the 4.3 one, in turn, until the master bug is verified). It's the bot's duty to change things on BZs, and that's done automatically, so there's no need to set this to NEW and link GitHub.

Comment 7 Antonio Murdaca 2020-01-30 11:34:02 UTC
It will all start cascading when the master BZ is verified, but it's still on QA: https://bugzilla.redhat.com/show_bug.cgi?id=1785281

Comment 11 Michael Nguyen 2020-05-09 03:09:26 UTC
Verified on 4.2.0-0.nightly-2020-05-07-194422

Comment 13 errata-xmlrpc 2020-05-13 11:07:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2023