Description of problem:
MCO reports worker pools as Degraded if the cluster contains machines with:
- audit enabled
- a logger binary that doesn't support the --journald flag
This happens because, if the --journald flag is not available in logger, we fall back to a simple grep over the journal to get the pending state:
cmdLiteral := "journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
When enabled, audit records the execution of that grep command itself in journald, and the record contains the search string:
node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
Because this audit record matches the grep, it is passed to dn.processJournalOutput(journalOutput), which fails to parse it as JSON and returns an error:
worker: 'pool is degraded because nodes fail with "X nodes are reporting degraded
status on sync": "Node XXX is reporting:
\"getting pending state from journal: invalid character ''o'' in literal null
Workaround: Disable audit.
Version-Release number of selected component (if applicable):
All 4.X versions are affected
Steps to Reproduce:
1. Install 4.X with RHEL nodes that have audit enabled and a logger binary which doesn't support the --journald flag
Proposed patch: https://github.com/openshift/machine-config-operator/pull/1350
This is a fast hack, but since this is a corner case and logger in RHEL 7.7 supports --journald, it will be short-lived.
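For illustration only, one way to keep audit records out of the parser is to skip every line that is not a JSON object. This is a sketch under assumptions and not necessarily what the linked PR does: lastJSONLine is a hypothetical helper, and the sample JSON payload in main is invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lastJSONLine returns the last line of journalOutput that is a valid
// JSON object, ignoring anything else (for example audit EXECVE records).
// The boolean is false when no such line exists.
func lastJSONLine(journalOutput string) (string, bool) {
	last, found := "", false
	for _, raw := range strings.Split(journalOutput, "\n") {
		line := strings.TrimSpace(raw)
		if !strings.HasPrefix(line, "{") || !json.Valid([]byte(line)) {
			continue // not a JSON object, skip it
		}
		last, found = line, true
	}
	return last, found
}

func main() {
	// Hypothetical journal output: one JSON entry followed by the audit
	// record that the grep itself generated.
	out := `{"MESSAGE":"OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK","PENDING":"1"}
node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"`
	line, ok := lastJSONLine(out)
	fmt.Println(ok, line)
}
```

Filtering on the line shape avoids depending on the audit record format, at the cost of silently dropping any malformed entry.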
This bug should stay in NEW until the 4.3 BZ is fully verified (and the 4.3 BZ, in turn, waits for the master bug to be verified). It's the bot's duty to change things on BZs, and that's done automatically, so there is no need to set this to NEW and link GitHub manually.
It will all start cascading when the master BZ is verified, but it's still on QA: https://bugzilla.redhat.com/show_bug.cgi?id=1785281
Verified on 4.2.0-0.nightly-2020-05-07-194422
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.