+++ This bug was initially created as a clone of Bug #1785219 +++

Description of problem:

MCO reports worker pools as Degraded if the cluster contains machines with:
- audit enabled
- a logger binary that doesn't support the --journald flag

This happens because, when the --journald flag is not available in logger, we fall back to a simple grep to get the pending state:

~~~
cmdLiteral := "journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
~~~

When audit is enabled, it logs the grep invocation itself in journald:

~~~
node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
~~~

That line is picked up by the grep, processed by dn.processJournalOutput(journalOutput), and an error is returned:

~~~
worker: 'pool is degraded because nodes fail with "X nodes are reporting degraded status on sync": "Node XXX is reporting: \"getting pending state from journal: invalid character ''o'' in literal null
~~~

Workaround: disable audit.

Version-Release number of selected component (if applicable):
All 4.X versions are affected.

How reproducible:
Always

Steps to Reproduce:
1. Install 4.X with RHEL nodes that have audit enabled and a logger binary that doesn't support the --journald flag.

--- Additional comment from Borja on 2019-12-19 12:31:01 UTC ---

Proposed patch: https://github.com/openshift/machine-config-operator/pull/1350

This is a quick hack, but since this is a corner case and logger in RHEL 7.7 supports --journald, it will be short-lived.
This bug should stay in NEW until the master BZ is fully verified. It's the bot's duty to change the status on BZs, and that's done automatically; there's no need to set this to NEW and link GitHub manually.
Verified on 4.3.0-0.nightly-2020-03-09-150655

~~~
[root@helper openshift]# cat <<EOF>file.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: test-file
> spec:
>   config:
>     ignition:
>       version: 2.2.0
>     storage:
>       files:
>       - contents:
>           source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
>         filesystem: root
>         mode: 0644
>         path: /etc/test
> EOF

[root@helper openshift]# oc create -f file.yaml
machineconfig.machineconfiguration.openshift.io/test-file created

[root@helper openshift]# watch oc get node

[root@helper openshift]# oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
worker   rendered-worker-18ba77673b208e078379b034184866c5   True      False      False      3              3                   3                     0

[root@helper openshift]# oc get node
NAME                       STATUS   ROLES    AGE    VERSION
master0.ocp4.example.com   Ready    master   112m   v1.16.2
master1.ocp4.example.com   Ready    master   112m   v1.16.2
master2.ocp4.example.com   Ready    master   112m   v1.16.2
worker0.ocp4.example.com   Ready    worker   97m    v1.16.2
worker1.ocp4.example.com   Ready    worker   97m    v1.16.2
worker2.ocp4.example.com   Ready    worker   34m    v1.16.2

[root@helper openshift]# oc debug node/worker2.ocp4.example.com
Starting pod/worker2ocp4examplecom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.7.13
If you don't see a command prompt, try pressing enter.
~~~
~~~
chroot /host
sh-4.2# journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG
{"MESSAGE": "rendered-worker-18ba77673b208e078379b034184866c5", "BOOT_ID": "c16aaae2-ed7a-49f0-b4ba-c0630d51b5db", "PENDING": "0", "OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK": "1"}
sh-4.2# journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG
{"MESSAGE": "rendered-worker-18ba77673b208e078379b034184866c5", "BOOT_ID": "c16aaae2-ed7a-49f0-b4ba-c0630d51b5db", "PENDING": "0", "OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK": "1"}
node=localhost.localdomain type=EXECVE msg=audit(1583793259.482:7349): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG"
node=localhost.localdomain type=EXECVE msg=audit(1583793300.243:8024): argc=3 a0="grep" a1="--color=auto" a2="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
node=localhost.localdomain type=EXECVE msg=audit(1583793303.656:8054): argc=5 a0="grep" a1="--color=auto" a2="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK" a3="anaconda-ks.cfg" a4="original-ks.cfg"
node=localhost.localdomain type=EXECVE msg=audit(1583793305.909:8063): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG"
sh-4.2#
Removing debug pod ...

[root@helper openshift]# oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
worker   rendered-worker-18ba77673b208e078379b034184866c5   True      False      False      3              3                   3                     0
~~~

The audit EXECVE entries appear in the journal output as before, but the worker pool stays non-Degraded.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0858