Bug 1785279 - Machine config pool reports Degraded when audit is enabled
Summary: Machine config pool reports Degraded when audit is enabled
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.3.z
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
Depends On: 1785281
Blocks: 1785219
Reported: 2019-12-19 15:14 UTC by Borja Aranda
Modified: 2020-03-24 14:32 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1785219
Last Closed: 2020-03-24 14:32:28 UTC
Target Upstream Version:


External links:
- GitHub openshift/machine-config-operator pull 1352 (closed): Bug 1785279: [release-4.3] Discard audit messages from journald (last updated 2020-04-30 17:38:50 UTC)
- Red Hat Product Errata RHBA-2020:0858 (last updated 2020-03-24 14:32:48 UTC)

Description Borja Aranda 2019-12-19 15:14:35 UTC
+++ This bug was initially created as a clone of Bug #1785219 +++

Description of problem:
MCO reports worker pools as Degraded if the cluster contains machines with:
- audit enabled
- a logger binary that does not support the --journald flag

This happens because, when the --journald flag is not available in logger, the daemon falls back to a simple grep to read the pending state:
cmdLiteral := "journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"

With audit enabled, the grep invocation itself is recorded in journald as an EXECVE event:
node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"

Because this audit record also contains the grep keyword, it is passed to dn.processJournalOutput(journalOutput), which fails to parse it:
worker: 'pool is degraded because nodes fail with "X nodes are reporting degraded
        status on sync": "Node XXX is reporting:
        \"getting pending state from journal: invalid character ''o'' in literal null

Workaround: Disable audit.
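The parse failure is easy to reproduce outside the cluster. Below is a minimal Go sketch, assuming a simplified journalMsg struct and decodeLine helper (the real decoding lives inside the MCO daemon's processJournalOutput), that decodes each grep-matched line as JSON:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// journalMsg is an assumed, simplified version of the structure the MCO
// daemon decodes from its legacy journald log entries.
type journalMsg struct {
	Message string `json:"MESSAGE"`
	Pending string `json:"PENDING"`
}

// decodeLine parses one grep-matched journal line as JSON, roughly the way
// the daemon processes each entry.
func decodeLine(line string) (journalMsg, error) {
	var m journalMsg
	err := json.Unmarshal([]byte(line), &m)
	return m, err
}

func main() {
	// Both lines contain the LOG_HACK keyword, so both survive the grep:
	// the JSON entry written by the daemon, and an audit EXECVE record
	// that merely echoes grep's own argument vector.
	lines := []string{
		`{"MESSAGE": "rendered-worker-abc", "PENDING": "0", "OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK": "1"}`,
		`node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"`,
	}
	for _, line := range lines {
		m, err := decodeLine(line)
		if err != nil {
			// The audit record starts with "node=": the decoder reads the
			// 'n', expects the literal null, and chokes on the 'o'.
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("config=%s pending=%s\n", m.Message, m.Pending)
	}
}
```

The second line produces exactly the error quoted above: invalid character 'o' in literal null.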

Version-Release number of selected component (if applicable):
All 4.x versions are affected.

How reproducible:

Steps to Reproduce:
1. Install 4.x with RHEL nodes that have audit enabled and a logger binary that does not support the --journald flag

--- Additional comment from Borja on 2019-12-19 12:31:01 UTC ---

Proposed patch: https://github.com/openshift/machine-config-operator/pull/1350

This is a quick hack, but since this is a corner case and logger in RHEL 7.7 supports --journald, it will be short-lived.
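The merged change is tracked in PR 1352 above ("Discard audit messages from journald"). As a sketch of that idea only, not the actual merged code, one can drop every grep-matched line that cannot be a JSON journal entry before parsing; filterJournal below is a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// filterJournal drops any grep-matched line that cannot be a JSON journal
// entry: audit EXECVE records start with "node=" or "type=", never "{".
// This is a sketch of the "discard audit messages" idea, not the MCO code.
func filterJournal(output string) []string {
	var kept []string
	for _, line := range strings.Split(output, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "{") {
			kept = append(kept, line)
		}
	}
	return kept
}

func main() {
	// Mixed grep output: one daemon entry, one audit EXECVE record.
	output := `{"MESSAGE": "rendered-worker-abc", "PENDING": "0"}
node=XXX type=EXECVE msg=audit(1576504519.252:5356): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"`
	for _, line := range filterJournal(output) {
		fmt.Println(line) // only the JSON entry survives
	}
}
```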

Comment 1 Antonio Murdaca 2020-01-30 11:33:09 UTC
This bug should stay in NEW until the master BZ is fully verified. It is the bot's duty to update BZs, and that happens automatically; there is no need to set this to NEW and link GitHub.

Comment 4 Michael Nguyen 2020-03-10 01:34:15 UTC
Verified on 4.3.0-0.nightly-2020-03-09-150655

[root@helper openshift]# cat <<EOF>file.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: test-file
> spec:
>   config:
>     ignition:
>       version: 2.2.0
>     storage:
>       files:
>       - contents:
>           source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
>         filesystem: root
>         mode: 0644
>         path: /etc/test
> EOF
[root@helper openshift]# oc create -f file.yaml 
machineconfig.machineconfiguration.openshift.io/test-file created
[root@helper openshift]# watch oc get node
[root@helper openshift]# oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
worker   rendered-worker-18ba77673b208e078379b034184866c5   True      False      False      3              3                   3                     0
[root@helper openshift]# oc get node
NAME                       STATUS   ROLES    AGE    VERSION
master0.ocp4.example.com   Ready    master   112m   v1.16.2
master1.ocp4.example.com   Ready    master   112m   v1.16.2
master2.ocp4.example.com   Ready    master   112m   v1.16.2
worker0.ocp4.example.com   Ready    worker   97m    v1.16.2
worker1.ocp4.example.com   Ready    worker   97m    v1.16.2
worker2.ocp4.example.com   Ready    worker   34m    v1.16.2
[root@helper openshift]# oc debug node/worker2.ocp4.example.com
Starting pod/worker2ocp4examplecom-debug ...
To use host binaries, run `chroot /host`
chroot /host
Pod IP:
If you don't see a command prompt, try pressing enter.
chroot /host
sh-4.2# journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG       
{"MESSAGE": "rendered-worker-18ba77673b208e078379b034184866c5", "BOOT_ID": "c16aaae2-ed7a-49f0-b4ba-c0630d51b5db", "PENDING": "0", "OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK": "1"}
sh-4.2# journalctl -o cat _UID=0 | grep OPENSHIFT_MACHINE_CONFIG
{"MESSAGE": "rendered-worker-18ba77673b208e078379b034184866c5", "BOOT_ID": "c16aaae2-ed7a-49f0-b4ba-c0630d51b5db", "PENDING": "0", "OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK": "1"}
node=localhost.localdomain type=EXECVE msg=audit(1583793259.482:7349): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG"
node=localhost.localdomain type=EXECVE msg=audit(1583793300.243:8024): argc=3 a0="grep" a1="--color=auto" a2="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK"
node=localhost.localdomain type=EXECVE msg=audit(1583793303.656:8054): argc=5 a0="grep" a1="--color=auto" a2="OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK" a3="anaconda-ks.cfg" a4="original-ks.cfg"
node=localhost.localdomain type=EXECVE msg=audit(1583793305.909:8063): argc=2 a0="grep" a1="OPENSHIFT_MACHINE_CONFIG"
Removing debug pod ...

[root@helper openshift]# oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
worker   rendered-worker-18ba77673b208e078379b034184866c5   True      False      False      3              3                   3                     0

Comment 6 errata-xmlrpc 2020-03-24 14:32:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

