Bug 2011083

Summary: Backport audit log silence change
Product: OpenShift Container Platform
Component: Machine Config Operator
Sub component: Machine Config Operator
Reporter: Matthew Robson <mrobson>
Assignee: Kirsten Garrison <kgarriso>
QA Contact: Manoj Hans <mhans>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
CC: aos-bugs, kgarriso, mhans, mkrejci, rioliu, skrenger, steven.barre, suchaudh, wking
Version: 4.7
Target Release: 4.8.z
Keywords: FastFix
Hardware: All
OS: All
Clones: 2011087, 2011375 (view as bug list)
Last Closed: 2021-10-19 20:35:31 UTC
Type: Bug
Bug Depends On: 2011087
Bug Blocks: 2011375

Description Matthew Robson 2021-10-05 22:27:02 UTC
Description of problem:

Backport https://github.com/openshift/machine-config-operator/pull/2633 for 4.8 and 4.7

Version-Release number of MCO (Machine Config Operator) (if applicable):
4.7/4.8

Platform (AWS, VSphere, Metal, etc.):
All

Actual results:

A large OCP 4.7.32 cluster is seeing roughly 10 million audit log entries per hour during its upgrade from 4.6.25, overwhelming cluster logging and the local SSDs.
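For context, audit messages of a given type are typically silenced with an auditctl exclude rule. A sketch of such a rule file follows; the file path and comment are illustrative, and the exact rule shipped by the backport is in the linked PR:

```
# /etc/audit/rules.d/ignore-netfilter.rules  (illustrative path)
# Drop NETFILTER_CFG records, which iptables-heavy components emit in bulk.
-a exclude,always -F msgtype=NETFILTER_CFG
```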

Comment 5 Scott Dodson 2021-10-07 20:37:56 UTC
@Manoj,

That's very likely because 4.7 has the same problem. While we've backported the fix there too, we'll want to make sure that the version used during that upgrade includes the fix. I suspect it does not, since that's a stable-4.7 to 4.8 upgrade and these fixes haven't made it to the stable stream yet.

Comment 6 Manoj Hans 2021-10-08 09:49:53 UTC
Yes @Scott, that was the reason for the failure. I have validated it manually using the latest build and it is working fine: the audit logs no longer contain NETFILTER_CFG msg=audit records. Below are the steps:

Upgrade from 4.7.0-0.nightly-2021-10-07-235007 to 4.8.0-0.ci-2021-10-08-041634:
 
oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-10-08-041634 --force --allow-explicit-upgrade

warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.  You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-10-08-041634

Execute below command to see the progress:
oc get clusterversion -w
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-10-07-235007   True        True          7s      Working towards registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-10-08-041634: downloading update
version   4.7.0-0.nightly-2021-10-07-235007   True        True          15s     Working towards 4.8.0-0.ci-2021-10-08-041634: 11 of 699 done (1% complete)

oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.ci-2021-10-08-041634   True        False         8m31s   Cluster version is 4.8.0-0.ci-2021-10-08-041634

oc get node
NAME                                        STATUS   ROLES    AGE    VERSION
ip-10-0-54-25.us-east-2.compute.internal    Ready    worker   121m   v1.21.1+a620f50
ip-10-0-56-14.us-east-2.compute.internal    Ready    master   127m   v1.21.1+a620f50
ip-10-0-58-77.us-east-2.compute.internal    Ready    worker   119m   v1.21.1+a620f50
ip-10-0-61-13.us-east-2.compute.internal    Ready    master   125m   v1.21.1+a620f50
ip-10-0-72-79.us-east-2.compute.internal    Ready    master   127m   v1.21.1+a620f50
ip-10-0-77-154.us-east-2.compute.internal   Ready    worker   122m   v1.21.1+a620f50

oc debug node/ip-10-0-54-25.us-east-2.compute.internal
Starting pod/ip-10-0-54-25us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.54.25
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# vi var/log/audit/audit.log

type=DAEMON_START msg=audit(1633678490.184:75): op=start ver=3.0 format=enriched kernel=4.18.0-305.19.1.el8_4.x86_64 auid=4294967295 pid=1209 uid=0 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=success^]AUID="unset" UID="root"
type=SERVICE_START msg=audit(1633678490.375:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=coreos-update-ca-trust comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^]UID="root" AUID="unset"
type=CONFIG_CHANGE msg=audit(1633678490.506:6): op=set audit_backlog_limit=8192 old=64 auid=4294967295 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 res=1^]AUID="unset"
type=SYSCALL msg=audit(1633678490.506:6): arch=c000003e syscall=44 success=yes exit=56 a0=3 a1=7ffe3699ed60 a2=38 a3=0 items=0 ppid=1213 pid=1232 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditctl" exe="/usr/sbin/auditctl" subj=system_u:system_r:unconfined_service_t:s0 key=(null)^]ARCH=x86_64 SYSCALL=sendto AUID="unset" UID="root" GID="root" EUID="root" SUID="root" FSUID="root" EGID="root" SGID="root" FSGID="root"
.................
type=PROCTITLE msg=audit(1633684643.804:55): proctitle=2F7573722F6C6962657865632F706C6174666F726D2D707974686F6E002D4573002F7573722F7362696E2F74756E6564002D2D6E6F2D64627573
type=SERVICE_STOP msg=audit(1633684699.155:56): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpm-ostreed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^]UID="root" AUID="unset"
type=SERVICE_START msg=audit(1633685477.966:57): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^]UID="root" AUID="unset"
type=SERVICE_STOP msg=audit(1633685477.966:58): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^]UID="root" AUID="unset"
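The manual check above (scanning audit.log in an editor) can also be scripted. A minimal sketch, assuming a POSIX shell; the helper name is hypothetical, not part of the bug report:

```shell
#!/bin/sh
# Hypothetical helper: count NETFILTER_CFG records in a given audit log,
# so the silence fix can be verified non-interactively.
count_netfilter_cfg() {
    # grep -c prints the match count; it exits 1 when there are no
    # matches, so swallow that exit status and keep the printed "0".
    grep -c 'type=NETFILTER_CFG' "$1" || true
}
```

On a real node, run it against /var/log/audit/audit.log inside `oc debug node/<node>` after `chroot /host`; after the fix the count should be 0.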

Comment 13 errata-xmlrpc 2021-10-19 20:35:31 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.15 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3821

Comment 14 Sudarshan Chaudhari 2021-10-21 21:03:31 UTC
Hello, 

I see the issue is fixed in 4.8.15, but this Bugzilla is pointing to OCP 4.7.
I would like to know whether the fix is also available in the latest OCP 4.7 errata. If it is available, can we have the errata link?

Thanks.

Comment 15 W. Trevor King 2021-10-21 22:01:29 UTC
(In reply to Sudarshan Chaudhari from comment #14)
> I am looking to know if the fix is also available in OCP 4.7 latest errata.

Up in this bug's metadata^^, you can see:

  Blocks: 2011375

Clicking through to the 4.7.z bug 2011375, you can see that the fix shipped in 4.7.34 [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2011375#c10