Description of problem:
Backport https://github.com/openshift/machine-config-operator/pull/2633 for 4.8 and 4.7

Version-Release number of MCO (Machine Config Operator) (if applicable):
4.7/4.8

Platform (AWS, VSphere, Metal, etc.):
All

Actual results:
A large OCP 4.7.32 cluster is seeing 10 million logs per hour during its upgrade from 4.6.25, overwhelming cluster logging and local SSDs.
Case 1: 4.8-upgrade-from-stable-4.7 (Fail)
Results: Audit logs contain NETFILTER_CFG msg=audit
Audit file: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1445884984387702784/artifacts/

Case 2: 4.8 (Pass)
Results: Audit logs do not contain NETFILTER_CFG msg=audit
Audit file: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws/1445884975978123264/artifacts/e2e-aws/gather-extra/artifacts/nodes/ip-10-0-150-158.ec2.internal/
@Manoj, that's very likely because 4.7 has the same problem. We've backported the fix there too, but we need to make sure the 4.7 version used in that upgrade includes it. I suspect it does not: that job is a stable-4.7 to 4.8 upgrade, and these fixes haven't made it to the stable stream yet.
Yes @Scott, that was the reason for the failure. I have validated manually using the latest build and it is working fine: audit logs do not contain NETFILTER_CFG msg=audit. Below are the steps.

Upgrade from 4.7.0-0.nightly-2021-10-07-235007 to 4.8.0-0.ci-2021-10-08-041634:

$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-10-08-041634 --force --allow-explicit-upgrade
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-10-08-041634

Execute the command below to see the progress:

$ oc get clusterversion -w
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-10-07-235007   True        True          7s      Working towards registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-10-08-041634: downloading update
version   4.7.0-0.nightly-2021-10-07-235007   True        True          15s     Working towards 4.8.0-0.ci-2021-10-08-041634: 11 of 699 done (1% complete)

$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.ci-2021-10-08-041634   True        False         8m31s   Cluster version is 4.8.0-0.ci-2021-10-08-041634

$ oc get node
NAME                                        STATUS   ROLES    AGE    VERSION
ip-10-0-54-25.us-east-2.compute.internal    Ready    worker   121m   v1.21.1+a620f50
ip-10-0-56-14.us-east-2.compute.internal    Ready    master   127m   v1.21.1+a620f50
ip-10-0-58-77.us-east-2.compute.internal    Ready    worker   119m   v1.21.1+a620f50
ip-10-0-61-13.us-east-2.compute.internal    Ready    master   125m   v1.21.1+a620f50
ip-10-0-72-79.us-east-2.compute.internal    Ready    master   127m   v1.21.1+a620f50
ip-10-0-77-154.us-east-2.compute.internal   Ready    worker   122m   v1.21.1+a620f50

$ oc debug node/ip-10-0-54-25.us-east-2.compute.internal
Starting pod/ip-10-0-54-25us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.54.25
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# vi var/log/audit/audit.log
type=DAEMON_START msg=audit(1633678490.184:75): op=start ver=3.0 format=enriched kernel=4.18.0-305.19.1.el8_4.x86_64 auid=4294967295 pid=1209 uid=0 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=success^]AUID="unset" UID="root"
type=SERVICE_START msg=audit(1633678490.375:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=coreos-update-ca-trust comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^]UID="root" AUID="unset"
type=CONFIG_CHANGE msg=audit(1633678490.506:6): op=set audit_backlog_limit=8192 old=64 auid=4294967295 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 res=1^]AUID="unset"
type=SYSCALL msg=audit(1633678490.506:6): arch=c000003e syscall=44 success=yes exit=56 a0=3 a1=7ffe3699ed60 a2=38 a3=0 items=0 ppid=1213 pid=1232 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditctl" exe="/usr/sbin/auditctl" subj=system_u:system_r:unconfined_service_t:s0 key=(null)^]ARCH=x86_64 SYSCALL=sendto AUID="unset" UID="root" GID="root" EUID="root" SUID="root" FSUID="root" EGID="root" SGID="root" FSGID="root"
.................
type=PROCTITLE msg=audit(1633684643.804:55): proctitle=2F7573722F6C6962657865632F706C6174666F726D2D707974686F6E002D4573002F7573722F7362696E2F74756E6564002D2D6E6F2D64627573
type=SERVICE_STOP msg=audit(1633684699.155:56): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpm-ostreed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^]UID="root" AUID="unset"
type=SERVICE_START msg=audit(1633685477.966:57): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^]UID="root" AUID="unset"
type=SERVICE_STOP msg=audit(1633685477.966:58): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^]UID="root" AUID="unset"
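Reading the audit log by eye in vi works for a spot check, but counting records is less error-prone on a busy node. Below is a minimal sketch, assuming each audit record starts its line with "type=<RECORD_TYPE> " as in the excerpt above. The helper name count_audit_type is hypothetical and not part of any OpenShift tooling.

```shell
# count_audit_type FILE TYPE
# Print how many audit records of the given TYPE appear in FILE.
# Assumption: every record begins its line with "type=<TYPE> ".
count_audit_type() {
  # grep -c prints 0 but exits non-zero when nothing matches,
  # so mask the exit status to keep callers using "set -e" alive.
  grep -c -- "^type=$2 " "$1" || true
}

# Example, run on the node after "chroot /host":
#   count_audit_type /var/log/audit/audit.log NETFILTER_CFG
```

On a node that carries the fix, the count for NETFILTER_CFG would be expected to stay at 0, matching the manual check above.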
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.15 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3821
Hello, I see the issue is fixed in 4.8.15, but this Bugzilla is pointing to OCP 4.7. I am looking to know if the fix is also available in OCP 4.7 latest errata. If it's available, can we have the errata link? Thanks.
(In reply to Sudarshan Chaudhari from comment #14)
> I am looking to know if the fix is also available in OCP 4.7 latest errata.

Up in this bug's metadata^^, you can see:

  Blocks: 2011375

Clicking through to the 4.7.z bug 2011375, you can see that the fix shipped in 4.7.34 [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2011375#c10
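For anyone checking whether a given cluster release already carries the fix, here is a rough sketch built only on the two versions named in this thread (4.7.34 and 4.8.15). The function names are hypothetical, the comparison relies on GNU sort -V, and nightly or CI build strings are deliberately reported as unknown rather than guessed at.

```shell
# version_ge A B: succeed when version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# has_netfilter_fix VERSION
# Exit 0: release includes the fix; 1: it does not; 2: unknown.
# Assumption: thresholds are the releases cited in this bug and its 4.7.z clone.
has_netfilter_fix() {
  case "$1" in
    *[!0-9.]*) return 2 ;;                  # nightly/CI tags: cannot tell
    4.7.?*)    version_ge "$1" 4.7.34 ;;
    4.8.?*)    version_ge "$1" 4.8.15 ;;
    *)         return 2 ;;                  # other minors are out of scope here
  esac
}

# Example:
#   has_netfilter_fix 4.7.32 || echo "fix not confirmed for this release"
```

The exit status 2 keeps "unknown" separate from "known to lack the fix", which matters for builds such as the nightlies used in the validation above.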