openshift-apiserver not correctly flushing its audit log to disk or must-gather not correctly retrieving content
See this PR for a description of the malformed logs found during failed CI runs: https://github.com/openshift/origin/pull/24641
Failed runs:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.4/1953
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.4/1948
See here for more: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-blocking#release-openshift-origin-installer-e2e-gcp-4.4&sort-by-flakiness=
And discussion in Slack: https://coreos.slack.com/archives/CB48XQ4KZ/p1583421873221000
Is this separate from [1]? The outcome of the Slack discussion was [2] and [3], which have already merged to master and release-4.4 respectively.
1: https://bugzilla.redhat.com/show_bug.cgi?id=1808568
2: https://bugzilla.redhat.com/show_bug.cgi?id=1808568
2: https://bugzilla.redhat.com/show_bug.cgi?id=1811636
Sorry, the links should have been as follows:
1: https://bugzilla.redhat.com/show_bug.cgi?id=1808568
2: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/331
3: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/332
This should be handled already in the PRs Maru mentioned in the previous comment.
Checked the above links. The root cause was that we did not ensure only a single apiserver pod runs on each master. The fix updates the deployment YAML with a rolling deployment strategy and podAntiAffinity to ensure that.
Verified in a 4.5.0-0.nightly-2020-04-29-040854 environment by repeatedly deleting the openshift-apiserver pods to make them restart while checking:
[root@ip-10-0-135-23 /]# cd /var/log/
[root@ip-10-0-135-23 log]# grep -nrv '^{"kind":"Event"' openshift-apiserver
[root@ip-10-0-135-23 log]# grep -nr ' ' openshift-apiserver
The output is empty. No malformed logs.
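For anyone who wants to repeat the first grep check outside the node, here is a minimal Python sketch of the same idea. It is an illustration only, not part of the fix: the function name `find_malformed_lines` and the sample lines are made up, and it is slightly stricter than the grep because it also tries to parse each line as JSON.

```python
import json

# Same prefix the grep check above looks for at the start of every line.
EVENT_PREFIX = '{"kind":"Event"'


def find_malformed_lines(lines):
    """Return (line_number, text) pairs for lines that are not well-formed audit events.

    A line is reported if it does not start with EVENT_PREFIX (what
    `grep -v '^{"kind":"Event"'` would print) or if it is not valid JSON
    (an extra check that the grep does not perform).
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text.startswith(EVENT_PREFIX):
            bad.append((number, text))
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError:
            bad.append((number, text))
    return bad


if __name__ == "__main__":
    # Hypothetical sample: one good event, then one event interleaved with another.
    sample = [
        '{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata"}\n',
        '{"kind":"Event","apiVersion":"audit.k8s.io/v1","lev{"kind":"Event"}\n',
        'ersion":"audit.k8s.io/v1"}\n',
    ]
    for number, text in find_malformed_lines(sample):
        print(f"{number}: {text}")
```

To run it against a real file, pass an open file object, for example `find_malformed_lines(open(path))`, where `path` is whichever audit log file must-gather collected. An empty result corresponds to the empty grep output reported above.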
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days