Seeing this in release-openshift-origin-installer-e2e-gcp-4.5 CI tests reported by https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5&sort-by-flakiness=

Examples of failing jobs:
* Alert TargetDown - https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.5/1260
* Alert KubeletPlegDurationHigh - https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.5/1261
* Alert KubeAPIErrorBudgetBurn - https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.5/1256

Error message from one of the failing jobs:

fail [github.com/openshift/origin/test/extended/prometheus/prometheus_builds.go:167]: Expected
    <map[string]error | len:1>: {
        "count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1": {
            s: "promQL query: count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1 had reported incorrect results:\n[{\"metric\":{\"alertname\":\"KubeletPlegDurationHigh\",\"alertstate\":\"firing\",\"instance\":\"10.0.0.5:10250\",\"node\":\"ci-op-snzkl-m-0.c.openshift-gce-devel-ci.internal\",\"quantile\":\"0.99\",\"severity\":\"warning\"},\"value\":[1586250781.73,\"1\"]}]",
        },
    }
to be empty

Additional information:
This may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1812999
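Reading the failure: the test expects the PromQL query to return an empty vector, so any returned series names an alert outside the allowlist (Watchdog, AlertmanagerReceiversNotConfigured, KubeAPILatencyHigh) that was firing at least once in the 2h window. A minimal Python sketch, assuming the standard Prometheus instant-vector JSON shape seen in the error (a list of `{"metric": {...}, "value": [timestamp, "string value"]}`), that pulls the offending alerts out of such a result; the helper name `unexpected_alerts` is hypothetical and not part of the origin test code:

```python
import json

# Result vector copied from the error message above (a single series).
RESULT = json.loads(
    '[{"metric":{"alertname":"KubeletPlegDurationHigh","alertstate":"firing",'
    '"instance":"10.0.0.5:10250","node":"ci-op-snzkl-m-0.c.openshift-gce-devel-ci.internal",'
    '"quantile":"0.99","severity":"warning"},"value":[1586250781.73,"1"]}]'
)


def unexpected_alerts(result):
    """Hypothetical helper: return (alertname, severity, sample_count) per series.

    count_over_time() counts how many samples of each ALERTS series exist in
    the 2h window; the test passes only when no series is returned at all.
    """
    offenders = []
    for series in result:
        metric = series["metric"]
        _timestamp, count = series["value"]  # Prometheus encodes the value as a string
        offenders.append((metric.get("alertname"), metric.get("severity"), int(count)))
    return offenders


if __name__ == "__main__":
    for name, severity, count in unexpected_alerts(RESULT):
        print(f"{name} ({severity}) present in {count} sample(s) of the 2h window")
```

For the job above this reports only KubeletPlegDurationHigh, which is why that alert alone made the test fail.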
With the openshift-console-operator issue spun off into bug 1821708 (since closed as a dup of bug 1783881) and the KubeletPlegDurationHigh issue spun off into bug 1821697, I guess this bug is now just about the KubeAPIErrorBudgetBurn issue? I'll update the title to reflect that, and hope I'm correct.
I could see different errors in the kube-apiserver logs:

> E0407 06:33:48.878868 1 structuredmerge.go:103] [SHOULD NOT HAPPEN] failed to create typed new object of type apps/v1, Kind=Deployment: .spec.template.spec.containers[name="httpd"].env: duplicate entries for key [name="A"]

These and the other errors around them are related to the upstream issue https://github.com/kubernetes/kubernetes/issues/88182.

There are other errors for which BZs have already been filed and are being tracked:
* Server-side validation error: https://bugzilla.redhat.com/show_bug.cgi?id=1786269
* Metrics group version log spamming: https://bugzilla.redhat.com/show_bug.cgi?id=1819053

Apart from these issues, as far as I looked, there are no other errors that would cause a 503. As most of these issues are related to k8s 1.18, moving this to 4.5.
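The "duplicate entries for key [name="A"]" error means the object has two `env` entries with the same `name` in one container. Server-side apply tracks `env` as a list map keyed by `name` (the key shown in the error), so such an object cannot be converted into its typed form, even though the object itself is stored. A minimal Python sketch of that check on a plain-dict manifest; the manifest values (image, env values) and the helper `duplicate_env_keys` are hypothetical illustrations, not the e2e test's real object or the apiserver's code:

```python
from collections import Counter

# Hypothetical Deployment manifest reproducing the shape from the log:
# container "httpd" declares env var "A" twice.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {
            "name": "httpd",
            "image": "httpd",
            "env": [{"name": "A", "value": "1"}, {"name": "A", "value": "2"}],
        },
    ]}}},
}


def duplicate_env_keys(obj):
    """Yield (container_name, env_name) for each env name that repeats.

    A list map keyed by `name` needs unique names, so every pair yielded here
    would trigger a "duplicate entries for key" error during conversion.
    """
    for container in obj["spec"]["template"]["spec"].get("containers", []):
        counts = Counter(entry["name"] for entry in container.get("env", []))
        for env_name, n in counts.items():
            if n > 1:
                yield container["name"], env_name


if __name__ == "__main__":
    for container, env_name in duplicate_env_keys(deployment):
        print(f'containers[name="{container}"].env: duplicate entries for key [name="{env_name}"]')
```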
Bumping to high priority because this will start to block update CI once [1] lands to make alerting during updates illegal.

[1]: https://github.com/openshift/origin/pull/24786
As part of fixing this bug, [1] should be reverted.

[1]: https://github.com/openshift/origin/pull/24786/commits/3a9233400053c036838bdbf7f992874b7a0805fd
> E0407 06:33:48.878868 1 structuredmerge.go:103] [SHOULD NOT HAPPEN] failed to create typed new object of type apps/v1, Kind=Deployment: .spec.template.spec.containers[name="httpd"].env: duplicate entries for key [name="A"]

This is from an early beta feature which is not GA. We have to ignore these errors. I wonder where these requests come from. No controller should use server-side apply today, but we probably have e2e tests for that feature.
Should be fixed with the 1.18 rebase due to https://github.com/kubernetes/kubernetes/issues/88182.
Per comment 6, fixing this bug at least requires reverting origin@3a92334000.
The Updates team has no special ownership of this test; it's not clear to me why Jack would be on the hook to revert origin@3a92334000.
Postponing to 4.6. This is about server-side apply. The feature is not GA, but early beta. We will see in 4.6 how it behaves.
Same as comment 11. Waiting for 1.19.
We have rebased to 1.19. This is supposed to be fixed.
From PR https://github.com/openshift/origin/pull/25314, 4.6 has already been rebased onto kube 1.19-rc.2. Checked the repo:

$ git log --date local --pretty="%h %an %cd - %s" | grep 'kube 1.19'
d9ca44ba95 Maru Newby Thu Jul 30 02:02:04 2020 - bump(*) to kube 1.19.0-rc.2

Searched for 'shouldn't report any alerts in firing state apart from Watchdog' in the release-openshift-origin-installer-e2e-azure-4.6 CI tests reported at https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-blocking#release-openshift-origin-installer-e2e-azure-4.6&sort-by-flakiness. There are no KubeAPIErrorBudgetBurn-related errors in the failed tests, and most related tests passed. So moving the bug to VERIFIED.
(In reply to W. Trevor King from comment #9)
> Per comment 6, fixing this bug at least requires reverting origin@3a92334000.

This is still true, and the revert has still not landed in master:

$ git --no-pager log --oneline -G KubeAPIErrorBudgetBurn test/e2e/upgrade
3a92334000 (origin/pr/24786) Ignore KubeAPIErrorBudgetBurn alert
1.19 does not fix the root cause. The root cause is user data. Compare the discussion in https://github.com/kubernetes/kubernetes/issues/88182 and https://github.com/kubernetes/kubernetes/pull/88600. The latter only reduces the frequency of the log messages.
Reverting the KubeAPIErrorBudgetBurn alert has been spun off into bug 1878862.
Oops, I meant "Reverting the KubeAPIErrorBudgetBurn ignore".
This is blocked on origin 1.19 rebase.
Per comment #22, this bug doesn't involve KubeAPIErrorBudgetBurn, so I will change the bug summary.
Verification:

1. 4.6 has already been rebased onto kube 1.19-rc.2. We searched CI build logs for the keyword 'SHOULD NOT HAPPEN' (https://search.ci.openshift.org/?search=SHOULD+NOT+HAPPEN&maxAge=336h&context=1&type=build-log&name=&maxMatches=5&maxBytes=20971520&groupBy=job); one result is listed here:

...
E0910 02:03:11.289096 17 fieldmanager.go:175] [SHOULD NOT HAPPEN] failed to update managedFields for /, Kind=: failed to convert new object (apps/v1, Kind=Deployment) to smd typed: .spec.template.spec.containers[name="httpd"].env: duplicate entries for key [name="A"]
E0910 02:03:12.365113 17 fieldmanager.go:175] [SHOULD NOT HAPPEN] failed to update managedFields for /, Kind=: failed to convert new object (/v1, Kind=Pod) to smd typed: .spec.containers[name="httpd"].env: duplicate entries for key [name="A"]
...

As shown above, the 'SHOULD NOT HAPPEN' messages are no longer logged many times per second, so PR https://github.com/kubernetes/kubernetes/pull/88600 works as expected.

2. The latest build, 4.6.0-0.nightly-2020-09-20-184226, includes the merged https://github.com/openshift/kubernetes PR:

$ git log --date local --pretty="%h %an %cd - %s" 4336ff45 | grep '#335 '
0634471ce54 OpenShift Merge Robot Tue Sep 8 23:43:06 2020 - Merge pull request #335 from sttts/sttts-fix-non-unique-test-env-var-openshift
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196