+++ This bug was initially created as a clone of Bug #1929012 +++

Description of problem:

A frequent flake in CI happens around API priority and fairness. There are two cases that seem related that fail periodically:

openshift-tests.[sig-api-machinery] API priority and fairness should ensure that requests can be classified by testing flow-schemas/priority-levels [Suite:openshift/conformance/parallel] [Suite:k8s]

You can see it in this job:

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-blocking#release-openshift-origin-installer-e2e-gcp-4.7

Sometimes the test fails the first time and succeeds the second time, so the job ends up passing, but sometimes it fails back to back, which marks the job failed. It shows up in roughly 5% of failing jobs going by this search:

https://search.ci.openshift.org/?search=API+priority+and+fairness+should+ensure+that+requests+can+be+classified+by+testing+flow-schemas%2Fpriority-levels&maxAge=168h&context=1&type=junit&name=4.7&maxMatches=5&maxBytes=20971520&groupBy=job

I don't know the root cause, but looking at this job:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1361412835921367040

taking its must-gather:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1361412835921367040/artifacts/e2e-gcp/must-gather.tar

and searching during the time frame of the first failure ("21:21:22"), I found this log snippet:

2021-02-15T21:21:22.730001901Z I0215 21:21:22.729862      19 trace.go:205] Trace[179810853]: "Create" url:/apis/rbac.authorization.k8s.io/v1/namespaces/e2e-apf-8687/rolebindings,user-agent:openshift-controller-manager/v0.0.0 (linux/amd64) kubernetes/$Format/system:serviceaccount:openshift-infra:default-rolebindings-controller,client:10.0.0.5 (15-Feb-2021 21:21:22.215) (total time: 514ms):
2021-02-15T21:21:22.730001901Z Trace[179810853]: ---"Object stored in database" 514ms (21:21:00.729)
2021-02-15T21:21:22.730001901Z Trace[179810853]: [514.715556ms] [514.715556ms] END
2021-02-15T21:21:25.313279055Z E0215 21:21:25.313135      19 wrap.go:54] timeout or abort while handling: GET "/apis/oauth.openshift.io/v1/oauthclients"

I think that GET is the request that has trouble, and the "timeout or abort" message might be a clue. That log file is:

namespaces/openshift-kube-apiserver/pods/kube-apiserver-ci-op-zr3hl6j2-c38ab-8njmq-master-0/kube-apiserver/kube-apiserver/logs/current.log

I didn't get much further than that.

--- Additional comment from Michal Fojtik on 2021-02-16 08:29:12 UTC ---

Looks like this is already tracked upstream in https://github.com/kubernetes/kubernetes/issues/96803 and fixed in https://github.com/kubernetes/kubernetes/pull/96984. As this is only a test flake, setting severity appropriately.
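For anyone poking at this by hand: the flaking test boils down to checking that the apiserver classifies a request into the expected FlowSchema/PriorityLevelConfiguration pair, which it reports via two response headers. Below is a minimal Go sketch of that core check, not the actual e2e test code; APISERVER and TOKEN are placeholder environment variables pointing at a live cluster.

// Sketch only: issue one request and print the API Priority and Fairness
// classification headers. The real e2e test additionally creates a
// FlowSchema and PriorityLevelConfiguration and compares these header
// values against the UIDs of the objects it created.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// APISERVER and TOKEN are placeholders for this sketch, e.g.
	// APISERVER=https://api.cluster.example.com:6443
	req, err := http.NewRequest("GET", os.Getenv("APISERVER")+"/version", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("TOKEN"))

	// InsecureSkipVerify only to keep the sketch self-contained; a real
	// client would trust the cluster CA instead.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// With API Priority and Fairness enabled, the apiserver stamps each
	// response with the FlowSchema and PriorityLevelConfiguration that
	// the request was classified into.
	fmt.Println("FlowSchema UID:   ", resp.Header.Get("X-Kubernetes-PF-FlowSchema-UID"))
	fmt.Println("PriorityLevel UID:", resp.Header.Get("X-Kubernetes-PF-PriorityLevel-UID"))
}

The same headers should also be visible with something like "kubectl get --raw /version -v=8", which logs response headers, so a failing classification can be inspected without writing any code.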
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
I was told that this was fixed in the rebase, but it is still flaking.
The LifecycleStale keyword was removed because the bug got commented on recently. The bug assignee was notified.
I'm bumping this to high, since this has to be fixed for 4.8 right after we land https://github.com/openshift/origin/pull/26054
https://github.com/openshift/origin/pull/26056 didn't merge, so the test wasn't disabled. Closing.