4.3 release promotion CI [1]:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.3/513/build-log.txt | grep 'Kube API started failing\|Kube API started responding' | sort | uniq
  Dec 03 22:13:19.021 E kube-apiserver Kube API started failing: Get https://api.ci-op-jw2mh699-34698.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/kube-system?timeout=3s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Dec 03 22:13:19.299 I kube-apiserver Kube API started responding to GET requests

My impression was that we want 100% uptime for the Kube API, and that even short flaps like this are things we want to fix. But if we think they are actually fine (most clients will survive a sub-second outage and retry if they happen to hit it), then we should teach the monitor [2] to consider the duration of the outage before complaining, to reduce the noise; a sketch of that idea follows below. Currently these outages are very common, showing up in 103 jobs from the past 24h [3] (although most of those outages may also be brief).

Spun off from bug 1779413, which has been re-purposed to look at samples.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.3/513
[2]: https://github.com/openshift/origin/blob/9d9c044e53d4d27b64f9407f7596ba86a0f78e23/pkg/monitor/api.go#L78-L82
[3]: https://search.svc.ci.openshift.org/chart?search=Kube%20API%20started%20failing.*Client.Timeout%20exceeded%20while%20awaiting%20headers
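If short flaps are acceptable, one way to teach the monitor to tolerate them is to compare each outage's duration against a threshold before reporting it. Below is a minimal, self-contained sketch of that idea; it is not the actual monitor code from [2] — apiOutage, outageTolerance, and significantOutages are hypothetical names, and the 1-second threshold is an assumed value:

  package main

  import (
  	"fmt"
  	"time"
  )

  // apiOutage records one observed Kube API flap.
  type apiOutage struct {
  	start time.Time
  	end   time.Time
  }

  // Hypothetical threshold: outages shorter than this are treated as noise,
  // on the assumption that most clients retry and survive sub-second blips.
  const outageTolerance = 1 * time.Second

  // significantOutages drops flaps shorter than the tolerance, so only
  // outages long enough to plausibly break clients get reported.
  func significantOutages(outages []apiOutage) []apiOutage {
  	var kept []apiOutage
  	for _, o := range outages {
  		if o.end.Sub(o.start) >= outageTolerance {
  			kept = append(kept, o)
  		}
  	}
  	return kept
  }

  func main() {
  	// The 278ms flap from the build log above.
  	start, _ := time.Parse(time.StampMilli, "Dec 03 22:13:19.021")
  	end, _ := time.Parse(time.StampMilli, "Dec 03 22:13:19.299")
  	flaps := []apiOutage{{start: start, end: end}}
  	fmt.Printf("reportable outages: %d\n", len(significantOutages(flaps)))
  }

With the 278ms flap from the log above, this would report zero outages; anything at or over the threshold would still be flagged.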
Bumping up severity, as 10% of tests are failing with this error.
https://github.com/openshift/machine-config-operator/pull/1670 might help. We will revisit this BZ once the PR has merged.
https://github.com/openshift/machine-config-operator/pull/1670 has merged. Moving to MODIFIED.
This seems to be a common problem. From the search results below, there are 11 bugs related to this, matching 16.52% of failing runs and 12.26% of jobs, so we don't see a significant decline. I don't think it was resolved by https://github.com/openshift/machine-config-operator/pull/1670, so I'm assigning it back.

https://search.apps.build01.ci.devcluster.openshift.com/?search=Client.Timeout+exceeded+while+awaiting+headers&maxAge=12h&context=1&type=all&name=&maxMatches=5&maxBytes=20971520&groupBy=job
We are working on this topic, but it does not block 4.5 in its current form. Moving to 4.6. Note that many of the runs in the search in comment #12 are not GCP-related; this issue was about GCP only.
We are tracking these in bug 1845410. *** This bug has been marked as a duplicate of bug 1845410 ***