+++ This bug was initially created as a clone of Bug #1845414 +++

+++ This bug was initially created as a clone of Bug #1845412 +++

We see the Kube API become unavailable during upgrades on Azure. This is not supposed to happen if graceful termination and LB endpoint reconciliation by the cloud provider work correctly.

Note: openshift-apiserver APIs are unavailable too if the kube-apiserver is not serving correctly.

This is an umbrella bug, cloned into releases and closed when we are happy with the upgrade stability.

--- Additional comment from W. Trevor King on 2020-06-09 19:41:16 UTC ---

Search [1]. Example 4.6 CI job [2] failed on:

[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial] [Suite:openshift]
fail [github.com/openshift/origin/test/extended/util/disruption/disruption.go:237]: Jun 8 11:25:56.402: API was unreachable during disruption for at least 5m52s of 35m41s (16%):...

Update JUnit [3] failed the underlying checks:

Kubernetes APIs remain available
Jun 8 11:25:56.402: API was unreachable during disruption for at least 5m52s of 35m41s (16%):

Jun 08 10:57:49.398 E kube-apiserver Kube API started failing: Get https://api.ci-op-jkk4yywi-d89b2.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/kube-system?timeout=15s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Jun 08 10:57:50.398 - 102s E kube-apiserver Kube API is not responding to GET requests
Jun 08 10:59:33.380 I kube-apiserver Kube API started responding to GET requests
Jun 08 11:03:35.398 E kube-apiserver Kube API started failing: Get https://api.ci-op-jkk4yywi-d89b2.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/kube-system?timeout=15s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Jun 08 11:03:36.135 I kube-apiserver Kube API started responding to GET requests
Jun 08 11:17:25.398 E kube-apiserver Kube API started failing: Get https://api.ci-op-jkk4yywi-d89b2.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/kube-system?timeout=15s: context deadline exceeded
Jun 08 11:17:26.398 - 247s E kube-apiserver Kube API is not responding to GET requests
Jun 08 11:21:33.472 I kube-apiserver Kube API started responding to GET requests

OpenShift APIs remain available
Jun 8 11:25:56.402: API was unreachable during disruption for at least 5m48s of 35m40s (16%):

Jun 08 10:57:49.582 I openshift-apiserver OpenShift API stopped responding to GET requests: Get https://api.ci-op-jkk4yywi-d89b2.ci.azure.devcluster.openshift.com:6443/apis/image.openshift.io/v1/namespaces/openshift-apiserver/imagestreams/missing?timeout=15s: context deadline exceeded
Jun 08 10:57:50.581 - 102s E openshift-apiserver OpenShift API is not responding to GET requests
Jun 08 10:59:33.408 I openshift-apiserver OpenShift API started responding to GET requests
Jun 08 11:03:35.582 I openshift-apiserver OpenShift API stopped responding to GET requests: Get https://api.ci-op-jkk4yywi-d89b2.ci.azure.devcluster.openshift.com:6443/apis/image.openshift.io/v1/namespaces/openshift-apiserver/imagestreams/missing?timeout=15s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Jun 08 11:03:36.139 I openshift-apiserver OpenShift API started responding to GET requests
Jun 08 11:17:25.582 I openshift-apiserver OpenShift API stopped responding to GET requests: Get https://api.ci-op-jkk4yywi-d89b2.ci.azure.devcluster.openshift.com:6443/apis/image.openshift.io/v1/namespaces/openshift-apiserver/imagestreams/missing?timeout=15s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Jun 08 11:17:26.581 - 246s E openshift-apiserver OpenShift API is not responding to GET requests
Jun 08 11:21:33.481 I openshift-apiserver OpenShift API started responding to GET requests

[1]: https://search.svc.ci.openshift.org/?name=release-openshift-.*azure.*upgrade&search=API%20was%20unreachable%20during%20disruption%20for%20at%20least
[2]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.6/300
[3]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.6/300/artifacts/e2e-azure-upgrade/junit/junit_upgrade_1591615873.xml

--- Additional comment from Stefan Schimanski on 2020-06-18 11:39:49 UTC ---

Work in progress.

--- Additional comment from Michal Fojtik on 2020-07-09 12:46:07 UTC ---

Stefan is PTO, adding UpcomingSprint to his bugs to fulfill the duty.

--- Additional comment from Stefan Schimanski on 2020-08-03 11:24:19 UTC ---

WIP.

We are seeing the Kubernetes + OpenShift APIs being impacted in 4.4 -> 4.5 upgrade tests on Azure:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.4-stable-to-4.5-ci/1293901964882481152
# Kubernetes APIs remain available
API was unreachable during disruption for at least 22s of 51m44s (1%):
# OpenShift APIs remain available
API was unreachable during disruption for at least 8s of 51m44s (0%):

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.4-stable-to-4.5-ci/1293720254274342912
# Kubernetes APIs remain available
API was unreachable during disruption for at least 28s of 53m37s (1%):
# OpenShift APIs remain available
API was unreachable during disruption for at least 17s of 53m37s (1%):

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.4-stable-to-4.5-ci/1293629390785089536
# Kubernetes APIs remain available
API was unreachable during disruption for at least 3m49s of 55m59s (7%):
# OpenShift APIs remain available
API was unreachable during disruption for at least 3m50s of 55m59s (7%):
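
The percentages in these summaries can be sanity-checked with a short script. This is a minimal sketch and not part of the origin test suite: the names `parse_duration` and `unreachable_share` are hypothetical, and it assumes the summary lines keep the "for at least X of Y (Z%)" shape quoted above.

```python
import re

# Go-style durations as they appear in the summaries, e.g. "22s", "5m52s".
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")
# Assumed shape of the summary line; taken from the job output quoted above.
_SUMMARY = re.compile(r"for at least (\S+) of (\S+) \((\d+)%\)")


def parse_duration(text):
    """Convert a duration such as '3m49s' into whole seconds."""
    match = _DURATION.match(text)
    if not text or match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def unreachable_share(line):
    """Return (down_seconds, total_seconds, computed_percent, reported_percent)."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError("line does not look like a disruption summary")
    down = parse_duration(match.group(1))
    total = parse_duration(match.group(2))
    return down, total, 100.0 * down / total, int(match.group(3))


if __name__ == "__main__":
    line = "API was unreachable during disruption for at least 3m49s of 55m59s (7%):"
    down, total, computed, reported = unreachable_share(line)
    print(f"{down}s of {total}s = {computed:.1f}% (reported {reported}%)")
```

For the third run, 3m49s of 55m59s is 229s of 3359s, about 6.8%, which is in line with the reported 7%.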

Umbrella bugs are used to collect different issues in one root. Don't clone them into releases. We have the umbrellas for a reason. N copies make it even harder to keep an overview of a number of different root-causes for the same symptoms.

We backport fixes into older releases if they are feasible. We have clones for that.

*** This bug has been marked as a duplicate of bug 1845414 ***