Description of problem:

Recent e2e-agnostic-upgrade jobs in the MCO repo have been failing due to mandatory tests failing, such as:

disruption_tests: [sig-api-machinery] Kubernetes APIs remain available for new connections

"""
Aug 23 16:53:37.649 E kube-apiserver-new-connection kube-apiserver-new-connection started failing: Get "https://api.ci-op-3gxydq9m-57c36.ci.azure.devcluster.openshift.com:6443/api/v1/namespaces/default": dial tcp 20.106.10.105:6443: i/o timeout
Aug 23 16:53:37.649 - 15s E kube-apiserver-new-connection kube-apiserver-new-connection is not responding to GET requests
Aug 23 16:53:52.649 I kube-apiserver-new-connection kube-apiserver-new-connection started responding to GET requests

github.com/openshift/origin/test/extended/util/disruption/controlplane.(*availableTest).Test(0xc001d3ad20, 0xc001c28dc0, 0xc002d6d500, 0x2)
	github.com/openshift/origin/test/extended/util/disruption/controlplane/controlplane.go:127 +0x528
github.com/openshift/origin/test/extended/util/disruption.(*chaosMonkeyAdapter).Test(0xc001dc11d0, 0xc0018ada28)
	github.com/openshift/origin/test/extended/util/disruption/disruption.go:190 +0x3be
k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do.func1(0xc0018ada28, 0xc00168ef70)
	k8s.io/kubernetes.0-rc.0/test/e2e/chaosmonkey/chaosmonkey.go:90 +0x6d
created by k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do
	k8s.io/kubernetes.0-rc.0/test/e2e/chaosmonkey/chaosmonkey.go:87 +0xc9
"""

A few other disruption_tests are failing as well, which could be related:

- disruption_tests: [sig-api-machinery] Kubernetes APIs remain available with reused connections
- disruption_tests: [sig-api-machinery] OpenShift APIs remain available for new connections
- disruption_tests: [sig-api-machinery] OAuth APIs remain available for new connections

A few CI job links from MCO PRs:

- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2704/pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade/1429824441595990016
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2706/pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade/1429821059154055168
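For context on what these disruption tests measure: the "new connections" variants poll the API server throughout the upgrade and record any window in which a freshly dialed request fails. Below is a minimal Go sketch of that kind of probe, not the origin test code itself; the APISERVER and TOKEN environment variables are hypothetical placeholders for a real cluster endpoint and bearer token.

"""
// poller.go: sketch of a "new connections" availability probe. It issues a
// GET against /api/v1/namespaces/default once per second, forcing a fresh
// TCP connection each time, and logs transitions between responding and
// not responding, similar to the "started failing" / "started responding"
// events in the job output above.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	apiserver := os.Getenv("APISERVER") // e.g. https://api.<cluster-domain>:6443 (placeholder)
	token := os.Getenv("TOKEN")         // bearer token allowed to read namespaces (placeholder)
	url := apiserver + "/api/v1/namespaces/default"

	available := true
	for {
		// Build a new client with its own Transport on every iteration so
		// each probe opens a fresh TCP connection (the "new connections" case).
		client := &http.Client{
			Timeout: 15 * time.Second,
			Transport: &http.Transport{
				DisableKeepAlives: true,
				TLSClientConfig:   &tls.Config{InsecureSkipVerify: true}, // sketch only; verify certs in real use
			},
		}

		req, err := http.NewRequest(http.MethodGet, url, nil)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		req.Header.Set("Authorization", "Bearer "+token)

		resp, err := client.Do(req)
		ok := err == nil && resp.StatusCode == http.StatusOK
		if resp != nil {
			resp.Body.Close()
		}

		// Log only transitions, not every probe.
		if ok && !available {
			fmt.Printf("%s kube-apiserver-new-connection started responding to GET requests\n", time.Now().Format(time.RFC3339))
		} else if !ok && available {
			fmt.Printf("%s kube-apiserver-new-connection started failing: %v\n", time.Now().Format(time.RFC3339), err)
		}
		available = ok

		time.Sleep(1 * time.Second)
	}
}
"""

The "reused connections" variants measure the same thing but keep a single long-lived connection alive across probes, which is why both sets of tests can fail for different reasons (load balancer failover vs. connection draining).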
possibly a dup of bug 1955333? Certainly in the same Azure + Kube-reachability space.
e2e-agnostic-* jobs could run on any platform. But for the MCO, they're currently Azure [1]. And Kube-reachability issues are often platform-specific, involving pod-restart logic vs. platform-specific load balancer implementation. So tweaking the title here to include "Azure".

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2704/pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade/1429824441595990016#1:build-log.txt%3A19
Setting priority to high because the upgrade job is blocking MCO PRs, and as a result most of the PRs are not getting merged.
(In reply to W. Trevor King from comment #2)
> e2e-agnostic-* jobs could run on any platform. But for the MCO, they're
> currently Azure [1]. And Kube-reachability issues are often
> platform-specific, involving pod-restart logic vs. platform-specific load
> balancer implementation. So tweaking the title here to include "Azure".

Not sure if this is just an Azure-specific issue. I later started an upgrade test on GCP, https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2722/pull-ci-openshift-machine-config-operator-master-e2e-gcp-upgrade/1430172245858193408, where these tests failed too.
Duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1845414. Until there are very specific new insights about the root causes, there is no value in new BZs. There are a thousand different reasons why the API can become unavailable for some time, in many components: kube-apiserver itself, but also the node, MCO, CRI-O, and the cloud infra. I don't see a triage attempt in this BZ to point to one of those.

*** This bug has been marked as a duplicate of bug 1845414 ***