Description of problem:

From [1]:

$ curl -s https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/706 | grep 'clusteroperator/kube-scheduler changed Degraded to True' | head -n1 | sed 's|\\n|\n|g'
Apr 24 19:25:24.484 E clusteroperator/kube-scheduler changed Degraded to True: StaticPodsDegradedError: StaticPodsDegraded: nodes/ip-10-0-140-230.ec2.internal pods/openshift-kube-scheduler-ip-10-0-140-230.ec2.internal container="scheduler" is not ready
StaticPodsDegraded: nodes/ip-10-0-140-230.ec2.internal pods/openshift-kube-scheduler-ip-10-0-140-230.ec2.internal container="scheduler" is terminated: "Error" - "/localhost:6443/api/v1/namespaces/openshift-monitoring/pods/kube-state-metrics-5cb588685f-696cx: dial tcp [::1]:6443: connect: connection refused; retrying...
E0424 19:19:52.196464 1 factory.go:1570] Error getting pod e2e-tests-sig-apps-replicaset-upgrade-9jwhx/rs-2dvsn for retry: Get https://localhost:6443/api/v1/namespaces/e2e-tests-sig-apps-replicaset-upgrade-9jwhx/pods/rs-2dvsn: dial tcp [::1]:6443: connect: connection refused; retrying...
E0424 19:19:52.196846 1 factory.go:1570] Error getting pod openshift-monitoring/grafana-6c56d45755-zjslh for retry: Get https://localhost:6443/api/v1/namespaces/openshift-monitoring/pods/grafana-6c56d45755-zjslh: dial tcp [::1]:6443: connect: connection refused; retrying...
E0424 19:19:52.250882 1 factory.go:1570] Error getting pod openshift-console/downloads-8df7b68d5-gkllb for retry: Get https://localhost:6443/api/v1/namespaces/openshift-console/pods/downloads-8df7b68d5-gkllb: dial tcp [::1]:6443: connect: connection refused; retrying...
E0424 19:19:52.264865 1 factory.go:1570] Error getting pod openshift-image-registry/cluster-image-registry-operator-f5d964df5-6jtcz for retry: Get https://localhost:6443/api/v1/namespaces/openshift-image-registry/pods/cluster-image-registry-operator-f5d964df5-6jtcz: dial tcp [::1]:6443: connect: connection refused; retrying...
E0424 19:19:52.458056 1 factory.go:1570] Error getting pod openshift-dns-operator/dns-operator-54b4d748bf-gx4dw for retry: Get https://localhost:6443/api/v1/namespaces/openshift-dns-operator/pods/dns-operator-54b4d748bf-gx4dw: dial tcp [::1]:6443: connect: connection refused; retrying...
E0424 19:19:52.470869 1 factory.go:1570] Error getting pod openshift-monitoring/cluster-monitoring-operator-56cd5488d8-6p44h for retry: Get https://localhost:6443/api/v1/namespaces/openshift-monitoring/pods/cluster-monitoring-operator-56cd5488d8-6p44h: dial tcp [::1]:6443: connect: connection refused; retrying...
I0424 19:19:52.491125 1 secure_serving.go:180] Stopped listening on [::]:10251
"
StaticPodsDegraded: nodes/ip-10-0-175-155.ec2.internal pods/openshift-kube-scheduler-ip-10-0-175-155.ec2.internal container="scheduler" is not ready

Michal feels like this is probably part of the local Kubernetes API-server upgrading, and that we want the local scheduler to release its leadership when that happens. But crashing a Pod is a somewhat noisy way to hand off. And setting your ClusterOperator Degraded is not something that should happen as part of a vanilla upgrade. Can we only complain if we go more than $minutes without a backup scheduler? I dunno if the underlying operator libraries expose "you're the leader, and there were $x other Pods participating in the last election" information to their callers? Or we can solve this another way, as long as it doesn't involve going Degraded during each upgrade ;).
[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/706
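To make the "only complain if we go more than $minutes" idea concrete, here is a minimal sketch of a grace-period check. The types, field names, and threshold below are hypothetical stand-ins, not the actual kube-scheduler-operator or library-go API; it only shows the shape of tolerating brief unreadiness before reporting Degraded.

package main

import (
	"fmt"
	"time"
)

// degradedGracePeriod is a hypothetical threshold: how long a scheduler
// static pod may stay not-ready (for example while the node-local apiserver
// restarts during an upgrade) before we report Degraded.
const degradedGracePeriod = 5 * time.Minute

// podStatus is a stand-in for whatever the operator actually observes about
// each static pod; it is not the real library-go type.
type podStatus struct {
	Name          string
	Ready         bool
	NotReadySince time.Time // zero value if the pod is ready
}

// shouldReportDegraded returns true only if some pod has been not-ready for
// longer than the grace period, so a short restart during an upgrade does not
// flip the ClusterOperator to Degraded.
func shouldReportDegraded(pods []podStatus, now time.Time) (bool, string) {
	for _, p := range pods {
		if p.Ready {
			continue
		}
		if notReadyFor := now.Sub(p.NotReadySince); notReadyFor > degradedGracePeriod {
			return true, fmt.Sprintf("pod %s not ready for %s", p.Name, notReadyFor.Round(time.Second))
		}
	}
	return false, ""
}

func main() {
	now := time.Now()
	pods := []podStatus{
		{Name: "openshift-kube-scheduler-ip-10-0-140-230.ec2.internal", Ready: false, NotReadySince: now.Add(-2 * time.Minute)},
		{Name: "openshift-kube-scheduler-ip-10-0-175-155.ec2.internal", Ready: true},
	}
	degraded, reason := shouldReportDegraded(pods, now)
	fmt.Println(degraded, reason) // false: 2m of unreadiness is within the grace period
}

Whatever the real condition-computation code looks like, the design point is the same: a scheduler container that restarts while its local apiserver comes back should stay under the threshold and not surface as a Degraded ClusterOperator during a routine upgrade.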
What if we used the internal LB name `api-int` to connect to the apiserver? Is there a reason we are connecting to the local master over localhost, other than maybe improved latency?
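For illustration only, the question boils down to which server URL the scheduler's kubeconfig points at. The excerpt below is a generic kubeconfig fragment, not the actual file the installer lays down, and the cluster domain is a placeholder; pointing at the internal LB would presumably keep the client connected while the node-local apiserver restarts, at the cost of an extra hop through the load balancer.

clusters:
- name: lb-int
  cluster:
    # today: the node-local apiserver, which goes away during its own upgrade
    server: https://localhost:6443
    # proposed: the internal load balancer, e.g.
    # server: https://api-int.<cluster-name>.<base-domain>:6443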
Not a blocker due to low severity, but could you look into this, Ravi?
A lot has changed between when this was opened and now; moving to QA for verification against the current release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581