+++ This bug was initially created as a clone of Bug #1765280 +++

Description of problem:

The authentication operator will sometimes report the following degraded condition:

    RouteHealthDegraded: failed to GET route: dial tcp <ip>:443: connect: connection refused

Observed on the following platforms in CI over the past 14 days: gcp

The nature of the error (which looks like an external IP) and the fact that it has only been observed on GCP seem like clues.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Casey Callendrello on 2019-12-10 14:22:45 UTC ---

Fix merged in https://github.com/openshift/openshift-sdn/pull/79. Starting the backport dance.

--- Additional comment from Casey Callendrello on 2019-12-10 14:23:38 UTC ---

meant https://github.com/openshift/sdn/pull/79
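For anyone triaging this, one way to pull the Degraded condition out of `oc get clusteroperator authentication -o json` is a small script along these lines. This is only a sketch: the embedded JSON is illustrative (shaped like the operator status quoted in this bug, not taken from a real cluster), and `degraded_condition` is a hypothetical helper name.

```python
import json

# Illustrative payload in the shape returned by
# `oc get clusteroperator authentication -o json` (not real cluster data).
SAMPLE = json.dumps({
    "status": {
        "conditions": [
            {"type": "Available", "status": "True"},
            {
                "type": "Degraded",
                "status": "True",
                "reason": "RouteHealthDegradedFailedGet",
                "message": "RouteHealthDegraded: failed to GET route: "
                           "dial tcp 34.74.190.39:443: connect: connection refused",
            },
        ]
    }
})


def degraded_condition(co_json: str):
    """Return the Degraded condition dict if its status is True, else None."""
    conditions = json.loads(co_json).get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Degraded" and cond.get("status") == "True":
            return cond
    return None


cond = degraded_condition(SAMPLE)
if cond:
    print(f"{cond['reason']}: {cond['message']}")
```

Piping the real `oc` output into this instead of SAMPLE should surface the same RouteHealthDegraded message seen in CI.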
https://github.com/openshift/sdn/pull/81 filed
Deployment succeeded on GCP with 4.3.0-0.nightly-2019-12-12-021332:

    NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
    authentication   4.3.0-0.nightly-2019-12-12-021332   True        False         False      6h18m
> Deployment succeeded on GCP with 4.3.0-0.nightly-2019-12-12-021332

I'm not clear on what the expected flake-rate for this issue is, but in 4.2.13 -> 4.3.0-rc.0 CI today (also on GCP) [1]:

    {
      "type": "Failing",
      "status": "True",
      "lastTransitionTime": "2020-01-13T13:48:11Z",
      "reason": "ClusterOperatorNotAvailable",
      "message": "Cluster operator authentication is still updating"
    },
    {
      "type": "Progressing",
      "status": "True",
      "lastTransitionTime": "2020-01-13T13:21:48Z",
      "reason": "ClusterOperatorNotAvailable",
      "message": "Unable to apply 4.3.0-rc.0: the cluster operator authentication has not yet successfully rolled out"
    },

with [2]:

    - lastTransitionTime: "2020-01-13T13:33:02Z"
      message: 'RouteHealthDegraded: failed to GET route: dial tcp 34.74.190.39:443:
        connect: connection refused'
      reason: RouteHealthDegradedFailedGet
      status: "True"
      type: Degraded

And at that time the network operator is still running [3]:

    versions:
    - name: operator
      version: 4.2.13

so I guess this still needs to be cloned back to 4.2.z?

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/214
[2]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/214/artifacts/e2e-gcp-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-d24ac732f2fd86150091410623d388ad78196ad7f8072696e85ceaaccb187759/cluster-scoped-resources/config.openshift.io/clusteroperators/authentication.yaml
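The check being made here ("the network operator is still running 4.2.13" while the CVO targets 4.3.0-rc.0) amounts to comparing the operator-level entry in `status.versions` against the desired release. A minimal sketch, assuming that status shape and a hypothetical helper name:

```python
def operator_rolled_out(co_status: dict, desired: str) -> bool:
    """True if the clusteroperator's operator-level version matches the desired release."""
    for v in co_status.get("versions", []):
        if v.get("name") == "operator":
            return v.get("version") == desired
    return False


# Shape taken from the network operator status quoted above (illustrative).
network_status = {"versions": [{"name": "operator", "version": "4.2.13"}]}
print(operator_rolled_out(network_status, "4.3.0-rc.0"))  # False: still on 4.2.13
```

Running that over every clusteroperator during the upgrade would show which components had not yet moved to the target version when the degraded condition fired.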
I tried an upgrade from 4.2.14 --> 4.3.0-rc.0 on a GCP cluster; all cluster operators upgraded successfully.

oc get clusterversion -o yaml

    apiVersion: v1
    items:
    - apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      metadata:
        creationTimestamp: "2020-01-15T04:18:18Z"
        generation: 2
        name: version
        resourceVersion: "146557"
        selfLink: /apis/config.openshift.io/v1/clusterversions/version
        uid: 0ef5dd00-374e-11ea-a2ae-42010a000004
      spec:
        channel: stable-4.2
        clusterID: 1d856d0b-d98b-453e-92ec-813bec9f78be
        desiredUpdate:
          force: true
          image: quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64
          version: ""
        upstream: https://api.openshift.com/api/upgrades_info/v1/graph
      status:
        availableUpdates: null
        conditions:
        - lastTransitionTime: "2020-01-15T04:36:47Z"
          message: Done applying 4.3.0-rc.0
          status: "True"
          type: Available
        - lastTransitionTime: "2020-01-15T11:31:49Z"
          status: "False"
          type: Failing
        - lastTransitionTime: "2020-01-15T11:41:17Z"
          message: Cluster version is 4.3.0-rc.0
          status: "False"
          type: Progressing
        - lastTransitionTime: "2020-01-15T04:18:36Z"
          message: 'Unable to retrieve available updates: currently installed version
            4.3.0-rc.0 not found in the "stable-4.2" channel'
          reason: VersionNotFound
          status: "False"
          type: RetrievedUpdates
        desired:
          force: true
          image: quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64
          version: 4.3.0-rc.0
        history:
        - completionTime: "2020-01-15T11:41:17Z"
          image: quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64
          startedTime: "2020-01-15T11:04:32Z"
          state: Completed
          verified: false
          version: 4.3.0-rc.0
        - completionTime: "2020-01-15T04:36:47Z"
          image: quay.io/openshift-release-dev/ocp-release@sha256:3fabe939da31f9a31f509251b9f73d321e367aba2d09ff392c2f452f6433a95a
          startedTime: "2020-01-15T04:18:36Z"
          state: Completed
          verified: false
          version: 4.2.14
        observedGeneration: 2
        versionHash: CZiJlh_NjCQ=
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

oc get co

    NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
    authentication                             4.3.0-rc.0   True        False         False      7h16m
    cloud-credential                           4.3.0-rc.0   True        False         False      7h32m
    cluster-autoscaler                         4.3.0-rc.0   True        False         False      7h22m
    console                                    4.3.0-rc.0   True        False         False      16m
    dns                                        4.3.0-rc.0   True        False         False      7h32m
    image-registry                             4.3.0-rc.0   True        False         False      23m
    ingress                                    4.3.0-rc.0   True        False         False      21m
    insights                                   4.3.0-rc.0   True        False         False      7h32m
    kube-apiserver                             4.3.0-rc.0   True        False         False      7h31m
    kube-controller-manager                    4.3.0-rc.0   True        False         False      7h28m
    kube-scheduler                             4.3.0-rc.0   True        False         False      7h30m
    machine-api                                4.3.0-rc.0   True        False         False      7h32m
    machine-config                             4.3.0-rc.0   True        False         False      7h28m
    marketplace                                4.3.0-rc.0   True        False         False      15m
    monitoring                                 4.3.0-rc.0   True        False         False      13m
    network                                    4.3.0-rc.0   True        False         False      7h31m
    node-tuning                                4.3.0-rc.0   True        False         False      21m
    openshift-apiserver                        4.3.0-rc.0   True        False         False      18m
    openshift-controller-manager               4.3.0-rc.0   True        False         False      7h30m
    openshift-samples                          4.3.0-rc.0   True        False         False      40m
    operator-lifecycle-manager                 4.3.0-rc.0   True        False         False      7h31m
    operator-lifecycle-manager-catalog         4.3.0-rc.0   True        False         False      7h31m
    operator-lifecycle-manager-packageserver   4.3.0-rc.0   True        False         False      15m
    service-ca                                 4.3.0-rc.0   True        False         False      7h32m
    service-catalog-apiserver                  4.3.0-rc.0   True        False         False      7h28m
    service-catalog-controller-manager         4.3.0-rc.0   True        False         False      7h24m
    storage                                    4.3.0-rc.0   True        False         False      39m
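As a sanity check when verifying a run like this, the "every operator Available and neither Progressing nor Degraded" pass/fail can be scripted over `oc get clusteroperators -o json`. A minimal sketch, assuming that list shape (the embedded data is illustrative, not real cluster output, and `unhealthy_operators` is a hypothetical helper):

```python
# Illustrative list in the shape of `oc get clusteroperators -o json`
# (not real cluster output).
SAMPLE_LIST = {
    "items": [
        {
            "metadata": {"name": "authentication"},
            "status": {"conditions": [
                {"type": "Available", "status": "True"},
                {"type": "Progressing", "status": "False"},
                {"type": "Degraded", "status": "False"},
            ]},
        },
        {
            "metadata": {"name": "network"},
            "status": {"conditions": [
                {"type": "Available", "status": "True"},
                {"type": "Progressing", "status": "True"},
                {"type": "Degraded", "status": "False"},
            ]},
        },
    ]
}


def unhealthy_operators(co_list: dict) -> list:
    """Names of operators that are not Available, or are Progressing/Degraded."""
    bad = []
    for item in co_list.get("items", []):
        conds = {c["type"]: c["status"] for c in item["status"]["conditions"]}
        if (conds.get("Available") != "True"
                or conds.get("Progressing") == "True"
                or conds.get("Degraded") == "True"):
            bad.append(item["metadata"]["name"])
    return bad


print(unhealthy_operators(SAMPLE_LIST))  # -> ['network']
```

An empty result corresponds to the all-True/False/False table above.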
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062