Bug 1790704 - [backport 4.2] RouteHealthDegraded: failed to GET route: dial tcp <ip>:443: connect: connection refused because Load IP missing from node iptables rules
Summary: [backport 4.2] RouteHealthDegraded: failed to GET route: dial tcp <ip>:443: c...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.2.z
Assignee: Aniket Bhat
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1789583
Depends On: 1781763
Blocks:
Reported: 2020-01-14 00:17 UTC by W. Trevor King
Modified: 2021-04-05 17:46 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1781763
Environment:
Last Closed: 2020-02-12 12:16:16 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift sdn pull 90 | 0 | None | closed | Bug 1790704: proxy: add handler with same ResyncPeriod as shared informer. | 2020-10-29 12:28:40 UTC
Red Hat Product Errata | RHBA-2020:0395 | 0 | None | None | None | 2020-02-12 12:16:42 UTC

Description W. Trevor King 2020-01-14 00:17:20 UTC
+++ This bug was initially created as a clone of Bug #1781763 +++

+++ This bug was initially created as a clone of Bug #1765280 +++

Description of problem:

The authentication operator will sometimes report the following degraded condition:

    RouteHealthDegraded: failed to GET route: dial tcp <ip>:443: connect: connection refused

Observed on the following platforms in CI over the past 14 days: gcp

The address in the error (which looks like an external IP) and the fact that it has only been observed on GCP seem like clues.
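
When triaging this symptom, it can help to separate "connection refused" (the packet reached something that actively rejected it) from a timeout (the packet was dropped). The sketch below is only a TCP-level check written for illustration. It is not the authentication operator's code, which performs an HTTP GET against the route, and the name `probe_tcp` and its defaults are assumptions.

```python
import socket


def probe_tcp(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        # create_connection performs the TCP handshake; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or something in the path) answered with a reset.
        return "refused"
    except socket.timeout:
        # No answer within the timeout, e.g. the packet was silently dropped.
        return "timeout"
    except OSError:
        # Any other failure, e.g. no route to host or name resolution failure.
        return "error"
```

Running `probe_tcp("<ip>", 443)` from outside the cluster against the address in the degraded message would return "refused" for the failure reported here.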

...

In 4.2.13 -> 4.3.0-rc.0 CI today (also on GCP) [1]:

      {
        "type": "Failing",
        "status": "True",
        "lastTransitionTime": "2020-01-13T13:48:11Z",
        "reason": "ClusterOperatorNotAvailable",
        "message": "Cluster operator authentication is still updating"
      },
      {
        "type": "Progressing",
        "status": "True",
        "lastTransitionTime": "2020-01-13T13:21:48Z",
        "reason": "ClusterOperatorNotAvailable",
        "message": "Unable to apply 4.3.0-rc.0: the cluster operator authentication has not yet successfully rolled out"
      },

with [2]:

  - lastTransitionTime: "2020-01-13T13:33:02Z"
    message: 'RouteHealthDegraded: failed to GET route: dial tcp 34.74.190.39:443:
      connect: connection refused'
    reason: RouteHealthDegradedFailedGet
    status: "True"
    type: Degraded
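
For anyone scanning must-gather output for this signature, the following is a rough sketch of pulling the refused endpoint out of a ClusterOperator's conditions. It assumes the conditions have already been decoded into a list of dicts (for example from the JSON form of the resource). The function name `refused_endpoint` and the regular expression are illustrative assumptions, not part of any OpenShift tooling.

```python
import re
from typing import Iterable, Mapping, Optional, Tuple

# Matches the dial error seen in the RouteHealthDegraded message above.
DIAL_RE = re.compile(
    r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+): connect: connection refused"
)


def refused_endpoint(conditions: Iterable[Mapping[str, str]]) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from the first Degraded=True condition reporting a refused dial."""
    for cond in conditions:
        if cond.get("type") != "Degraded" or cond.get("status") != "True":
            continue
        # Collapse whitespace so messages wrapped across lines still match.
        message = " ".join(cond.get("message", "").split())
        match = DIAL_RE.search(message)
        if match:
            return match.group(1), int(match.group(2))
    return None
```

Applied to the condition shown above, this would yield the address 34.74.190.39 and port 443.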

And at that time the network operator is still running [3]:

  versions:
  - name: operator
    version: 4.2.13

so I guess this still needs to be cloned back to 4.2.z.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/214
[2]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/214/artifacts/e2e-gcp-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-d24ac732f2fd86150091410623d388ad78196ad7f8072696e85ceaaccb187759/cluster-scoped-resources/config.openshift.io/clusteroperators/authentication.yaml

Comment 4 Alexander Constantinescu 2020-01-22 14:49:15 UTC
*** Bug 1789583 has been marked as a duplicate of this bug. ***

Comment 5 Weibin Liang 2020-01-22 14:59:16 UTC
No authentication failure was found when deploying on GCP with 4.4.0-0.nightly-2019-12-13-170401


[root@dhcp-41-193 FILE]# oc get nodes
NAME                                             STATUS   ROLES    AGE   VERSION
qe-wel-m8szx-m-0.c.openshift-qe.internal         Ready    master   25m   v1.14.6+c383847f6
qe-wel-m8szx-m-1.c.openshift-qe.internal         Ready    master   25m   v1.14.6+c383847f6
qe-wel-m8szx-m-2.c.openshift-qe.internal         Ready    master   25m   v1.14.6+c383847f6
qe-wel-m8szx-w-a-kljqn.c.openshift-qe.internal   Ready    worker   14m   v1.14.6+c383847f6
qe-wel-m8szx-w-b-cprzx.c.openshift-qe.internal   Ready    worker   14m   v1.14.6+c383847f6
[root@dhcp-41-193 FILE]# oc get clusteroperator | grep authentication
authentication                             4.2.0-0.nightly-2020-01-22-023656   True        False         False      8m23s
[root@dhcp-41-193 FILE]#
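
The three True/False values in that row are easy to misread without the header. As a minimal sketch, assuming the usual `oc get clusteroperator` column order of NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE (the header is not shown in the output above), a row can be parsed as follows. The names `OperatorStatus`, `parse_clusteroperator_line`, and `is_healthy` are hypothetical helpers.

```python
from typing import NamedTuple


class OperatorStatus(NamedTuple):
    name: str
    version: str
    available: bool
    progressing: bool
    degraded: bool
    since: str


def parse_clusteroperator_line(line: str) -> OperatorStatus:
    """Parse one whitespace-separated data row (assumed six-column layout)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    name, version, available, progressing, degraded, since = fields
    return OperatorStatus(
        name, version, available == "True", progressing == "True", degraded == "True", since
    )


def is_healthy(status: OperatorStatus) -> bool:
    """Healthy here means Available, not Progressing, and not Degraded."""
    return status.available and not status.progressing and not status.degraded
```

Under that assumption, the authentication row above reads as Available=True, Progressing=False, Degraded=False.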

Comment 7 errata-xmlrpc 2020-02-12 12:16:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0395

Comment 8 W. Trevor King 2021-04-05 17:46:31 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1].  If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475

