Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1790704

Summary: [backport 4.2] RouteHealthDegraded: failed to GET route: dial tcp <ip>:443: connect: connection refused because Load IP missing from node iptables rules
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Networking
Assignee: Aniket Bhat <anbhat>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: urgent
CC: aconstan, adam.kaplan, anbhat, aos-bugs, bbennett, bleanhar, bpeterse, ccoleman, cdc, deads, dmace, dmoessne, jchaloup, jiajliu, jlebon, kgarriso, lsm5, mfojtik, obulatov, pmuller, rbrattai, sdodson, spadgett, weliang, wking, wsun, yinzhou, zzhao
Version: 4.2.z
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1781763
Environment:
Last Closed: 2020-02-12 12:16:16 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1781763
Bug Blocks:

Description W. Trevor King 2020-01-14 00:17:20 UTC
+++ This bug was initially created as a clone of Bug #1781763 +++

+++ This bug was initially created as a clone of Bug #1765280 +++

Description of problem:

The authentication operator will sometimes report the following degraded condition:

    RouteHealthDegraded: failed to GET route: dial tcp <ip>:443: connect: connection refused

Observed on the following platforms in CI over the past 14 days: gcp

The nature of the error (the address looks like an external IP) and the fact that it has only been observed on GCP seem like clues.
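Per the bug summary, the suspected cause is the load-balancer IP being absent from the node's iptables rules. A minimal way to check this on an affected node would be `iptables-save | grep <lb-ip>`; the sketch below is a self-contained stand-in for that check, using the IP from the error above and a hypothetical KUBE-SERVICES rule line (the `KUBE-FW-EXAMPLE` chain name is invented for illustration):

```shell
# On an affected node, a healthy setup would show KUBE-SERVICES/KUBE-FW-*
# entries for the route's load-balancer IP, e.g.:
#
#   iptables-save | grep 34.74.190.39
#
# Self-contained stand-in, with a hypothetical rule line:
lb_ip='34.74.190.39'
sample_rules='-A KUBE-SERVICES -d 34.74.190.39/32 -p tcp -m tcp --dport 443 -j KUBE-FW-EXAMPLE'

# grep needs `--` so the pattern starting with "-d" is not parsed as an option
if printf '%s\n' "$sample_rules" | grep -q -- "-d ${lb_ip}/32"; then
  echo "LB IP present in iptables rules"
else
  echo "LB IP missing from iptables rules (symptom of this bug)"
fi
```

An empty result from the real `iptables-save | grep` on a node would match the "connection refused" symptom, since traffic to the load-balancer IP would never be redirected to the service endpoints.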

...

In 4.2.13 -> 4.3.0-rc.0 CI today (also on GCP) [1]:

      {
        "type": "Failing",
        "status": "True",
        "lastTransitionTime": "2020-01-13T13:48:11Z",
        "reason": "ClusterOperatorNotAvailable",
        "message": "Cluster operator authentication is still updating"
      },
      {
        "type": "Progressing",
        "status": "True",
        "lastTransitionTime": "2020-01-13T13:21:48Z",
        "reason": "ClusterOperatorNotAvailable",
        "message": "Unable to apply 4.3.0-rc.0: the cluster operator authentication has not yet successfully rolled out"
      },

with [2]:

  - lastTransitionTime: "2020-01-13T13:33:02Z"
    message: 'RouteHealthDegraded: failed to GET route: dial tcp 34.74.190.39:443:
      connect: connection refused'
    reason: RouteHealthDegradedFailedGet
    status: "True"
    type: Degraded

And at that time the network operator is still running [3]:

  versions:
  - name: operator
    version: 4.2.13

so I guess this still needs to be cloned back to 4.2.z.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/214
[2]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/214/artifacts/e2e-gcp-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-d24ac732f2fd86150091410623d388ad78196ad7f8072696e85ceaaccb187759/cluster-scoped-resources/config.openshift.io/clusteroperators/authentication.yaml

Comment 4 Alexander Constantinescu 2020-01-22 14:49:15 UTC
*** Bug 1789583 has been marked as a duplicate of this bug. ***

Comment 5 Weibin Liang 2020-01-22 14:59:16 UTC
No authentication failure found when deploying on GCP with 4.4.0-0.nightly-2019-12-13-170401


[root@dhcp-41-193 FILE]# oc get nodes
NAME                                             STATUS   ROLES    AGE   VERSION
qe-wel-m8szx-m-0.c.openshift-qe.internal         Ready    master   25m   v1.14.6+c383847f6
qe-wel-m8szx-m-1.c.openshift-qe.internal         Ready    master   25m   v1.14.6+c383847f6
qe-wel-m8szx-m-2.c.openshift-qe.internal         Ready    master   25m   v1.14.6+c383847f6
qe-wel-m8szx-w-a-kljqn.c.openshift-qe.internal   Ready    worker   14m   v1.14.6+c383847f6
qe-wel-m8szx-w-b-cprzx.c.openshift-qe.internal   Ready    worker   14m   v1.14.6+c383847f6
[root@dhcp-41-193 FILE]# oc get clusteroperator | grep authentication
authentication                             4.2.0-0.nightly-2020-01-22-023656   True        False         False      8m23s
[root@dhcp-41-193 FILE]#
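Beyond grepping the clusteroperator listing, the Degraded condition can be inspected directly. On a live cluster that would be something like the `oc` jsonpath query in the comment below (the jsonpath expression is my assumption, not from the original report); the runnable portion is a self-contained stand-in that parses the condition fields quoted from the must-gather earlier in this bug:

```shell
# On a live cluster one could run (assumed jsonpath syntax):
#
#   oc get clusteroperator authentication \
#     -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}'
#
# and expect "False" once the fix is in place. Stand-in: parse the condition
# fields as they appear in the must-gather YAML quoted above.
condition='reason: RouteHealthDegradedFailedGet
status: "True"
type: Degraded'

# Extract the value of the "status" field, stripping the surrounding quotes
status=$(printf '%s\n' "$condition" | awk -F': ' '/^status:/ {gsub(/"/, "", $2); print $2}')
echo "Degraded status: ${status}"
```

In the failing upgrade run above this condition reads `True`; after the errata it should report `False`.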

Comment 7 errata-xmlrpc 2020-02-12 12:16:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0395

Comment 8 W. Trevor King 2021-04-05 17:46:31 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1].  If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475