Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1707510

Summary: Install failed: unable to check route health: failed to GET route: dial tcp: lookup [...]: no such host
Product: OpenShift Container Platform
Component: Networking
Sub component: router
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Reporter: Samuel Padgett <spadgett>
Assignee: Dan Mace <dmace>
QA Contact: Hongan Li <hongli>
CC: aos-bugs, bbennett, nagrawal, wking
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Type: Bug
Last Closed: 2019-05-07 19:12:41 UTC

Comment 1 Samuel Padgett 2019-05-07 16:52:08 UTC
cluster-authentication-operator reporting degraded due to

error checking current version: unable to check route health: failed to GET route: dial tcp: lookup oauth-openshift.apps.ci-op-3gbj403q-c4a31.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host

Comment 3 Dan Mace 2019-05-07 17:51:44 UTC
Actually, this looks like an issue with LB provisioning. Moving back to Routing.

Comment 5 Dan Mace 2019-05-07 17:58:19 UTC
Every cited CI run is the same AWS LoadBalancer quota issue. So, this isn't a routing bug.

Comment 6 W. Trevor King 2019-05-07 18:20:56 UTC
> Every cited CI run is the same AWS LoadBalancer quota issue.

I'm reaping leaked AWS resources, which should help with this. But it would be nice for an extended inability to create a load balancer to be bubbled up into a Degraded status. The ingress operator should be able to monitor this and set Degraded if its LoadBalancer request remained unfulfilled for $TOO_LONG. It could also watch for Error/Warning Events in the openshift-ingress namespace to get the reason from the Service controller.

Comment 7 W. Trevor King 2019-05-07 18:23:16 UTC
As it stands, I don't think:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/716/pull-ci-openshift-machine-config-operator-master-e2e-aws/3512/artifacts/e2e-aws/clusteroperators.json | jq '.items[] | select(.metadata.name == "ingress").status.conditions'
[
  {
    "lastTransitionTime": "2019-05-07T15:58:56Z",
    "message": "operand namespace exists",
    "status": "False",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2019-05-07T15:59:44Z",
    "message": "desired and current number of IngressControllers are equal",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2019-05-07T15:59:43Z",
    "message": "desired and current number of IngressControllers are equal",
    "status": "True",
    "type": "Available"
  }
]

is an accurate summary for an ingress operator without a fulfilled LoadBalancer Service.

Comment 8 Dan Mace 2019-05-07 18:25:23 UTC
Agreed that we need to improve the conditions in this case. While investigating this, Miciah also realized we haven't implemented our declared API around DNS status:

https://github.com/openshift/api/blob/master/operator/v1/types_ingress.go#L253-L275

Comment 9 Ben Bennett 2019-05-07 19:12:41 UTC
Given that this was caused by AWS quota issues and https://bugzilla.redhat.com/show_bug.cgi?id=1707545 tracks the fix to report the status properly, I'm closing this.

*** This bug has been marked as a duplicate of bug 1707545 ***