Bug 1765776
| Summary: | OCP 4.2.2 install fails to initialize with authentication and console operators failing to rollout - unable to check route health: failed to GET route | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Walid A. <wabouham> |
| Component: | Networking | Assignee: | Dan Mace <dmace> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | unspecified | CC: | aos-bugs, mifiedle, wking |
| Version: | 4.2.0 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-26 00:22:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Changing the component to Routing, because the ingress operator sets up the DNSRecords (the DNS operator handles in-cluster CoreDNS resolution, not the out-of-cluster Route 53 records). Already tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1765282.

*** This bug has been marked as a duplicate of bug 1765282 ***

Description of problem:
OCP 4.2.2 IPI install on AWS fails after 54 minutes with the authentication and console cluster operators still updating.

The install log shows:

    time="2019-10-25T16:56:15Z" level=fatal msg="failed to initialize the cluster: Working towards 4.2.2: 100% complete"

The CVO log shows:

    I1025 19:11:32.016838 1 task_graph.go:611] Result of work: [Cluster operator authentication is still updating Cluster operator console has not yet reported success]
    I1025 19:11:32.016856 1 sync_worker.go:741] Summarizing 2 errors
    I1025 19:11:32.016862 1 sync_worker.go:745] Update error 135 of 433: ClusterOperatorNotAvailable Cluster operator authentication is still updating (*errors.errorString: cluster operator authentication is still updating)
    I1025 19:11:32.016870 1 sync_worker.go:745] Update error 294 of 433: ClusterOperatorNotAvailable Cluster operator console has not yet reported success (*errors.errorString: cluster operator console is not done; it is available=false, progressing=true, degraded=false)
    E1025 19:11:32.016885 1 sync_worker.go:311] unable to synchronize image (waiting 2m52.525702462s): Some cluster operators are still updating: authentication, console

The authentication operator log shows:

    E1025 19:36:46.938680 1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: dial tcp: lookup oauth-openshift.apps.<cluster-domain-name> on 172.30.0.10:53: no such host

The Route53 hosted zone for this cluster is missing the A record for *.apps. Also, the wildcard DNSRecord does not exist:

    # oc -n openshift-ingress-operator get -o yaml dnsrecord default-wildcard
    Error from server (NotFound): dnsrecords.ingress.operator.openshift.io "default-wildcard" not found

Version-Release number of selected component (if applicable):

How reproducible:
Reproduced twice so far

Steps to Reproduce:
1. AWS IPI install of OCP version 4.2.2 with OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE: quay.io/openshift-release-dev/ocp-release:4.2.2
2. Install fails after around 54 minutes, as seen in the install logs
3. `oc get co` shows the auth and console operators progressing or degraded

Actual results:
Install fails after 54 minutes

Expected results:
Install succeeds and all cluster operators are available and not progressing/degraded

Additional info:
Links to operator logs and oc adm must-gather logs will be in the next comment
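
The "no such host" symptom for the oauth route can be checked independently of the operator with a small name-resolution probe. The following is a minimal sketch, not the operator's own health check: the helper names `route_hostname` and `resolves` are hypothetical, and the probe uses whatever resolver the local machine is configured with rather than the in-cluster DNS service at 172.30.0.10, so results can differ from what the operator sees.

```python
import socket
from typing import Callable


def route_hostname(route: str, cluster_domain: str) -> str:
    """Build the wildcard-backed hostname, e.g. oauth-openshift.apps.<cluster-domain-name>."""
    return f"{route}.apps.{cluster_domain}"


def resolves(hostname: str, resolver: Callable[..., object] = socket.getaddrinfo) -> bool:
    """Return True if the hostname resolves, False on a name-resolution failure.

    The resolver is injectable (hypothetical design choice) so the check can be
    exercised without touching real DNS.
    """
    try:
        resolver(hostname, 443)
    except socket.gaierror:
        # Raised by getaddrinfo for resolution failures such as "no such host".
        return False
    return True
```

For example, `resolves(route_hostname("oauth-openshift", "<cluster-domain-name>"))` with the real cluster domain substituted would be expected to return `False` while the *.apps A record is missing from the hosted zone.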