Bug 1765776

Summary:	OCP 4.2.2 install fails to initialize with authentication and console operators failing to rollout - unable to check route health: failed to GET route
Product:	OpenShift Container Platform	Reporter:	Walid A. <wabouham>
Component:	Networking	Assignee:	Dan Mace <dmace>
Networking sub component:	router	QA Contact:	Hongan Li <hongli>
Status:	CLOSED DUPLICATE	Docs Contact:
Severity:	urgent
Priority:	unspecified	CC:	aos-bugs, mifiedle, wking
Version:	4.2.0
Hardware:	x86_64
OS:	Linux
Last Closed:	2019-10-26 00:22:06 UTC	Type:	Bug

Description Walid A. 2019-10-25 23:29:22 UTC

Description of problem:
An OCP 4.2.2 IPI install on AWS fails after 54 minutes with the authentication and console cluster operators still updating.

The install log shows:
time="2019-10-25T16:56:15Z" level=fatal msg="failed to initialize the cluster: Working towards 4.2.2: 100% complete"

CVO log shows:
I1025 19:11:32.016838       1 task_graph.go:611] Result of work: [Cluster operator authentication is still updating Cluster operator console has not yet reported success]
I1025 19:11:32.016856       1 sync_worker.go:741] Summarizing 2 errors
I1025 19:11:32.016862       1 sync_worker.go:745] Update error 135 of 433: ClusterOperatorNotAvailable Cluster operator authentication is still updating (*errors.errorString: cluster operator authentication is still updating)
I1025 19:11:32.016870       1 sync_worker.go:745] Update error 294 of 433: ClusterOperatorNotAvailable Cluster operator console has not yet reported success (*errors.errorString: cluster operator console is not done; it is available=false, progressing=true, degraded=false)
E1025 19:11:32.016885       1 sync_worker.go:311] unable to synchronize image (waiting 2m52.525702462s): Some cluster operators are still updating: authentication, console
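
For longer CVO logs, the blocked operators can be pulled out of the "still updating" lines with a short script. This is only a sketch: the pattern is inferred from the single sync_worker.go line quoted above and is not a documented log format, and the names PENDING and pending_operators are made up for illustration.

```python
import re
from typing import List

# Pattern inferred from the sync_worker.go line quoted in this report.
# Assumption: the operator names follow the colon as a comma-separated list.
PENDING = re.compile(r"Some cluster operators are still updating: (?P<names>.+)$")


def pending_operators(log_text: str) -> List[str]:
    """Return operator names from 'still updating' lines, in first-seen order."""
    names: List[str] = []
    for line in log_text.splitlines():
        match = PENDING.search(line)
        if not match:
            continue
        for name in match.group("names").split(","):
            name = name.strip()
            if name and name not in names:
                names.append(name)
    return names
```

Passing the CVO log text from this report to pending_operators would list the operators the CVO is still waiting on.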

Authentication operator logs show:

E1025 19:36:46.938680       1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: dial tcp: lookup oauth-openshift.apps.<cluster-domain-name> on 172.30.0.10:53: no such host


The Route 53 hosted zone for this cluster is missing the A record for *.apps.

Also:

# oc -n openshift-ingress-operator get -o yaml dnsrecord default-wildcard
Error from server (NotFound): dnsrecords.ingress.operator.openshift.io "default-wildcard" not found
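
The failed lookup in the authentication operator log can be approximated outside the operator with a small resolution check. This is a rough sketch, not what the operator itself runs: the function names are hypothetical, the console route name is assumed to be the default one, and a lookup from a workstation goes through a different resolver than the in-cluster 172.30.0.10, so results may differ.

```python
import socket
from typing import Dict, Sequence


def resolves(hostname: str) -> bool:
    """Return True if the local resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror corresponds to the "no such host" case seen in the operator log.
        return False


def check_wildcard(
    apps_domain: str,
    # Assumption: these are the default route names under *.apps.
    labels: Sequence[str] = ("oauth-openshift", "console-openshift-console"),
) -> Dict[str, bool]:
    """Check names that a *.apps wildcard record would be expected to cover."""
    return {f"{label}.{apps_domain}": resolves(f"{label}.{apps_domain}") for label in labels}
```

Calling check_wildcard with the cluster's apps domain (apps.<cluster-domain-name>) returns a mapping of each route hostname to whether it resolved. With the wildcard A record missing, every entry would be expected to be False.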

Version-Release number of selected component (if applicable):


How reproducible:

Reproduced twice so far

Steps to Reproduce:
1. Run an AWS IPI install of OCP version 4.2.2 with OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE set to quay.io/openshift-release-dev/ocp-release:4.2.2
2. Check the install logs; the install fails after around 54 minutes.
3. Run `oc get co`; the authentication and console operators show as progressing or degraded.

Actual results:

Install fails after 54 minutes.

Expected results:

Install succeeds and all cluster operators are available and not progressing or degraded.

Additional info:
Links to the operator logs and `oc adm must-gather` logs will be in the next comment.

Comment 2 W. Trevor King 2019-10-25 23:41:01 UTC

Changing the component to Routing, because the ingress operator sets up the DNSRecords (the DNS operator is about in-cluster CoreDNS resolution, not about managing the out-of-cluster Route 53 records).

Comment 3 Dan Mace 2019-10-26 00:22:06 UTC

Already tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1765282.

*** This bug has been marked as a duplicate of bug 1765282 ***