+++ This bug was initially created as a clone of Bug #1793541 +++

Created from investigation on private clusters in gov cloud type environments. See https://docs.google.com/document/d/1rY5Wklqx8Rvjd-SynXwOvUvqaUotMmNKCqPN0IkTTpc/edit?ts=5e20e2c5#

ingress-operator fails without route53 privileges

Team: network-edge

$ oc logs ingress-operator-5ff98dcb6c-mzh6p -c ingress-operator
2020-01-16T19:29:59.650Z INFO operator.main ingress-operator/start.go:80 using operator namespace {"namespace": "openshift-ingress-operator"}
2020-01-16T19:29:59.659Z ERROR operator.main ingress-operator/start.go:123 failed to create DNS manager {"error": "failed to get cloud credentials from secret /: secrets \"cloud-credentials\" not found"}

Even if the privateZone and publicZone blocks are removed from the cluster DNS spec, which indicates to the ingress-operator that DNS management should be disabled, the error is still fatal.

$ oc get dns cluster -oyaml
apiVersion: config.openshift.io/v1
kind: DNS
metadata:
  name: cluster
  ...
spec:
  baseDomain: sjenning.devcluster.openshift.com
  # start removal
  privateZone:
    tags:
      Name: sjenning-h6n24-int
      kubernetes.io/cluster/sjenning-h6n24: owned
  publicZone:
    id: Z3URY6TWQ91KVV
  # end removal
status: {}

In fact, the failure happens so early that it occurs before the operator creates the ingress ClusterOperator CR, so there is no top-level visibility into why ingress is not starting. The ingress failure causes the dependent authentication, monitoring, and console operators to be degraded or unavailable.

Proposed solution

If the privateZone and publicZone blocks are removed from the cluster DNS CR, the ingress-operator should not start the DNS manager (the part of the operator that requires all the AWS privileges). That way, ingress could avoid needing an IAM user at all. There is a catch here: the ELB that the wildcard DNS record targets doesn't exist until the ingress operator creates a Service of type LoadBalancer, which results in the ELB's creation.
Thus the out-of-band mapping of the wildcard record to the ELB would need to be a day-2 operation.

--- Additional comment from Stephen Cuppett on 2020-01-21 15:18:11 UTC ---

Setting target release to the active development version (4.4). Fixes, if any, requested/required for previous versions will result in clones targeting those z-stream releases.

--- Additional comment from errata-xmlrpc on 2020-01-24 16:07:35 UTC ---

Bug report changed to ON_QA status by Errata System. A QE request has been submitted for advisory RHBA-2019:47983-01 https://errata.devel.redhat.com/advisory/47983

--- Additional comment from Hongan Li on 2020-02-03 08:29:49 UTC ---

Verified with 4.4.0-0.nightly-2020-02-02-201619; the issue has been fixed. No error occurs after removing the privateZone and publicZone, and the logs show:

$ oc -n openshift-ingress-operator logs ingress-operator-88f857479-tshnc -c ingress-operator | grep -i "fake dns"
2020-02-03T07:38:26.827Z INFO operator.main ingress-operator/start.go:218 using fake DNS provider because no public or private zone is defined in the cluster DNS configuration
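The "using fake DNS provider" log line above reflects the behavior the proposed solution asks for: when neither zone is defined, skip the real DNS manager entirely. A minimal sketch of that guard in Go, using hypothetical simplified types (the actual ingress-operator types and function names differ):

```go
package main

import "fmt"

// DNSZone mirrors the shape of the privateZone/publicZone blocks in the
// cluster DNS CR (simplified for illustration).
type DNSZone struct {
	ID   string
	Tags map[string]string
}

// DNSSpec holds the zone configuration from dns/cluster.
type DNSSpec struct {
	PrivateZone *DNSZone
	PublicZone  *DNSZone
}

// selectProviderName sketches the guard: when neither zone is defined,
// DNS management is disabled and no cloud credentials are needed.
func selectProviderName(spec DNSSpec) string {
	if spec.PrivateZone == nil && spec.PublicZone == nil {
		return "fake" // no-op provider; requires no AWS privileges
	}
	return "aws" // real provider; requires route53 privileges
}

func main() {
	// Zones removed, as in the reproduction steps above.
	fmt.Println(selectProviderName(DNSSpec{}))

	// A zone present, as in the default installer-created DNS CR.
	fmt.Println(selectProviderName(DNSSpec{
		PublicZone: &DNSZone{ID: "Z3URY6TWQ91KVV"},
	}))
}
```

The key design point is that the check happens before any cloud-credentials secret is read, so a cluster without route53 privileges never hits the fatal "failed to create DNS manager" path.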
Verified with 4.3.0-0.nightly-2020-02-09-195913; the issue has been fixed. Removed both the public and private zone from dns/cluster and restarted the ingress-operator; no error is reported.

$ oc -n openshift-ingress-operator logs ingress-operator-6454978cf5-npqf4 -c ingress-operator | grep DNS
2020-02-10T02:52:01.626Z INFO operator.main ingress-operator/start.go:201 using fake DNS provider because no public or private zone is defined in the cluster DNS configuration

$ oc get co/ingress
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.3.0-0.nightly-2020-02-09-195913    True        False         False      20m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0492