Description of problem:
The ingress operator fails to ensure a DNSRecord due to errors refreshing the token; restarting the operator fixes the issue.

Version-Release number of selected component (if applicable):
4.4

How reproducible:
Start the Ingress Operator with a valid token. Expire or revoke that token manually, or edit the secret so that refreshing fails. The operator should go into degraded status and the DNSRecord should fail to update.

Steps to Reproduce:
--

Actual results:
The ingress operator moves to a degraded state and starts retrying the calls.

Expected results:
1) After n subsequent retries, the ingress operator requests a new credential token from the cloud credential operator.
2) The operator distinguishes an authentication error from other errors and either requests a new credential or restarts itself, logging the error.

Additional info:
I will try to get as much information from the client as possible.

Workaround:
Restart the operator.
I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.
(In reply to Felipe M from comment #0)
> Description of problem:
> The ingress operator fails to ensure a DNSRecord due to errors refreshing
> the token, restarting the operator fixes the issue.

The ingress operator never requests the credentials itself; rather, the cluster version operator creates the credentials request on behalf of the ingress operator, which only reads the resultant credentials secret. However, restarting the ingress operator forces it to reread the credentials secret. From your problem description and workaround, it seems like the cloud credential operator must be updating the secret on its own, so the ingress operator just needs to watch for and react to updates to the secret.
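
To illustrate what "react to updates to the secret" could look like, here is a minimal, self-contained sketch. It is not the operator's actual code: `Secret`, `Provider`, `newProvider`, and `Reconciler` are hypothetical stand-ins, and the real operator would receive secret events from the Kubernetes API rather than being handed a struct. The idea is only that the reconciler remembers which version of the secret its DNS provider was built from and rebuilds the provider when that version changes.

```go
package main

import (
	"errors"
	"fmt"
)

// Secret is a hypothetical stand-in for the cloud-credentials secret.
type Secret struct {
	ResourceVersion string
	Data            map[string]string
}

// Provider is a hypothetical DNS provider client built from credentials.
type Provider struct {
	ClientID string
}

// newProvider rejects a secret that has no client ID (assumed validation).
func newProvider(s Secret) (*Provider, error) {
	id := s.Data["azure_client_id"]
	if id == "" {
		return nil, errors.New("parameter 'clientID' cannot be empty")
	}
	return &Provider{ClientID: id}, nil
}

// Reconciler caches the provider and the secret version it was built from.
type Reconciler struct {
	provider *Provider
	version  string
}

// Reconcile rebuilds the provider whenever the secret's version differs from
// the cached one, instead of keeping the credentials read at startup forever.
func (r *Reconciler) Reconcile(s Secret) error {
	if r.provider != nil && r.version == s.ResourceVersion {
		return nil // credentials unchanged, reuse the cached provider
	}
	p, err := newProvider(s)
	if err != nil {
		// Drop the stale provider so the next secret update is retried.
		r.provider, r.version = nil, ""
		return fmt.Errorf("failed to create DNS provider: %w", err)
	}
	r.provider, r.version = p, s.ResourceVersion
	return nil
}

func main() {
	r := &Reconciler{}
	good := Secret{ResourceVersion: "1", Data: map[string]string{"azure_client_id": "old"}}
	fmt.Println("initial secret:", r.Reconcile(good))

	broken := Secret{ResourceVersion: "2", Data: map[string]string{}}
	fmt.Println("broken secret:", r.Reconcile(broken))

	fresh := Secret{ResourceVersion: "3", Data: map[string]string{"azure_client_id": "new"}}
	fmt.Println("recreated secret:", r.Reconcile(fresh), "client:", r.provider.ClientID)
}
```

With this shape, a broken secret surfaces as a reconcile error, and a later corrected or recreated secret is picked up without restarting the process.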
Verified with 4.6.0-0.nightly-2020-07-20-183524 and the issue has been fixed.

1. oc -n openshift-ingress-operator edit secret/cloud-credentials

apiVersion: v1
data:
  azure_client_id: xxxx    <--- remove this line
  azure_client_secret: xxxx
  azure_region: xxxx

2. oc -n openshift-ingress-operator logs deploy/ingress-operator -c ingress-operator

2020-07-21T02:17:28.075Z ERROR operator.init.controller-runtime.controller controller/controller.go:258 Reconciler error {"controller": "dns_controller", "request": "openshift-ingress-operator/default-wildcard", "error": "failed to create DNS provider: failed to create Azure DNS manager: failed to create recordSetClient: parameter 'clientID' cannot be empty"}

3. oc -n openshift-ingress-operator delete secret/cloud-credentials

4. oc -n openshift-ingress-operator logs deploy/ingress-operator -c ingress-operator

2020-07-21T02:21:24.401Z DEBUG operator.init.controller-runtime.controller controller/controller.go:282 Successfully Reconciled {"controller": "dns_controller", "request": "openshift-ingress-operator/default-wildcard"}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196