Bug 1854383 - [Azure] Handling of Ingress operator expired token
Summary: [Azure] Handling of Ingress operator expired token
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Hongan Li
Depends On:
Blocks: 1868257
Reported: 2020-07-07 11:16 UTC by Felipe M
Modified: 2020-08-21 08:06 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:

System: Github
ID: openshift cluster-ingress-operator pull 425
Priority: None
Status: closed
Summary: Bug 1854383: dns: Reread cloud credentials secret if it changes
Last Updated: 2020-09-14 06:58:10 UTC

Internal Links: 1868257

Description Felipe M 2020-07-07 11:16:55 UTC
Description of problem:
The ingress operator fails to ensure a DNSRecord due to errors refreshing the token. Restarting the operator fixes the issue.

Version-Release number of selected component (if applicable):

How reproducible:
Start the ingress operator with a valid token.
Expire or revoke that token manually, or edit the secret so that refreshing the token fails.
The operator should enter a degraded state, and updating the DNSRecord should fail.

Steps to Reproduce:

Actual results:
The ingress operator moves to a degraded state and starts retrying the calls.

Expected results:
1) After n consecutive failed retries, the ingress operator requests a new credential token from the cloud credential operator.
2) The operator handles an authentication error with priority over other errors and either requests a new credential or restarts itself, logging the error.

Additional info:
I will try to get as much information from the client as possible.

Restarting the operator

Comment 2 Andrew McDermott 2020-07-07 16:08:37 UTC
I'm adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 5 Miciah Dashiel Butler Masters 2020-07-09 18:58:31 UTC
(In reply to Felipe M from comment #0)
> Description of problem:
> The ingress operator fails to ensure a DNSRecord due to errors refreshing
> the token, restarting the operator fixes the issue.

The ingress operator never requests the credentials itself; rather, the cluster version operator creates the credentials request on behalf of the ingress operator, which only reads the resultant credentials secret.  However, restarting the ingress operator forces it to reread the credentials secret.  From your problem description and workaround, it seems like the cloud credential operator must be updating the secret on its own, so the ingress operator just needs to watch for and react to updates to the secret.
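
A minimal sketch of reacting to secret updates, in Go with only the standard library. It is not the operator's actual code: Provider, ProviderCache, and hashSecret are hypothetical names, and a real controller would receive the secret data from a watch on the Kubernetes API rather than from a plain map. The sketch rebuilds the DNS provider only when the content of the credentials secret differs from the content it was last built from.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"sync"
)

// Provider is a hypothetical stand-in for a DNS provider client.
type Provider struct {
	ClientID string
}

// ProviderCache keeps one Provider and the digest of the secret data
// it was built from.
type ProviderCache struct {
	mu       sync.Mutex
	hash     [sha256.Size]byte
	provider *Provider
	builds   int // how many times a Provider has been constructed
}

// hashSecret returns a deterministic digest of the secret's keys and values.
func hashSecret(data map[string][]byte) [sha256.Size]byte {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort first
	h := sha256.New()
	for _, k := range keys {
		// Length prefixes keep adjacent fields from running together.
		fmt.Fprintf(h, "%d:%s=%d:", len(k), k, len(data[k]))
		h.Write(data[k])
	}
	var sum [sha256.Size]byte
	copy(sum[:], h.Sum(nil))
	return sum
}

// Get returns the cached Provider if the secret data is unchanged and
// builds a new one otherwise.
func (c *ProviderCache) Get(data map[string][]byte) (*Provider, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	sum := hashSecret(data)
	if c.provider != nil && sum == c.hash {
		return c.provider, nil
	}
	id := string(data["azure_client_id"])
	if id == "" {
		c.provider = nil // drop the stale provider so the next call rebuilds
		return nil, fmt.Errorf("parameter 'clientID' cannot be empty")
	}
	c.provider = &Provider{ClientID: id}
	c.hash = sum
	c.builds++
	return c.provider, nil
}

func main() {
	cache := &ProviderCache{}
	secret := map[string][]byte{
		"azure_client_id":     []byte("id-1"),
		"azure_client_secret": []byte("old"),
	}
	p, err := cache.Get(secret)
	fmt.Println(p.ClientID, cache.builds, err)

	// Simulate an update to the secret: the provider is rebuilt.
	secret["azure_client_secret"] = []byte("new")
	p, err = cache.Get(secret)
	fmt.Println(p.ClientID, cache.builds, err)
}
```

Comparing a digest rather than holding on to the first provider forever means a rotated or repaired secret takes effect on the next reconcile, without restarting the operator.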

Comment 9 Hongan Li 2020-07-21 02:38:17 UTC
Verified with 4.6.0-0.nightly-2020-07-20-183524 and the issue has been fixed. 

1. oc -n openshift-ingress-operator edit secret/cloud-credentials
apiVersion: v1
  azure_client_id: xxxx           <--- remove this line
  azure_client_secret: xxxx
  azure_region: xxxx

2. oc -n openshift-ingress-operator logs deploy/ingress-operator -c ingress-operator
2020-07-21T02:17:28.075Z	ERROR	operator.init.controller-runtime.controller	controller/controller.go:258	Reconciler error	{"controller": "dns_controller", "request": "openshift-ingress-operator/default-wildcard", "error": "failed to create DNS provider: failed to create Azure DNS manager: failed to create recordSetClient: parameter 'clientID' cannot be empty"}

3. oc -n openshift-ingress-operator delete secret/cloud-credentials

4. oc -n openshift-ingress-operator logs deploy/ingress-operator -c ingress-operator
2020-07-21T02:21:24.401Z	DEBUG	operator.init.controller-runtime.controller	controller/controller.go:282	Successfully Reconciled	{"controller": "dns_controller", "request": "openshift-ingress-operator/default-wildcard"}
