Bug 1809354
Summary: | 4.3.1 Azure IPI installs show "cloud provider rate limited(read) for operation:NicGet" in openshift-ingress namespace | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Caden Marchese <cmarches> |
Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | low | CC: | amcdermo, aos-bugs, dhansen, hongli, mharri, pamoedom, rdomnu |
Version: | 4.3.z | ||
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | x86_64 | ||
OS: | Other | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: |
Cause: The ingress operator was continuously upserting DNS records that it managed on Azure and GCP.
Consequence: The cloud-provider API sometimes rate-limited the upsert API calls, causing alarming "cloud provider rate limited" events in the "openshift-ingress" namespace. In addition, the ingress operator's logs showed repeated "upserted DNS record" log messages.
Fix: Logic was added to the ingress operator's DNS controller to avoid upserting a DNS record if it is already published and neither the record nor the DNS zone configuration has changed since the controller last upserted the record.
Result: The ingress operator makes fewer calls to the cloud-provider API, the operator's logs show fewer "upserted DNS record" log messages, and the operator should not cause "cloud provider rate limited" events.
|
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-07-13 17:17:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
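The Fix entry in the Doc Text states the rule: skip the upsert when the record is already published and neither the record nor the DNS zone configuration has changed since the controller last upserted it. A minimal Go sketch of that rule follows, assuming hypothetical simplified types (`Record`, `Zone`, `Controller`) and an in-memory cache of the last upsert per zone; none of these names come from the ingress operator, and its real bookkeeping may differ.

```go
package main

// Hypothetical sketch of the "skip unchanged upserts" rule; not the
// ingress operator's actual code.

// Record is a hypothetical, simplified DNS record spec. All fields are
// comparable, so two Records can be compared with ==.
type Record struct {
	DNSName    string
	RecordType string
	Target     string
	TTL        int64
}

// Zone is a hypothetical, simplified DNS zone configuration.
type Zone struct {
	ID  string
	Tag string
}

// upserted is what the controller remembers about its last successful
// upsert of a record to one zone.
type upserted struct {
	record Record
	zone   Zone
}

// Controller is a hypothetical DNS controller. calls counts simulated
// cloud-provider API calls.
type Controller struct {
	last  map[string]upserted // keyed by zone ID
	calls int
}

func NewController() *Controller {
	return &Controller{last: map[string]upserted{}}
}

// needsUpsert returns false only if the record is already published and
// neither the record nor the zone changed since the last upsert.
func (c *Controller) needsUpsert(rec Record, zone Zone, published bool) bool {
	prev, seen := c.last[zone.ID]
	return !published || !seen || prev.record != rec || prev.zone != zone
}

// Ensure upserts the record to the zone only when needed and reports
// whether a (simulated) provider API call was made.
func (c *Controller) Ensure(rec Record, zone Zone, published bool) bool {
	if !c.needsUpsert(rec, zone, published) {
		return false
	}
	c.calls++ // stand-in for the possibly rate-limited provider API call
	c.last[zone.ID] = upserted{record: rec, zone: zone}
	return true
}
```

Repeated reconciles of an unchanged, published record make no provider calls, which is the behavior the Result entry describes. Because the cache in this sketch lives in memory, a restarted controller would upsert each record once more before it starts skipping.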
Description
Caden Marchese
2020-03-02 22:38:17 UTC
We've seen this from the k8s cloud provider since day one with Azure, so it's not a blocking regression, and it is always transient. Disabling rate limiting as a fix seems suspicious, though. Moving to 4.5. There may be an old bug of which this is a duplicate.

Pedro,

The ingress operator periodically ensures that the default ingresscontroller is present. You should be able to recreate the default-wildcard DNSRecord by deleting ingresscontroller/default:

    $ oc delete ingresscontroller/default -n openshift-ingress-operator
    ingresscontroller.operator.openshift.io "default" deleted

Wait a minute.

    $ oc get ingresscontroller/default -n openshift-ingress-operator
    NAME      AGE
    default   2m20s

    $ oc get dnsrecord/default-wildcard -n openshift-ingress-operator -o json | jq ".status.zones[0].conditions[0]"
    {
      "lastTransitionTime": "2020-04-30T19:28:15Z",
      "message": "The DNS provider succeeded in ensuring the record",
      "reason": "ProviderSuccess",
      "status": "False",
      "type": "Failed"
    }

Pedro,

After looking into this issue further, here's the proper way to fix your issue:

    # Remove the finalizer:
    $ oc patch -n openshift-ingress-operator dnsrecord/default-wildcard --patch '{"metadata":{"finalizers": []}}' --type=merge
    dnsrecord.ingress.operator.openshift.io/default-wildcard patched

    # Delete the stuck dnsrecord:
    $ oc delete dnsrecord/default-wildcard -n openshift-ingress-operator
    dnsrecord.ingress.operator.openshift.io "default-wildcard" deleted

    # Verify that the record has been recreated:
    $ oc get dnsrecord/default-wildcard -n openshift-ingress-operator
    NAME               AGE
    default-wildcard   4s

    # Verify the status of co/ingress:
    $ oc get co/ingress
    NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
    ingress   4.5.0-0.nightly-2020-04-03-194832   True        False         False      6h1m

A DNSRecord is only created during ingresscontroller creation [0].

[0] https://github.com/openshift/api/blob/master/operator/v1/types_ingress.go#L268-L274

Thanks Daneyon, we'll try that and get back to you ASAP to confirm the workaround.

Best Regards.
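The jq query in the comments reads the first condition of the first zone: a `Failed` condition with status `False` (reason `ProviderSuccess`) means the DNS provider accepted the record. A small Go sketch of the same check across every zone follows, assuming it is given the raw JSON from `oc get dnsrecord/default-wildcard -n openshift-ingress-operator -o json` (without the jq filter). Only the fields visible in the jq output are modeled, and the function name `publishedToAllZones` is hypothetical.

```go
package main

import "encoding/json"

// condition models the fields shown in the jq output; other fields in
// the real object are ignored by encoding/json.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
	Reason string `json:"reason"`
}

type zoneStatus struct {
	Conditions []condition `json:"conditions"`
}

// dnsRecord models only the .status.zones[].conditions[] path used by jq.
type dnsRecord struct {
	Status struct {
		Zones []zoneStatus `json:"zones"`
	} `json:"status"`
}

// publishedToAllZones (hypothetical helper) reports whether every zone
// has a Failed condition with status "False". A record with no zone
// status is treated as unpublished.
func publishedToAllZones(raw []byte) (bool, error) {
	var rec dnsRecord
	if err := json.Unmarshal(raw, &rec); err != nil {
		return false, err
	}
	if len(rec.Status.Zones) == 0 {
		return false, nil
	}
	for _, z := range rec.Status.Zones {
		ok := false
		for _, c := range z.Conditions {
			if c.Type == "Failed" && c.Status == "False" {
				ok = true
				break
			}
		}
		if !ok {
			return false, nil
		}
	}
	return true, nil
}
```

Treating a record with no zone status as unpublished is a conservative choice in this sketch: it errs toward reporting that the record still needs attention.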
Hi again Daneyon, please also note that we have found the related BZ#1782516, which already contains a PR [1] that seems to basically deactivate "CloudProviderRateLimit", right? Please note that once the patch is successfully merged into 4.5, we'll need a proper 4.3 backport, thanks.

NOTE: I will also link our case with that BZ; we can continue there if you prefer.

[1] https://github.com/openshift/installer/pull/3259

Best Regards.

[UPDATE] Please disregard the 4.3 backport comment; I can see that BZ#1826073 is already in place for that, thanks.

*** Bug 1837324 has been marked as a duplicate of this bug. ***

Verified with 4.5.0-0.nightly-2020-05-27-202943 and the issue has been fixed. Didn't see any "rate limited" events or the problem described in https://bugzilla.redhat.com/show_bug.cgi?id=1837324.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.