Bug 1756302
| Summary: | Failed DNS updates causes recursive data accumulation leading to update failure | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Mace <dmace> | |
| Component: | Networking | Assignee: | Dan Mace <dmace> | |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | urgent | |||
| Priority: | urgent | CC: | aos-bugs, bbennett | |
| Version: | 4.2.0 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.3.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1756303 | Environment: | ||
| Last Closed: | 2020-05-13 21:25:37 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1756303 | |||
Verified with 4.3.0-0.ci-2019-10-10-225759; the issue has been fixed.
1. Run `oc edit dns` and change the privateZone's tag.
2. Delete the ingress operator pod to force it to restart.
3. Run `oc get dnsrecord -o yaml -n openshift-ingress-operator`.
<---snip--->
status:
  zones:
  - conditions:
    - lastTransitionTime: "2019-10-11T09:57:07Z"
      message: 'The DNS provider failed to ensure the record: failed to find hosted
        zone for record: no matching hosted zone found'
      reason: ProviderError
      status: "True"
      type: Failed
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062
Description of problem:

From Trevor King:

UX bug: when the configured private zone cannot be found (e.g. because you try to identify it with the wrong kubernetes.io/cluster/{id} tag, https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/cluster-scoped-resources/config.openshift.io/dnses.yaml ), currently the ingress operator seems to go into some sort of error-accumulating death spiral, ending up with https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/pods/ingress-operator-595cbc88bd-rg8g4/ingress-operator/ingress-operator/logs/current.log which has a few gigantic lines:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/pods/ingress-operator-595cbc88bd-rg8g4/ingress-operator/ingress-operator/logs/current.log | wc
36 3216009 50502408

and ends with:

... "error": "etcdserver: request is too large"}

We probably want to truncate that error accumulation, and maybe only return the most recent error for a given DNS entry (currently the last error message that still fit into etcd seems to be reflected in https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/ingress.operator.openshift.io/dnsrecords/default-wildcard.yaml ).