Bug 1756302 - Failed DNS updates cause recursive data accumulation leading to update failure
Summary: Failed DNS updates cause recursive data accumulation leading to update failure
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Dan Mace
QA Contact: Hongan Li
Depends On:
Blocks: 1756303
Reported: 2019-09-27 10:46 UTC by Dan Mace
Modified: 2020-05-13 21:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1756303
Last Closed: 2020-05-13 21:25:37 UTC
Target Upstream Version:

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-ingress-operator pull 304 | 0 | None | closed | Bug 1756302: dns/aws: Do not include record in error messages | 2021-02-15 07:24:56 UTC
Red Hat Product Errata | RHBA-2020:0062 | 0 | None | None | None | 2020-05-13 21:25:39 UTC

Description Dan Mace 2019-09-27 10:46:37 UTC
Description of problem:

From Trevor King:

UX bug: when the configured private zone cannot be found (e.g. because it is identified with the wrong kubernetes.io/cluster/{id} tag, as in https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/cluster-scoped-resources/config.openshift.io/dnses.yaml ), the ingress operator currently seems to go into an error-accumulating death spiral in which the reported error keeps growing with each failed update. The result is https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/pods/ingress-operator-595cbc88bd-rg8g4/ingress-operator/ingress-operator/logs/current.log which has a few gigantic lines:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/pods/ingress-operator-595cbc88bd-rg8g4/ingress-operator/ingress-operator/logs/current.log | wc
     36 3216009 50502408

The log ends with ... "error": "etcdserver: request is too large"}. We probably want to truncate the accumulated error, and perhaps report only the most recent error for a given DNS entry. Currently, the last error message that still fit into etcd seems to be reflected in https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/ingress.operator.openshift.io/dnsrecords/default-wildcard.yaml .
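
To illustrate the suggestion above, here is a minimal Go sketch of how an error that embeds the whole record can grow on every failed attempt, and how omitting the record or truncating the message keeps it bounded. It is not the operator's actual code: the `record` type, the `ensureVerbose` and `ensureTerse` functions, the `truncate` helper, and the record name are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical stand-in for a DNS record object whose status
// stores the message of the last failed update.
type record struct {
	Name        string
	LastMessage string
}

// ensureVerbose mimics the problematic pattern: the error embeds the whole
// record, including the previous failure message held in its status.
func ensureVerbose(r record) error {
	return fmt.Errorf("failed to find hosted zone for record %+v: no matching hosted zone found", r)
}

// ensureTerse mimics the suggested pattern: the error omits the record.
func ensureTerse(r record) error {
	return errors.New("failed to find hosted zone for record: no matching hosted zone found")
}

// truncate caps msg at max bytes. It cuts on a byte boundary, so it may
// split a multi-byte character; that is acceptable for this sketch only.
func truncate(msg string, max int) string {
	if len(msg) <= max {
		return msg
	}
	return msg[:max]
}

// reconcile simulates n failed updates, writing each error back into the
// record's status, and returns the length of the final stored message.
// A limit of 0 disables truncation.
func reconcile(n int, ensure func(record) error, limit int) int {
	r := record{Name: "*.apps.example.com"} // assumed example name
	for i := 0; i < n; i++ {
		msg := ensure(r).Error()
		if limit > 0 {
			msg = truncate(msg, limit)
		}
		r.LastMessage = msg
	}
	return len(r.LastMessage)
}

func main() {
	fmt.Println("verbose, 5 attempts: ", reconcile(5, ensureVerbose, 0))
	fmt.Println("verbose, 50 attempts:", reconcile(50, ensureVerbose, 0))
	fmt.Println("terse, 50 attempts:  ", reconcile(50, ensureTerse, 0))
	fmt.Println("verbose, truncated:  ", reconcile(50, ensureVerbose, 256))
}
```

In this sketch the verbose variant produces a longer message on every attempt, while the terse variant stays the same size no matter how many attempts fail. Truncation alone also bounds the size, but the stored message would still be mostly repeated noise, which is why dropping the record from the error seems the cleaner option.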

Comment 3 Hongan Li 2019-10-11 10:05:02 UTC
Verified with 4.3.0-0.ci-2019-10-10-225759; the issue has been fixed.

1. Run `oc edit dns` and change the privateZone's tag.
2. Delete the ingress operator pod to force it to restart.
3. Run `oc get dnsrecord -o yaml -n openshift-ingress-operator` and check the status conditions:
    - conditions:
      - lastTransitionTime: "2019-10-11T09:57:07Z"
        message: 'The DNS provider failed to ensure the record: failed to find hosted
          zone for record: no matching hosted zone found'
        reason: ProviderError
        status: "True"
        type: Failed

Comment 5 errata-xmlrpc 2020-05-13 21:25:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

