1756303 – Failed DNS updates causes recursive data accumulation leading to update failure

Summary: Failed DNS updates causes recursive data accumulation leading to update failure

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.2.z
Assignee:	Dan Mace
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:	1756302
Blocks:

Reported:	2019-09-27 10:48 UTC by Dan Mace
Modified:	2022-08-04 22:24 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1756302
Environment:
Last Closed:	2019-10-30 04:44:56 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-ingress-operator pull 305	0	None	closed	Bug 1756303: dns/aws: Do not include record in error messages	2020-11-22 17:43:31 UTC
Red Hat Product Errata	RHBA-2019:3151	0	None	None	None	2019-10-30 04:45:07 UTC

Description Dan Mace 2019-09-27 10:48:23 UTC

+++ This bug was initially created as a clone of Bug #1756302 +++

Description of problem:

From Trevor King:

UX bug: when the configured private zone cannot be found (e.g. because it is identified with the wrong kubernetes.io/cluster/{id} tag, as in https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/cluster-scoped-resources/config.openshift.io/dnses.yaml ), the ingress operator currently seems to go into an error-accumulating death spiral. The resulting log, https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/pods/ingress-operator-595cbc88bd-rg8g4/ingress-operator/ingress-operator/logs/current.log , has a few gigantic lines:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/pods/ingress-operator-595cbc88bd-rg8g4/ingress-operator/ingress-operator/logs/current.log | wc
     36 3216009 50502408

The log ends with ... "error": "etcdserver: request is too large"}. We probably want to truncate that error accumulation, and perhaps return only the most recent error for a given DNS entry (the last error message that still fit into etcd seems to be reflected in https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/5196/rehearse-5196-pull-ci-openshift-installer-master-e2e-aws-upi/3/artifacts/e2e-aws-upi/must-gather/namespaces/openshift-ingress-operator/ingress.operator.openshift.io/dnsrecords/default-wildcard.yaml ).
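
For illustration only, here is a minimal Python sketch of how this kind of recursive accumulation can arise and how leaving the record out of the error message avoids it. It is not the operator's actual code. The names ensure_record, reconcile_buggy, reconcile_fixed, new_record, and the max_len cap are all hypothetical, and the record layout is a simplified assumption.

```python
import json


def ensure_record(record):
    """Hypothetical stand-in for a DNS provider call that always fails."""
    raise LookupError("no matching hosted zone found")


def reconcile_buggy(record):
    # Bug pattern: the error text embeds the whole record, which already
    # carries the status message written by the previous failed attempt.
    try:
        ensure_record(record)
    except LookupError as err:
        record["status"]["message"] = (
            f"failed to ensure record {json.dumps(record)}: {err}"
        )


def reconcile_fixed(record, max_len=256):
    # Fix pattern: report only the failure itself, and cap the length as a
    # backstop (max_len is an arbitrary illustrative value).
    try:
        ensure_record(record)
    except LookupError as err:
        record["status"]["message"] = f"failed to ensure the record: {err}"[:max_len]


def new_record():
    # Simplified, assumed shape of a DNS record with a status message.
    return {"name": "*.apps.example.com", "status": {"message": ""}}


buggy, fixed = new_record(), new_record()
buggy_sizes, fixed_sizes = [], []
for _ in range(8):
    reconcile_buggy(buggy)
    reconcile_fixed(fixed)
    buggy_sizes.append(len(buggy["status"]["message"]))
    fixed_sizes.append(len(fixed["status"]["message"]))

print("buggy message sizes:", buggy_sizes)
print("fixed message sizes:", fixed_sizes)
```

In the buggy variant every retry wraps the previous message (plus a fresh layer of JSON escaping) inside the new one, so the stored status grows on each attempt until the write is rejected. In the fixed variant the message size stays constant across retries.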


Comment 2 Hongan Li 2019-10-23 06:52:30 UTC

Verified with 4.2.0-0.nightly-2019-10-23-011659; the issue has been fixed.

$ oc get dnsrecords.ingress.operator.openshift.io -o yaml -n openshift-ingress-operator
<---snip--->
  status:
    zones:
    - conditions:
      - lastTransitionTime: "2019-10-23T06:49:59Z"
        message: 'The DNS provider failed to ensure the record: failed to find hosted
          zone for record: no matching hosted zone found'
        reason: ProviderError
        status: "True"
        type: Failed
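
A check like the one above could also be scripted. The following Python sketch assumes the same resources are fetched with -o json instead of -o yaml, and that the list output nests conditions under items[].status.zones[].conditions[] as the YAML suggests. The helper name oversized_messages and the 1024-character limit are hypothetical, and the sample data is hand-built from the output shown in this comment.

```python
import json


def oversized_messages(dnsrecords_json, limit=1024):
    """Return (record name, message length) for condition messages over limit."""
    doc = json.loads(dnsrecords_json)
    hits = []
    for item in doc.get("items", []):
        name = item.get("metadata", {}).get("name", "<unknown>")
        for zone in item.get("status", {}).get("zones", []):
            for cond in zone.get("conditions", []):
                msg = cond.get("message", "")
                if len(msg) > limit:
                    hits.append((name, len(msg)))
    return hits


# Sample input modelled on the verified output in this comment.
message = (
    "The DNS provider failed to ensure the record: failed to find hosted "
    "zone for record: no matching hosted zone found"
)
sample = json.dumps({
    "items": [{
        "metadata": {"name": "default-wildcard"},
        "status": {"zones": [{"conditions": [{
            "type": "Failed",
            "status": "True",
            "reason": "ProviderError",
            "message": message,
        }]}]},
    }]
})

print(oversized_messages(sample))
```

An empty result means no condition message exceeded the chosen limit, which is the expected outcome once the message no longer embeds the record.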

Comment 4 errata-xmlrpc 2019-10-30 04:44:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3151
