1828618 – Improve Polling Loops for Ingress/DNS Operator e2e

Bug 1828618 - Improve Polling Loops for Ingress/DNS Operator e2e

Summary: Improve Polling Loops for Ingress/DNS Operator e2e

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Stephen Greene
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2020-04-27 22:40 UTC by Daneyon Hansen
Modified:	2022-08-04 22:27 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: e2e operator polling loops return errors from cluster API calls without retrying Consequence: e2e operator tests are more likely to fail on unrelated API and or networking issues Fix: Have the operator polling loops return no errors when transient API or networking errors are detected, and instead print a log message and retry. Result: e2e operator tests are more likely to pass
Clone Of:
Environment:
Last Closed:	2020-10-27 15:58:27 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-dns-operator pull 181	None	closed	Bug 1828618: Improve e2e test polling loops	2020-10-27 18:18:10 UTC
Github	openshift cluster-ingress-operator pull 415	None	closed	Bug 1828618: Improve e2e test polling loops	2020-10-27 18:18:24 UTC
Red Hat Product Errata	RHBA-2020:4196	None	None	None	2020-10-27 15:58:46 UTC

Description Daneyon Hansen 2020-04-27 22:40:07 UTC

Description of problem:
Polling loops return `return false, nil` when `return false, err` should be returned. See [1] and [2] for additional details.

Version-Release number of selected component (if applicable):
4.x

How reproducible:
N/A

Steps to Reproduce:
1. Run e2e test for ingress or dns operator.

Actual results:
Success

Expected results:
Success

Additional info:
[1] https://github.com/openshift/cluster-ingress-operator/blob/master/test/e2e/operator_test.go
[2] https://github.com/openshift/cluster-dns-operator/pull/168/files#r413454598

Comment 1 Andrew McDermott 2020-05-19 15:20:17 UTC

Moving to 4.6.

Comment 2 Daneyon Hansen 2020-06-16 16:56:59 UTC

I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 3 Stephen Greene 2020-06-18 20:28:25 UTC

@Miciah and I discussed this recently (github thread https://github.com/openshift/cluster-ingress-operator/pull/400#discussion_r442253985) and both came to agree that in general both the DNS and Ingress operator e2e tests should avoid hard failing on transient API or networking issues most likely caused by other cluster components. This would make it easier to actually determine which cluster component is at fault during an e2e run. 

That being said, it would probably make the most sense for both of the operator's e2e polling loops to simply log API errors, should they come up, and continue returning `false, nil`, rather than hard failing by returning `false, err`. 

@dhansen, does this sound reasonable to you?

I've marked my current PR's for this bugzilla as [WIP], since we are now re-thinking how we want to do this.

Comment 4 Daneyon Hansen 2020-06-19 17:13:36 UTC

> @dhansen, does this sound reasonable to you?

Yes

Comment 5 Andrew McDermott 2020-07-09 12:16:52 UTC

I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 6 Stephen Greene 2020-07-28 17:22:04 UTC

Adding upcoming sprint since this BZ's fix will be reviewed in the coming weeks.

Comment 7 Daneyon Hansen 2020-08-13 16:37:28 UTC

openshift cluster-ingress-operator pull 415 needs to get rebased and openshift cluster-dns-operator pull 181 needs a /lgtm.

Comment 9 Andrew McDermott 2020-09-10 12:00:26 UTC

I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 12 Hongan Li 2020-09-24 02:56:35 UTC

run e2e test for ingress or dns operator and passed

Comment 15 errata-xmlrpc 2020-10-27 15:58:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Note You need to log in before you can comment on or make changes to this bug.