Bug 1828618
Summary: | Improve Polling Loops for Ingress/DNS Operator e2e | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Daneyon Hansen <dhansen> |
Component: | Networking | Assignee: | Stephen Greene <sgreene> |
Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | unspecified | CC: | amcdermo, aos-bugs, sgreene |
Version: | 4.4 | ||
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
e2e operator polling loops return errors from cluster API calls without retrying
Consequence:
e2e operator tests are more likely to fail on unrelated API and/or networking issues
Fix:
Have the operator polling loops return no errors when transient API or networking errors are detected, and instead print a log message and retry.
Result:
e2e operator tests are more likely to pass
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 15:58:27 UTC | Type: | Bug |
Description
Daneyon Hansen
2020-04-27 22:40:07 UTC
Moving to 4.6.

I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

@Miciah and I discussed this recently (GitHub thread https://github.com/openshift/cluster-ingress-operator/pull/400#discussion_r442253985) and we both came to agree that, in general, both the DNS and Ingress operator e2e tests should avoid hard failing on transient API or networking issues that are most likely caused by other cluster components. This would make it easier to actually determine which cluster component is at fault during an e2e run.

That being said, it would probably make the most sense for both operators' e2e polling loops to simply log API errors, should they come up, and continue returning `false, nil`, rather than hard failing by returning `false, err`.

@dhansen, does this sound reasonable to you? I've marked my current PRs for this Bugzilla as [WIP], since we are now rethinking how we want to do this.

> @dhansen, does this sound reasonable to you?
Yes
Adding upcoming sprint since this BZ's fix will be reviewed in the coming weeks.

openshift cluster-ingress-operator pull 415 needs to get rebased and openshift cluster-dns-operator pull 181 needs a /lgtm.

Ran the e2e tests for the ingress or DNS operator and they passed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196