Bug 1705100

Summary: [ci] e2e-aws-operator flakes
Product: OpenShift Container Platform Reporter: Dan Mace <dmace>
Component: Networking Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: aos-bugs, bbennett
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:48:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dan Mace 2019-05-01 13:31:05 UTC
Description of problem:

ingress-operator e2e test flake is happening more frequently, blocking PRs.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/222/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/841


https://search.svc.ci.openshift.org/?search=TestIngressControllerUpdate&maxAge=168h&context=2&type=all


--- FAIL: TestIngressControllerUpdate (12.79s)
	operator_test.go:367: failed to reset IngressController: Operation cannot be fulfilled on ingresscontrollers.operator.openshift.io "default": the object has been modified; please apply your changes to the latest version and try again
	operator_test.go:381: failed to get recreated CA certificate configmap: timed out waiting for the condition

=== RUN   TestRouterCACertificate
--- FAIL: TestRouterCACertificate (11.60s)
	operator_test.go:595: failed to get CA certificate: timed out waiting for the condition



Comment 1 Miciah Dashiel Butler Masters 2019-05-01 19:10:04 UTC
I believe TestIngressControllerUpdate fails on its second update to the ingress controller because, between the test's two updates, something else (possibly the status sync code) updates the ingress controller, which changes its resource version.

The solution is to re-get the resource before the second update.
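
To illustrate the conflict and the re-get, here is a minimal, self-contained sketch. It is not the operator's test code: the `store`, `controller`, and `setCert` names are hypothetical, and the in-memory store only imitates the API server's resource-version check.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict imitates the API server's "object has been modified" error.
var errConflict = errors.New("conflict: the object has been modified")

// controller is a hypothetical stand-in for an IngressController resource.
type controller struct {
	ResourceVersion int
	DefaultCert     string
	Status          string
}

// store is a toy in-memory API server with optimistic concurrency.
type store struct {
	current controller
}

// get returns a copy of the stored object, including its current version.
func (s *store) get() controller { return s.current }

// update writes obj only if its version matches the stored one, then
// bumps the version and reports the new version back through obj.
func (s *store) update(obj *controller) error {
	if obj.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	obj.ResourceVersion++
	s.current = *obj
	return nil
}

// setCert re-gets the object right before changing it, so the write
// carries the latest resource version.
func setCert(s *store, cert string) error {
	latest := s.get()
	latest.DefaultCert = cert
	return s.update(&latest)
}

func main() {
	s := &store{current: controller{ResourceVersion: 1}}

	// First update by the test, using a copy fetched up front.
	ic := s.get()
	ic.DefaultCert = "custom"
	fmt.Println("first update:", s.update(&ic))

	// Another writer (for example a status sync) updates the object in between.
	other := s.get()
	other.Status = "synced"
	fmt.Println("other writer:", s.update(&other))

	// Reusing the test's copy now fails: its version is out of date.
	ic.DefaultCert = ""
	fmt.Println("stale update:", s.update(&ic))

	// Re-getting before the second update succeeds.
	fmt.Println("fresh update:", setCert(s, ""))
}
```

The re-get narrows the window for a conflict but does not close it. For the general case, client-go provides `retry.RetryOnConflict`, which repeats the get-and-update step when a conflict is returned.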

TestRouterCACertificate fails because TestIngressControllerUpdate fails before restoring the original default certificate secret reference.  Fixing the failure in TestIngressControllerUpdate should prevent the failure in TestRouterCACertificate.

PR: https://github.com/openshift/cluster-ingress-operator/pull/223

Comment 3 Miciah Dashiel Butler Masters 2019-05-02 16:11:19 UTC
We had a similar problem in TestIngressControllerScale, which should also be fixed now.

PR: https://github.com/openshift/cluster-ingress-operator/pull/225

Comment 4 Hongan Li 2019-05-08 06:36:05 UTC
I'm going to mark this as verified since no similar issue has appeared in recent CI test runs.

Comment 6 errata-xmlrpc 2019-06-04 10:48:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758