Bug 1953729 - e2e unidling test is flaking heavily on SNO jobs
Summary: e2e unidling test is flaking heavily on SNO jobs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Stephen Greene
QA Contact: jechen
URL:
Whiteboard:
Depends On:
Blocks: 1955600
Reported: 2021-04-26 18:54 UTC by Stephen Greene
Modified: 2022-08-04 22:32 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:04:01 UTC
Target Upstream Version:
Embargoed:


Links
GitHub openshift/origin pull 26105 (open): Bug 1953729: test/extended/router: Fix-up Unidling test (last updated 2021-04-26 19:06:46 UTC)
Red Hat Product Errata RHSA-2021:2438 (last updated 2021-07-27 23:04:20 UTC)

Description Stephen Greene 2021-04-26 18:54:47 UTC
Description of problem:

The following test is flaking heavily in Single Node OpenShift (SNO) CI:

The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it

The test is defined in https://github.com/openshift/origin/blob/master/test/extended/router/idle.go


See the CI search results below, which show many flakes of the unidling test in SNO CI:
https://search.ci.openshift.org/?search=The+HAProxy+router+should+be+able+to+connect+to+a+service+that+is+idled+because+a+GET+on+the+route+will+unidle+it&maxAge=48h&context=1&type=bug%2Bjunit&name=%5Epull-ci-openshift-cluster-monitoring-operator-master-e2e-aws-single-node%24&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job


This test also occasionally flakes on non-SNO jobs. A closer look at the test shows that it does not wait until the test workload is completely idled before trying to unidle it. The test's HTTP logic also uses a 15-minute HTTP timeout, which renders the HTTP retry logic useless: a single stalled request can hold up the test for the entire timeout instead of failing fast and being retried.
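To illustrate why a per-request timeout as long as the overall wait defeats the retry logic, here is a minimal Python sketch. It is not the fix from origin pull 26105 (the real test is Go code in test/extended/router/idle.go), and the helpers `poll_until`, `get_until_ok` and `make_waking_server` are hypothetical names for illustration only. The server merely simulates an idled backend whose first few requests stall before it starts answering, and the timings are arbitrary. The same `poll_until` helper could also gate the unidle step on a "workload is fully idled" check.

```python
import http.server
import threading
import time
import urllib.request


def poll_until(condition, interval, timeout):
    """Call condition() every `interval` seconds until it returns True.

    Returns True on success, or False once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def get_until_ok(url, per_attempt_timeout, interval, total_timeout):
    """Retry GET `url` until it answers 200 OK; return (ok, attempts).

    Each attempt gets its own short timeout, so a stalled request fails fast
    and the retry loop gets another chance within `total_timeout`.
    """
    attempts = 0

    def attempt():
        nonlocal attempts
        attempts += 1
        try:
            with urllib.request.urlopen(url, timeout=per_attempt_timeout) as resp:
                return resp.status == 200
        except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
            return False

    return poll_until(attempt, interval, total_timeout), attempts


def make_waking_server(stalled_requests, stall_seconds):
    """Hypothetical stand-in for an idled backend that is waking up.

    The first `stalled_requests` requests hang for `stall_seconds` before
    answering; every later request gets an immediate 200 OK.
    """
    lock = threading.Lock()
    seen = 0

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            nonlocal seen
            with lock:
                seen += 1
                n = seen
            if n <= stalled_requests:
                time.sleep(stall_seconds)
            try:
                self.send_response(200)
                self.send_header("Content-Length", "2")
                self.end_headers()
                self.wfile.write(b"ok")
            except OSError:
                pass  # the client already timed out and closed the connection

        def log_message(self, format, *args):
            pass  # silence per-request logging

    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, against a server whose first two requests stall for 0.5 s, `get_until_ok(url, per_attempt_timeout=0.1, interval=0.01, total_timeout=5)` abandons each stalled request after 0.1 s and succeeds on a later attempt. If a request hangs indefinitely, a per-attempt timeout as long as the total budget leaves no time for a second attempt.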

This test was introduced in 4.6, so any test improvements and optimizations should be backported accordingly.

Comment 2 jechen 2021-04-30 18:05:53 UTC
Out of the last 11 CI runs, some flaked on this test (the most recent on April 29) and one failed.

https://testgrid.k8s.io/redhat-single-node#periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-single-node

Will check again next week with more CI results.

Comment 3 jechen 2021-05-03 12:11:02 UTC
Re-checked: CI has passed this test for the last 4 days. Marking as verified.

https://testgrid.k8s.io/redhat-single-node#periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-single-node

Comment 6 errata-xmlrpc 2021-07-27 23:04:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

