Bug 2004076 - fix flaky test "Unidling should work with TCP (while idling)" on openshift-sdn
Summary: fix flaky test "Unidling should work with TCP (while idling)" on openshift-sdn
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Mohamed Mahmoud
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 2004074
Reported: 2021-09-14 13:17 UTC by Dan Winship
Modified: 2022-11-17 22:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2004074
Environment:
Last Closed: 2022-11-17 22:40:25 UTC
Target Upstream Version:
Embargoed:


Description Dan Winship 2021-09-14 13:17:40 UTC
+++ This bug was initially created as a clone of Bug #2004074 +++

The test

"[sig-network-edge][Feature:Idling] Unidling should work with TCP (while idling) [Skipped:Network/OVNKubernetes] [Suite:openshift/conformance/parallel]"

is currently very flaky. This is a test that was only recently un-disabled after having been disabled for all of 4.x, so this does not indicate a recent regression.

(Note that this test is also very flaky under ovn-kubernetes. In that case, however, the "(when fully idled)" version of the test is also flaky, whereas with openshift-sdn only the "(while idling)" version flakes.)

Comment 1 Dan Winship 2021-09-14 13:18:21 UTC
(This bug tracks fixing the underlying idling issue in openshift-sdn. Bug 2004074 tracks un-skipping the test once this bug is fixed.)

Comment 3 Scott Dodson 2022-05-17 15:58:12 UTC
Adjusting this to have Version 4.9 as this test flakes or fails at a high rate there as well. I will pursue moving that to be a broken test.

Comment 4 Dan Winship 2022-05-17 16:46:45 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=2085327#c3, although that comment was written before I realized we'd already disabled this test in 4.10+.

The problem seems to be that idling the service takes longer than expected. The "(when fully idled)" test works because it waits for the service to idle before trying to unidle it, but the "(while idling)" test sometimes fails because it expects the service to have been successfully unidled before it has actually been idled in the first place. Right now it does:

  - idle the service
  - try to connect to the service every half a second for 10 seconds
  - fail if any of the connection attempts fail or the service still has the idle annotations

Instead it needs to do:

  - idle the service
  - try to connect to the service every half a second until N seconds after the idle annotation is removed from the service, up to a maximum of M seconds
  - fail if any of the connection attempts fail or the service was still not idled after M seconds

for some values of N and M, perhaps 5 and 60.
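
A minimal sketch of the proposed loop, written in Python for illustration only; it is not the test's actual code. The names probe_while_idling, connect and has_idle_annotation are hypothetical, and the two callables stand in for whatever the real test uses to reach the service and read its annotations. It assumes the idle annotation is already present when the loop starts, and it reads "up to a maximum of M seconds" as a cap on the whole probe.

```python
import time
from typing import Callable, Optional


def probe_while_idling(
    connect: Callable[[], bool],              # hypothetical: True if one connection succeeds
    has_idle_annotation: Callable[[], bool],  # hypothetical: True while the annotation is set
    settle_seconds: float = 5.0,              # "N" from the description above
    max_seconds: float = 60.0,                # "M" from the description above
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Raise AssertionError if a connection fails or the annotation never clears."""
    start = clock()
    cleared_at: Optional[float] = None  # when the annotation was first seen missing

    while True:
        now = clock()

        # Every single connection attempt must succeed.
        if not connect():
            raise AssertionError(f"connection failed {now - start:.1f}s after idling")

        # Remember the first moment the idle annotation is gone.
        if cleared_at is None and not has_idle_annotation():
            cleared_at = now

        # Done once N seconds have passed since the annotation was removed.
        if cleared_at is not None and now - cleared_at >= settle_seconds:
            return

        # Hard cap of M seconds on the whole probe.
        if now - start >= max_seconds:
            if cleared_at is None:
                raise AssertionError(
                    f"idle annotation still present after {max_seconds:.0f}s"
                )
            return

        sleep(interval)
```

The clock and sleep parameters exist only so the loop can be exercised with a fake clock instead of waiting in real time.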

