It turns out that even though there were a bunch of unidling tests in origin/test/extended/idling/, none of them ever got run in CI, and so ovn-kubernetes had never been tested against any of them. There is a single unidling test outside of that directory, "The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it", and ovn-kubernetes does reliably pass that one. Also, the UDP unidling test passes; it's only the TCP ones that fail. It's _possible_ that the problem is a badly-written test, but it works fine under openshift-sdn...

Newly-added and failing on ovn-kubernetes:

- [sig-network-edge][Feature:Idling] Unidling should work with TCP (when fully idled)
- [sig-network-edge][Feature:Idling] Unidling should work with TCP (while idling)

Newly-added and passing on ovn-kubernetes:

- [sig-network-edge][Feature:Idling] Idling with a single service and ReplicationController should idle the service and ReplicationController properly
- [sig-network-edge][Feature:Idling] Unidling should work with UDP

Newly-added but [Serial] so it doesn't run in e2e-aws-ovn / e2e-gcp-ovn and OMG do we still have no e2e-*-ovn-serial job anywhere?

- [sig-network-edge][Feature:Idling] Unidling should handle many TCP connections by possibly dropping those over a certain bound [Serial]
- [sig-network-edge][Feature:Idling] Unidling should handle many UDP senders (by continuing to drop all packets on the floor) [Serial]
oh, these tests are currently still disabled everywhere, but will be re-enabled by https://github.com/openshift/origin/pull/26155
@danw, wanted to know if you had a specific plan for this bz? It's assigned to me, but not sure if I'm the one to un-flake these tests (assuming it's ovn-ish things that cause the flakes)? I see your PR to re-enable the tests is still in progress, but some tests did fail in the most recent job: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/26155/pull-ci-openshift-origin-master-e2e-gcp/1430255548804108288 Can you let me know how I can help here?
yeah, I was assuming that PR was going to merge sooner... the tests failed in the latest e2e-gcp run because openshift/kubernetes#899 hasn't merged, so "oc idle" doesn't work. so given that, there's no easy way to test it under ovn-kube for now...
ok, so plan is to wait for openshift/kubernetes#899, then /retest origin#26155 and see where we are?
yes
ok, idling is fixed, the tests are merged (in 4.9), and they're skipped on ovn-kube
you can see the tests re-introduced after https://github.com/openshift/origin/pull/26155 was merged and they are passing. here is a testgrid link to show it. marking this as verified: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-ci-4.9-e2e-aws-ovn&show-stale-tests=
sorry, I should have been clearer. 6 new tests were added. 2 are enabled everywhere, 2 are [Serial] and so don't get run on any of our current ovn-kube jobs, and 2 run under openshift-sdn but are disabled on ovn-kubernetes. You can confirm that neither of these show up in the test grid:

- [sig-network-edge][Feature:Idling] Unidling should work with TCP (when fully idled)
- [sig-network-edge][Feature:Idling] Unidling should work with TCP (while idling)

because 26155 marks them as skipped on ovn-kube, because they tend to not work there.
(In reply to Dan Winship from comment #8)
> You can confirm that neither of these show up in the test grid:
>
> - [sig-network-edge][Feature:Idling] Unidling should work with TCP (when fully idled)
> - [sig-network-edge][Feature:Idling] Unidling should work with TCP (while idling)

correct, these two tests are not being run at this point. change back to VERIFIED?
No, *that's the bug*. This bz is tracking the fact that we had to disable two tests on ovn-kubernetes because there is a bug in ovn-kubernetes that causes the tests to fail. It can be closed after the tests are re-enabled (which can only happen after the ovn-kubernetes bug is fixed).
ok. clearly I was clueless about what's going on. Is there a different bug to track the fix that we need in ovn-kubernetes? Or is it this one? If this bz is just to re-enable the tests once we get the fix, I will keep it. But if this bz is to track the actual fix, then I think we need to re-assign it to someone else.
There's currently only the one bug for both parts. It might make sense to clone it to have a second bug for fixing ovn-kube.
ok, here is https://bugzilla.redhat.com/show_bug.cgi?id=2003228 to track the dev work to get the tests passing. There is a PR https://github.com/openshift/origin/pull/26460 to re-enable the tests once we have a fix.
I don't think these two "Unidling should work with TCP" tests are flaky anymore. At least they are not failing in my PR that adds them back: https://github.com/openshift/origin/pull/26460 If that's correct, we can close this bz I guess.
(In reply to jamo luhrsen from comment #14)
> I don't think these two "Unidling should work with TCP" tests are flaky anymore.
> At least they are not failing in my PR that adds them back:
> https://github.com/openshift/origin/pull/26460
>
> If that's correct, we can close this bz I guess.

Incorrect. I was only running openshift-sdn jobs where those tests do pass. the ovn-k8s version of the job does flake. Need https://bugzilla.redhat.com/show_bug.cgi?id=2003228 resolved before this can be.
*** Bug 2017036 has been marked as a duplicate of this bug. ***
still no progress on https://bugzilla.redhat.com/show_bug.cgi?id=2003228 so moving this target release to 4.11
@mmahmoud, I was pinged about the status of this bug. It's blocked on https://bugzilla.redhat.com/show_bug.cgi?id=2003228 which is assigned to you. They wanted a fresh comment here in this bz to say that, since this came up as a stale bug. just fyi.
The unidling behavior has been fixed in ovnk via the retry mechanisms introduced in 4.11. The tests were re-enabled via https://bugzilla.redhat.com/show_bug.cgi?id=2003228 and https://github.com/openshift/origin/pull/27538. Closing this bug.