Bug 1997235 - test "should drop INVALID conntrack entries" failing after k8s bump to 1.21
Summary: test "should drop INVALID conntrack entries" failing after k8s bump to 1.21
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Riccardo Ravaioli
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1945329
Reported: 2021-08-24 16:57 UTC by jamo luhrsen
Modified: 2022-08-26 14:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-26 14:29:01 UTC
Target Upstream Version:
Embargoed:


Description jamo luhrsen 2021-08-24 16:57:34 UTC
This test currently fails after the bump to 1.21 and was disabled here:
  https://github.com/openshift/kubernetes/commit/85919fe4230999e8c1372ec67c5340d94e3f8b3d

A PR to enable the test is here:
  https://github.com/openshift/kubernetes/pull/897

But the test is still failing as we can see in the presubmit job here:
  https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/897/pull-ci-openshift-kubernetes-master-k8s-e2e-gcp/1428818540730781696

Comment 1 jamo luhrsen 2021-08-31 20:37:18 UTC
@rravaiol, I'm wondering whether this BZ has a chance of being resolved before Friday, when 4.9 target release bugs
should be done. I'm asking because I have the BZ [0] to re-enable the test, and its target release is marked as 4.9. I will
remove that target if there is no chance this will be resolved in time.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1945329

Comment 2 Riccardo Ravaioli 2021-09-27 17:47:20 UTC
**** The test works fine in upstream Kubernetes. For instance, it succeeds when run in a KIND cluster with a 1.21.1 Kubernetes image:

$ _output/dockerized/bin/linux/amd64/e2e.test -kubeconfig $HOME/admin.conf -ginkgo.focus=".*conntrack entries.*" -num-nodes 2
[...]
Sep 27 15:25:38.104: INFO: boom-server OK: did not receive any RST packet
[AfterEach] [sig-network] Conntrack
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:186
Sep 27 15:25:38.104: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "conntrack-3516" for this suite.

• [SLOW TEST:72.101 seconds]
[sig-network] Conntrack
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/common/framework.go:23
  should drop INVALID conntrack entries
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/conntrack.go:288
------------------------------
{"msg":"PASSED [sig-network] Conntrack should drop INVALID conntrack entries","total":1,"completed":1,"skipped":6528,"failed":0}



**** However, the test fails in OpenShift because, starting with version 4.6, NET_RAW was dropped from the default capabilities for containers (https://docs.openshift.com/container-platform/4.6/release_notes/ocp-4-6-release-notes.html#ocp-4-6-known-issues). When the test runs in OpenShift (4.9 in this case), the boom-server pod logs the following:

$ oc logs boom-server -n conntrack-5646 -f
2021/09/27 16:33:45 external ip: 10.128.6.28
2021/09/27 16:33:45 listen on 0.0.0.0:9000
2021/09/27 16:33:45 probing 10.128.6.28
panic: listen ip:tcp 10.128.6.28: socket: operation not permitted
goroutine 18 [running]:
main.probe(0xc0000a4290, 0xb)
	/go/src/k8s.io/kubernetes/test/images/regression-issue-74839/main.go:75 +0x996
created by main.main
	/go/src/k8s.io/kubernetes/test/images/regression-issue-74839/main.go:40 +0x15d



**** The boom-server image used in this test forges out-of-order TCP packets and injects them into the network. This requires the container to have the CAP_NET_RAW Linux capability. Without it, the raw socket cannot be opened ("operation not permitted", as in the panic above) and the test fails.
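
One way to meet the CAP_NET_RAW requirement described above is to add NET_RAW explicitly to the container's security context. The sketch below only illustrates the shape of that setting and is not necessarily what any particular fix does. The struct types are local stand-ins that mirror the securityContext.capabilities.add fields of the Kubernetes pod spec (a real e2e test would use the types from k8s.io/api/core/v1), and the helper withNetRaw and the image name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins mirroring the relevant Kubernetes pod spec fields.
// They are assumptions for illustration, not the real API types.
type Capabilities struct {
	Add []string `json:"add,omitempty"`
}

type SecurityContext struct {
	Capabilities *Capabilities `json:"capabilities,omitempty"`
}

type Container struct {
	Name            string           `json:"name"`
	Image           string           `json:"image,omitempty"`
	SecurityContext *SecurityContext `json:"securityContext,omitempty"`
}

// withNetRaw (hypothetical helper) returns a copy of c whose security
// context requests the NET_RAW capability.
func withNetRaw(c Container) Container {
	c.SecurityContext = &SecurityContext{
		Capabilities: &Capabilities{Add: []string{"NET_RAW"}},
	}
	return c
}

func main() {
	// The image name is a placeholder, not the real test image reference.
	c := withNetRaw(Container{Name: "boom-server", Image: "example.invalid/boom-server"})
	out, err := json.MarshalIndent(c, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Whether a pod is allowed to request NET_RAW also depends on the cluster's admission policy, so the same spec may be accepted in one cluster and rejected in another.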


I just posted a PR fixing this in upstream kubernetes: https://github.com/kubernetes/kubernetes/pull/105283

Comment 3 Tim Rozet 2022-01-12 22:13:39 UTC
@

Comment 4 Tim Rozet 2022-01-12 22:14:35 UTC
It looks like the upstream PR merged. Is the fix now in origin downstream? Can we move this BZ to MODIFIED?

Comment 5 Riccardo Ravaioli 2022-01-19 12:04:16 UTC
Yes, the commit is now in origin downstream, moving the BZ status to MODIFIED.
https://github.com/openshift/kubernetes/commit/d97a1b8d630

Comment 6 jamo luhrsen 2022-02-03 18:50:13 UTC
(In reply to Riccardo Ravaioli from comment #5)
> Yes, the commit is now in origin downstream, moving the BZ status to
> MODIFIED.
> https://github.com/openshift/kubernetes/commit/d97a1b8d630

@rravaiol, this test is still failing even with this commit. Maybe I'm missing something, but I have a PR
here [0] to re-enable the test, and it still fails in the job [1] on that PR. At first I was not sure the tests were
being run with your fix, because the k8s-e2e test version was still reporting v1.22 (visible in the build log of [1]),
but I think the openshift/kubernetes repo just hasn't been tagged with v1.23 yet. I added some debug code to
my PR to verify that the run really is using the code with your fix. It is still failing, however.

Any ideas on this?

[0] https://github.com/openshift/kubernetes/pull/897/commits/26ccf2144702583a9b87aa818a4cdef692076c58
[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/897/pull-ci-openshift-kubernetes-master-k8s-e2e-gcp/1488921415565447168

