Bug 1997235
Summary: | test "should drop INVALID conntrack entries" failing after k8s bump to 1.21 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | jamo luhrsen <jluhrsen> |
Component: | Networking | Assignee: | Riccardo Ravaioli <rravaiol> |
Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | astoycos, rravaiol, trozet, vlaad |
Version: | 4.9 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2022-08-26 14:29:01 UTC | Type: | Bug |
Bug Blocks: | 1945329 |
Description
jamo luhrsen
2021-08-24 16:57:34 UTC
@rravaiol, wondering if this bz has a chance of finding resolution before Friday, when 4.9 target release bugs should be done. Asking because I have the bz [0] to re-enable the test and it's marking the target release as 4.9. I will remove that if there is no chance this will be resolved in time.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1945329

****
The test works fine in upstream kubernetes. For instance, when running it in a KIND cluster with a 1.21.1 kubernetes image, it succeeds:

$ _output/dockerized/bin/linux/amd64/e2e.test -kubeconfig $HOME/admin.conf -ginkgo.focus=".*conntrack entries.*" -num-nodes 2
[...]
Sep 27 15:25:38.104: INFO: boom-server OK: did not receive any RST packet
[AfterEach] [sig-network] Conntrack
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:186
Sep 27 15:25:38.104: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "conntrack-3516" for this suite.
• [SLOW TEST:72.101 seconds]
[sig-network] Conntrack
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/common/framework.go:23
  should drop INVALID conntrack entries
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/conntrack.go:288
------------------------------
{"msg":"PASSED [sig-network] Conntrack should drop INVALID conntrack entries","total":1,"completed":1,"skipped":6528,"failed":0}

****
However, the test fails in openshift because, starting from version 4.6, we dropped the NET_RAW capability from the default capabilities for containers (https://docs.openshift.com/container-platform/4.6/release_notes/ocp-4-6-release-notes.html#ocp-4-6-known-issues).
When running the test in openshift (in this case, 4.9), the logs of the boom-server pod are the following:

$ oc logs boom-server -n conntrack-5646 -f
2021/09/27 16:33:45 external ip: 10.128.6.28
2021/09/27 16:33:45 listen on 0.0.0.0:9000
2021/09/27 16:33:45 probing 10.128.6.28
panic: listen ip:tcp 10.128.6.28: socket: operation not permitted

goroutine 18 [running]:
main.probe(0xc0000a4290, 0xb)
    /go/src/k8s.io/kubernetes/test/images/regression-issue-74839/main.go:75 +0x996
created by main.main
    /go/src/k8s.io/kubernetes/test/images/regression-issue-74839/main.go:40 +0x15d

****
The boom-server image used in this test forges out-of-order TCP packets and injects them into the network. This requires the container to have the CAP_NET_RAW linux capability, otherwise the test will fail.

I just posted a PR fixing this in upstream kubernetes: https://github.com/kubernetes/kubernetes/pull/105283

It looks like upstream merged. Is this now in origin downstream? Can we move to modified?

Yes, the commit is now in origin downstream, moving the BZ status to MODIFIED.
https://github.com/openshift/kubernetes/commit/d97a1b8d630

(In reply to Riccardo Ravaioli from comment #5)
> Yes, the commit is now in origin downstream, moving the BZ status to
> MODIFIED.
> https://github.com/openshift/kubernetes/commit/d97a1b8d630

@rravaiol, this test is still failing even with this commit. Maybe I'm missing something, but I have a PR here [0] to re-enable the test and it still fails in the job [1] on that PR. At first I was not sure the tests were being run with your fix because the k8s-e2e test version was still reporting v1.22 (can see in the build log of [1]), but I think they just haven't tagged the openshift/kubernetes repo with v1.23 yet. I added some debug code in my PR to re-enable the test to verify it really is using the code with your fix. It's still failing however. Any ideas on this?

[0] https://github.com/openshift/kubernetes/pull/897/commits/26ccf2144702583a9b87aa818a4cdef692076c58
[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/897/pull-ci-openshift-kubernetes-master-k8s-e2e-gcp/1488921415565447168

This is now passing in CI. This PR [0] will re-enable the test, but you can see the job [1] running on that PR is passing this test case now. You can also see that the e2e test version is also v1.23 [2]

[0] https://github.com/openshift/kubernetes/pull/897
[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/897/pull-ci-openshift-kubernetes-master-k8s-e2e-gcp/1501689695107551232
[2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_kubernetes/897/pull-ci-openshift-kubernetes-master-k8s-e2e-gcp/1501689695107551232/artifacts/k8s-e2e-gcp/openshift-kubernetes-e2e-test/build-log.txt