Bug 1923231
| Field | Value |
|---|---|
| Summary | [sig-network] Conntrack should be able to preserve UDP traffic when server pod cycles for a NodePort service |
| Product | OpenShift Container Platform |
| Reporter | Antonio Ojea <aojeagar> |
| Component | Networking |
| Networking sub component | openshift-sdn |
| Assignee | Antonio Ojea <aojeagar> |
| QA Contact | zhaozhanqi <zzhao> |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | high |
| CC | aconstan, ccoleman, jluhrsen |
| Version | 4.7 |
| Target Release | 4.7.z |
| Clones | 1949063 (view as bug list) |
| Environment | [sig-network] Conntrack should be able to preserve UDP traffic when server pod cycles for a NodePort service |
| Last Closed | 2021-12-01 13:35:22 UTC |
| Type | Bug |
| Bug Depends On | 1949063 |
Description — Antonio Ojea — 2021-02-01 15:20:49 UTC
This fails roughly 1 in 150 times in our new network stress test (it is the only remaining known unfixed flake in over 300 runs of each test). Once we have this fixed, we can use network-stress as a "flake introduction PR blocker": the occurrence of a new flake in this test suite on a PR would block the merge, which would help us catch regressions introduced by new versions of the OS, network plugins, etc.

Tentative fix: https://github.com/kubernetes/kubernetes/pull/98305. I still need to run it with the reproducer to confirm that it fixes the problem.

Fix in https://github.com/kubernetes/kubernetes/pull/98305. I ran the test 150 times without any failure.

(In reply to Antonio Ojea from comment #4)
This is still failing:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.6-stable-to-4.7-ci/1364177777766436864

It looks like it's happening sporadically across several different jobs:
https://search.ci.openshift.org/?search=Conntrack+should+be+able+to+preserve+UDP+traffic+when+server+pod+cycles+for+a+NodePort+service&maxAge=48h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

(In reply to jamo luhrsen from comment #5)
The patch is not in OpenShift; it was merged only in upstream Kubernetes and needs to be backported.

(In reply to Antonio Ojea from comment #6)
Got it. Who takes care of this, and how can we track it? I keep running across this failure in our downstream CI, so it would be nice to get it resolved.

Backport merged upstream in the 1.20 branch: https://github.com/kubernetes/kubernetes/pull/99017/commits
Downstream backport to OpenShift: https://github.com/openshift/kubernetes/pull/602

The fix merged in https://github.com/openshift/sdn/pull/286

Following up: I've looked at the testgrid of some jobs, and this test case does flake every once in a while (fails on the first try and passes on the second), but mostly it's all passing. That seems good enough reason to close this bug as Verified.
Some testgrid links:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-aws-ovn-upgrade

(In reply to jamo luhrsen from comment #12)
This bug is about kube-proxy in openshift-sdn, not OVN, which uses different logic.

(In reply to Antonio Ojea from comment #13)
OK, but those aren't the jobs I was commenting on in comment #5. It looks like things got better some other way. Do we need to change this back from Verified?

(In reply to jamo luhrsen from comment #14)
I just wanted to clarify that my PRs are unrelated to OVN; this bug was fixed upstream (and downstream, AFAIK). I suggest closing this bug and opening a new one for OVN if you want to track it, but it is really up to you how to handle it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.38 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4802
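The failure mode behind this flake is that when a UDP server pod behind a NodePort service is deleted and recreated, stale conntrack entries can keep steering NodePort traffic to the old backend until they time out, so kube-proxy must actively delete those entries when UDP endpoints change. The following is a minimal, hypothetical Go sketch of that selection logic; the `Entry` type, field names, and `staleUDPEntries` function are illustrative and are not the actual kube-proxy code from the linked PRs:

```go
package main

import "fmt"

// Entry is a simplified, illustrative view of a conntrack table entry.
type Entry struct {
	Proto   string // "udp" or "tcp"
	DstPort int    // the NodePort the client traffic hit
	PodIP   string // the backend pod the entry is pinned to
}

// staleUDPEntries selects the entries that should be deleted after an
// endpoints change: UDP entries on the service's NodePort whose backend
// pod is no longer a valid endpoint. TCP needs no such flush (the old
// connection is reset), but UDP entries would otherwise linger until
// their timeout and blackhole traffic to the deleted pod.
func staleUDPEntries(entries []Entry, nodePort int, validBackends map[string]bool) []Entry {
	var stale []Entry
	for _, e := range entries {
		if e.Proto == "udp" && e.DstPort == nodePort && !validBackends[e.PodIP] {
			stale = append(stale, e)
		}
	}
	return stale
}

func main() {
	table := []Entry{
		{"udp", 30080, "10.128.0.5"}, // old, deleted server pod: stale
		{"udp", 30080, "10.128.0.9"}, // current server pod: kept
		{"tcp", 30080, "10.128.0.5"}, // TCP entry: left alone
		{"udp", 31000, "10.128.0.5"}, // different NodePort: left alone
	}
	valid := map[string]bool{"10.128.0.9": true}
	// Prints only the one stale UDP entry pinned to the deleted pod.
	for _, e := range staleUDPEntries(table, 30080, valid) {
		fmt.Printf("would delete: %s dport=%d backend=%s\n", e.Proto, e.DstPort, e.PodIP)
	}
}
```

On a real node the equivalent deletion is done against the kernel conntrack table (what the `conntrack` CLI exposes), keyed by protocol and port, rather than against an in-memory slice as in this sketch.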