Bug 1823460 - [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [Top Level] [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should enforce policy based on NamespaceSelector with MatchExpressions[Feature:Netw
Summary: [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and clie...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Tim Rozet
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks: 1825893
 
Reported: 2020-04-13 18:51 UTC by Ben Parees
Modified: 2020-07-13 17:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1825893
Environment:
Last Closed: 2020-07-13 17:27:22 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 141 | 0 | None | closed | Bug 1823460: 4-20-2020 merge | 2020-06-23 10:00:49 UTC
Github | ovn-org ovn-kubernetes pull 1244 | 0 | None | closed | ovn: improve Namespace/NetworkPolicy locking | 2020-06-23 10:00:49 UTC
Github | ovn-org ovn-kubernetes pull 1262 | 0 | None | closed | Fixes add network policy locking | 2020-06-23 10:00:49 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:27:45 UTC

Description Ben Parees 2020-04-13 18:51:36 UTC
Description of problem:
[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [Top Level] [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should enforce policy based on NamespaceSelector with MatchExpressions[Feature:NetworkPolicy] [Skipped:Network/OpenShiftSDN/Multitenant] [Suite:openshift/conformance/parallel]

This seems to be the top failing networking test right now, failing at a rate of 50% across all our jobs.

Example:

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-ocp-installer-e2e-aws-ovn-4.4&sort-by-failures=&show-stale-tests=

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.4/1296


STEP: Saw pod success
Apr 13 11:59:21.455: INFO: Pod "client-a-4tqlm" satisfied condition "success or failure"
Apr 13 11:59:21.482: FAIL: Error getting container logs: the server could not find the requested resource (get pods client-a-4tqlm)

Full Stack Trace
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network.checkNoConnectivity(0xc000c5c280, 0xc00082f760, 0xc0012b2400, 0xc000558900)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network/network_policy.go:1457 +0x2a0
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network.testCannotConnect(0xc000c5c280, 0xc00082f760, 0x558757b, 0x8, 0xc000558900, 0x50)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network/network_policy.go:1406 +0x1fc
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network.glob..func13.2.7()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network/network_policy.go:285 +0x883
github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc001600d80, 0xc001431390, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:59 +0x41f
main.newRunTestCommand.func1(0xc000b64500, 0xc001431390, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:233 +0x15d
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc000b64500, 0xc001431190, 0x1, 0x1, 0xc000b64500, 0xc001431190)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:826 +0x460
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc00080bb80, 0x0, 0x61efc60, 0x99c96b0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:914 +0x2fb
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:864
main.main.func1(0xc00080bb80, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:57 +0x9c
main.main()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:58 +0x341

Comment 1 Tim Rozet 2020-04-15 13:08:06 UTC
It looks to me like this is a race condition that occurs when the network policy and the pod are created at the same time. Both code paths access the namespacesPolicies map, but the network policy path does not take the lock before accessing it. Dan already has a patch for this, so we will see if that fixes it in downstream CI runs:

https://github.com/ovn-org/ovn-kubernetes/pull/1244

Comment 3 Ben Parees 2020-04-16 14:52:39 UTC
can we disable this test where it's not supported, and use this bug to re-enable it when the issue is fixed?

Comment 4 Tim Rozet 2020-04-16 21:05:50 UTC
https://github.com/ovn-org/ovn-kubernetes/pull/1244 fixes most of the locking and another bug with address set creation related to this test case.

I think we still need:
https://github.com/ovn-org/ovn-kubernetes/pull/1262

to address one more case of not locking correctly.

Comment 8 Tim Rozet 2020-04-20 18:43:11 UTC
(In reply to Ben Parees from comment #3)
> can we disable this test where it's not supported, and use this bug to
> re-enable it when the issue is fixed?

Ben, the fix is now present in 4.5 and 4.4. Thanks.

Comment 9 Ross Brattain 2020-04-22 02:12:30 UTC
It looks like the CI test is passing. I also manually tested some matchExpressions on 4.5.0-0.nightly-2020-04-21-103613.

Verified.
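
As a reference for what a matchExpressions check involves, below is a small self-contained Go sketch of how a namespace's labels can be evaluated against a list of selector requirements. It follows the standard Kubernetes label selector operators (In, NotIn, Exists, DoesNotExist), but the requirement type and the matches and contains functions are hypothetical and written only for this illustration. This is not the code used by the e2e test or by ovn-kubernetes, and the labels in main are made up.

```go
package main

import "fmt"

// requirement is a hypothetical, simplified form of one matchExpressions entry.
type requirement struct {
	Key      string
	Operator string // "In", "NotIn", "Exists", or "DoesNotExist"
	Values   []string
}

// contains reports whether v is one of values.
func contains(values []string, v string) bool {
	for _, candidate := range values {
		if candidate == v {
			return true
		}
	}
	return false
}

// matches reports whether labels satisfy every requirement. All
// requirements are ANDed; an empty list matches any label set.
func matches(labels map[string]string, reqs []requirement) bool {
	for _, r := range reqs {
		value, present := labels[r.Key]
		switch r.Operator {
		case "In":
			if !present || !contains(r.Values, value) {
				return false
			}
		case "NotIn":
			// A missing key satisfies NotIn.
			if present && contains(r.Values, value) {
				return false
			}
		case "Exists":
			if !present {
				return false
			}
		case "DoesNotExist":
			if present {
				return false
			}
		default:
			// Unknown operators never match in this sketch.
			return false
		}
	}
	return true
}

func main() {
	// Example labels for two namespaces (made up for illustration).
	nsA := map[string]string{"ns-name": "ns-a"}
	nsB := map[string]string{"ns-name": "ns-b"}

	// Selector: ns-name In (ns-b)
	reqs := []requirement{{Key: "ns-name", Operator: "In", Values: []string{"ns-b"}}}

	fmt.Println("ns-a selected:", matches(nsA, reqs))
	fmt.Println("ns-b selected:", matches(nsB, reqs))
}
```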

Comment 10 errata-xmlrpc 2020-07-13 17:27:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

