Bug 1823460

Summary: [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [Top Level] [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should enforce policy based on NamespaceSelector with MatchExpressions[Feature:Netw
Product: OpenShift Container Platform Reporter: Ben Parees <bparees>
Component: Networking Assignee: Tim Rozet <trozet>
Networking sub component: ovn-kubernetes QA Contact: Ross Brattain <rbrattai>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: bbennett, danw, trozet
Version: 4.4   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1825893 Environment:
Last Closed: 2020-07-13 17:27:22 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1825893    

Description Ben Parees 2020-04-13 18:51:36 UTC
Description of problem:
[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [Top Level] [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should enforce policy based on NamespaceSelector with MatchExpressions[Feature:NetworkPolicy] [Skipped:Network/OpenShiftSDN/Multitenant] [Suite:openshift/conformance/parallel]

seems to be the top failing networking test right now, failing at a rate of 50% across all our jobs.

Example:

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-ocp-installer-e2e-aws-ovn-4.4&sort-by-failures=&show-stale-tests=

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.4/1296


STEP: Saw pod success
Apr 13 11:59:21.455: INFO: Pod "client-a-4tqlm" satisfied condition "success or failure"
Apr 13 11:59:21.482: FAIL: Error getting container logs: the server could not find the requested resource (get pods client-a-4tqlm)

Full Stack Trace
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network.checkNoConnectivity(0xc000c5c280, 0xc00082f760, 0xc0012b2400, 0xc000558900)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network/network_policy.go:1457 +0x2a0
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network.testCannotConnect(0xc000c5c280, 0xc00082f760, 0x558757b, 0x8, 0xc000558900, 0x50)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network/network_policy.go:1406 +0x1fc
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network.glob..func13.2.7()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/network/network_policy.go:285 +0x883
github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc001600d80, 0xc001431390, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:59 +0x41f
main.newRunTestCommand.func1(0xc000b64500, 0xc001431390, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:233 +0x15d
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc000b64500, 0xc001431190, 0x1, 0x1, 0xc000b64500, 0xc001431190)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:826 +0x460
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc00080bb80, 0x0, 0x61efc60, 0x99c96b0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:914 +0x2fb
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:864
main.main.func1(0xc00080bb80, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:57 +0x9c
main.main()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:58 +0x341

Comment 1 Tim Rozet 2020-04-15 13:08:06 UTC
It looks to me like this is a race condition that occurs when the network policy and the pod are being created at the same time. Both paths access a namespacesPolicies map, but the network policy path does not take the lock before it accesses the map. Dan already has a patch for this, so we will see if that fixes it in downstream CI runs:

https://github.com/ovn-org/ovn-kubernetes/pull/1244

Comment 3 Ben Parees 2020-04-16 14:52:39 UTC
can we disable this test where it's not supported, and use this bug to re-enable it when the issue is fixed?

Comment 4 Tim Rozet 2020-04-16 21:05:50 UTC
https://github.com/ovn-org/ovn-kubernetes/pull/1244 fixes most of the locking and another bug with address set creation related to this test case.

I think we still need:
https://github.com/ovn-org/ovn-kubernetes/pull/1262

to address one more case of not locking correctly.

Comment 8 Tim Rozet 2020-04-20 18:43:11 UTC
(In reply to Ben Parees from comment #3)
> can we disable this test where it's not supported, and use this bug to
> re-enable it when the issue is fixed?

Ben, the fix is now present in 4.5 and 4.4. Thanks.

Comment 9 Ross Brattain 2020-04-22 02:12:30 UTC
It looks like the CI test is passing. I also manually tested some matchExpressions on 4.5.0-0.nightly-2020-04-21-103613

Verified.

Comment 10 errata-xmlrpc 2020-07-13 17:27:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409