Bug 1873311 - e2e test fails NetworkPolicy between server and client should stop enforcing policies after they are deleted [Feature:NetworkPolicy]
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Tim Rozet
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-27 19:56 UTC by Tim Rozet
Modified: 2020-09-08 20:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
GitHub | openshift ovn-kubernetes pull 243 | None | closed | Bug 1873311: 8 27 2020 merge | 2020-09-11 21:46:46 UTC

Description Tim Rozet 2020-08-27 19:56:20 UTC
Description of problem:
The test is failing consistently in both upstream and downstream CI.

Comment 1 Tim Rozet 2020-08-27 20:01:09 UTC
We debugged and found that the issue was introduced by:

https://github.com/ovn-org/ovn-kubernetes/commit/fd4758701cd61e0f69e21ef5a96ab5d91f704ef0

This makes the test fail when the cluster is deployed with more than one node and the client and server are on different nodes. Consider the following:

client-----nodeA----nodeB---Server

An ingress deny-all policy is placed on the cluster. The client cannot communicate with the server. This works fine.

A new policy is then added to allow ingress into the server from the client. Traffic sent from client -> server arrives at the server. However, return traffic from server -> client is dropped at nodeA. This is because the network policy we create only targets port groups:


[root@ovn-control-plane ~]# ovn-nbctl acl-list f62f4d42-5cfb-45e5-8f7f-4bf3e7c9fbe6
  to-lport  1001 (ip4.src == {$a12672671609520104948} && outport == @a1383251650920656097) allow-related

This port group only includes the destination, which is the server, so an allow-related ACL is only placed on nodeB. Return traffic at nodeA is therefore not conntracked, and it is dropped because there is no way to tell that it is return traffic.
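
To make the failure mode concrete, here is a minimal toy model of the scenario above. It is only a sketch: NodeSwitch, deliver, and round_trip are hypothetical names, and the logic is a deliberate simplification (one conntrack set per node, flows committed only when an allow-related ACL matches), not OVN's actual pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSwitch:
    """Toy per-node logical switch (hypothetical model, not OVN's real pipeline)."""
    name: str
    # Local pods that are subject to a default-deny ingress policy.
    deny_ingress: set = field(default_factory=set)
    # (src, dst) pairs matched by a to-lport allow-related ACL on this node.
    allow_related_in: set = field(default_factory=set)
    # Flows this node has committed to connection tracking.
    conntrack: set = field(default_factory=set)

    def deliver(self, src, dst):
        """Return True if a packet src -> dst may reach the local pod dst."""
        if (dst, src) in self.conntrack:
            return True  # reply to a flow this node already tracks
        if (src, dst) in self.allow_related_in:
            self.conntrack.add((src, dst))  # allow-related commits the flow
            return True
        return dst not in self.deny_ingress


def round_trip(client_node, server_node, client="client", server="server"):
    """Send a request client -> server, then the reply server -> client."""
    request_ok = server_node.deliver(client, server)
    reply_ok = request_ok and client_node.deliver(server, client)
    return request_ok, reply_ok


# The allow-related ACL exists only where the server's port group lives (nodeB).
node_a = NodeSwitch("nodeA", deny_ingress={"client"})
node_b = NodeSwitch("nodeB", deny_ingress={"server"},
                    allow_related_in={("client", "server")})

result = round_trip(node_a, node_b)
print(result)  # (True, False): request delivered, reply dropped at nodeA
```

In this model nodeA never tracks the client's outbound flow, so the reply looks like fresh ingress traffic to the client and hits the deny-all policy.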

Comment 2 Tim Rozet 2020-08-27 20:01:32 UTC
Adding links to Sippy for OVN:

[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should stop enforcing policies after they are deleted [Feature:NetworkPolicy]

Comment 3 Tim Rozet 2020-08-27 20:03:53 UTC
As a short-term solution we are reverting the previous commit. This will temporarily lower performance for clusters without network policy. The correct fix should be to add ACLs for port groups on the client side, as Dumitru mentioned:

"So it should be ok to add a PG, pg_client and an acl (also applied on pg_client) with match inport == @pg_client && ip.dst == <server_ip> action allow-related"

I'll open another bug to track the allow-related performance fix.
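
For illustration, the client-side port group and ACL from Dumitru's suggestion might translate into ovn-nbctl invocations along the following lines. This is a sketch under assumptions, not the eventual fix: the helper client_side_acl_commands, the port name ns1_client, and the address 10.244.1.5 are hypothetical, the priority 1001 simply mirrors the existing ACL shown in Comment 1, and ip4.dst is used in place of the informal ip.dst from the quote. The helper only builds the argument lists and does not run anything.

```python
import shlex


def client_side_acl_commands(pg_name, client_ports, server_ip, priority=1001):
    """Build (but do not run) ovn-nbctl commands for a client-side port group.

    Hypothetical helper: all names and values passed in are illustrative.
    """
    # Match traffic leaving the client ports toward the server's IPv4 address.
    match = f"inport == @{pg_name} && ip4.dst == {server_ip}"
    return [
        # Create the port group containing the client's logical port(s).
        ["ovn-nbctl", "pg-add", pg_name, *client_ports],
        # Attach a from-lport allow-related ACL to that port group.
        ["ovn-nbctl", "--type=port-group", "acl-add", pg_name,
         "from-lport", str(priority), match, "allow-related"],
    ]


commands = client_side_acl_commands("pg_client", ["ns1_client"], "10.244.1.5")
for argv in commands:
    # shlex.join quotes the match expression so the line is shell-safe.
    print(shlex.join(argv))
```

With an allow-related ACL applied on the client side as well, the client's node would track the outbound flow, so return traffic from the server could be recognised as related.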

