Bug 1873311 - e2e test fails NetworkPolicy between server and client should stop enforcing policies after they are deleted [Feature:NetworkPolicy]
Summary: e2e test fails NetworkPolicy between server and client should stop enforcing ...
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Tim Rozet
QA Contact: Anurag saxena
 
Reported: 2020-08-27 19:56 UTC by Tim Rozet
Modified: 2020-10-27 16:35 UTC
CC List: 2 users

Last Closed: 2020-10-27 16:35:33 UTC

Links
GitHub openshift/ovn-kubernetes pull 243 (closed): "Bug 1873311: 8 27 2020 merge", last updated 2021-01-27 18:20:34 UTC
Red Hat Product Errata RHBA-2020:4196, last updated 2020-10-27 16:35:46 UTC

Description Tim Rozet 2020-08-27 19:56:20 UTC
Description of problem:
The test fails consistently in both upstream and downstream CI.

Comment 1 Tim Rozet 2020-08-27 20:01:09 UTC
We debugged the failure and found that the issue was introduced by:

https://github.com/ovn-org/ovn-kubernetes/commit/fd4758701cd61e0f69e21ef5a96ab5d91f704ef0

This commit makes the test fail when the cluster has more than one node and the client and server pods run on different nodes. Consider the following topology:

client-----nodeA----nodeB---Server

An ingress deny-all policy is applied to the cluster, so the client cannot communicate with the server. This works as expected.

A new policy is then added to allow ingress to the server from the client. Traffic sent from client -> server arrives at the server. However, return traffic from server -> client is dropped at nodeA. This is because the ACL we create for the network policy only targets a port group:


[root@ovn-control-plane ~]# ovn-nbctl acl-list f62f4d42-5cfb-45e5-8f7f-4bf3e7c9fbe6
  to-lport  1001 (ip4.src == {$a12672671609520104948} && outport == @a1383251650920656097) allow-related

This port group only includes the destination, which is the server, so the allow-related ACL is only applied on nodeB. Return traffic on nodeA is therefore not tracked by conntrack. That traffic is dropped because nodeA has no way to tell that it is return traffic.
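The failure mode above can be illustrated with a toy model. This is a simplified sketch under stated assumptions and does not represent OVN's actual logical pipeline. The model assumes that each node recognizes a reply as return traffic only if an allow-related ACL evaluated on that same node committed the original direction of the connection. The names `Flow`, `Node`, and `reply_path_ok` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """Toy connection key (real conntrack uses a full 5-tuple)."""
    src: str
    dst: str

    def reversed(self) -> "Flow":
        return Flow(self.dst, self.src)


@dataclass
class Node:
    """Toy chassis that remembers connections committed by allow-related ACLs evaluated here."""
    name: str
    committed: set = field(default_factory=set)

    def commit(self, flow: Flow) -> None:
        self.committed.add(flow)

    def accepts_reply(self, reply: Flow) -> bool:
        # Assumption: a reply is recognized only if its original direction was committed on this node.
        return reply.reversed() in self.committed


def reply_path_ok(client_side_acl: bool) -> dict:
    node_a, node_b = Node("nodeA"), Node("nodeB")
    request = Flow("client", "server")
    node_b.commit(request)  # to-lport allow-related ACL on the server's port group (nodeB)
    if client_side_acl:
        node_a.commit(request)  # hypothetical extra ACL on a client-side port group (nodeA)
    reply = request.reversed()
    return {n.name: n.accepts_reply(reply) for n in (node_a, node_b)}


print(reply_path_ok(client_side_acl=False))  # {'nodeA': False, 'nodeB': True}
print(reply_path_ok(client_side_acl=True))   # {'nodeA': True, 'nodeB': True}
```

In this model, committing the connection only where the server-side ACL is evaluated leaves nodeA unable to classify the reply.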

Comment 2 Tim Rozet 2020-08-27 20:01:32 UTC
Adding links to Sippy for OVN:

[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should stop enforcing policies after they are deleted [Feature:NetworkPolicy]

Comment 3 Tim Rozet 2020-08-27 20:03:53 UTC
As a short-term solution, we are reverting that commit. This will temporarily lower performance for clusters without network policies. The correct fix is to add ACLs for port groups on the client side, as Dumitru suggested:

"So it should be ok to add a PG, pg_client and an acl (also applied on pg_client) with match inport == @pg_client && ip.dst == <server_ip> action allow-related"

I'll open another bug to track the allow-related performance fix.
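The suggested fix pairs the existing server-side ACL with a second allow-related ACL on a client port group. The sketch below only builds the ACL text in the `acl-list` style shown in Comment 1. It does not call `ovn-nbctl`, and it makes several assumptions.

- The client-side ACL uses the `from-lport` direction because it matches on `inport`.
- It uses `ip4.dst` rather than the `ip.dst` from the quoted suggestion.
- It reuses priority 1001.
- The server IP `10.244.1.5` is made up.

The helper names `Acl` and `policy_acls` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    direction: str
    priority: int
    match: str
    action: str

    def __str__(self) -> str:
        # Roughly mirrors one line of `ovn-nbctl acl-list` output.
        return f"{self.direction} {self.priority} ({self.match}) {self.action}"


def policy_acls(server_pg: str, client_addr_set: str, client_pg: str,
                server_ips: list[str], priority: int = 1001) -> list[Acl]:
    # Existing behaviour: allow-related ACL on the server's port group only.
    server_side = Acl("to-lport", priority,
                      f"ip4.src == {{${client_addr_set}}} && outport == @{server_pg}",
                      "allow-related")
    # Suggested addition (assumed direction/priority): ACL on the client's port group,
    # so the connection is also committed where the client's traffic enters OVN.
    dsts = ", ".join(server_ips)
    client_side = Acl("from-lport", priority,
                      f"inport == @{client_pg} && ip4.dst == {{{dsts}}}",
                      "allow-related")
    return [server_side, client_side]


for acl in policy_acls("a1383251650920656097", "a12672671609520104948",
                       "pg_client", ["10.244.1.5"]):
    print(acl)
# to-lport 1001 (ip4.src == {$a12672671609520104948} && outport == @a1383251650920656097) allow-related
# from-lport 1001 (inport == @pg_client && ip4.dst == {10.244.1.5}) allow-related
```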

Comment 8 errata-xmlrpc 2020-10-27 16:35:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

