Description of problem: The test is failing consistently in both upstream and downstream CI.
We debugged this and found that the issue was introduced by:
https://github.com/ovn-org/ovn-kubernetes/commit/fd4758701cd61e0f69e21ef5a96ab5d91f704ef0

This commit makes the test fail when you deploy with more than one node and the client and server are on different nodes. Consider the following:

    client-----nodeA----nodeB---Server

An ingress deny-all policy is placed on the cluster. The client cannot communicate with the server. This works fine.

A new policy is added to allow ingress into the server from the client. Traffic sent from client -> server arrives at the server. However, return traffic from server -> client is dropped at nodeA.

This is because when we create a network policy, its ACL only targets port groups:

    [root@ovn-control-plane ~]# ovn-nbctl acl-list f62f4d42-5cfb-45e5-8f7f-4bf3e7c9fbe6
    to-lport 1001 (ip4.src == {$a12672671609520104948} && outport == @a1383251650920656097) allow-related

This port group only includes the destination, which is the server. In this case the allow-related ACL is only placed on nodeB. The connection is therefore never conntracked on nodeA, so nodeA has no way to tell that the server's packets are return traffic and drops them.
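The failure described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not OVN's actual pipeline: the `Node` class, its fields, and `round_trip` are hypothetical names, and the model assumes that a node only commits a connection to conntrack when the local port carries an allow-related ACL.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one node's ACL and conntrack state (not OVN's real pipeline)."""
    name: str
    allow_related_ports: set = field(default_factory=set)  # ports with an allow-related ACL
    conntrack: set = field(default_factory=set)            # committed (src, dst) pairs

    def send(self, port, src, dst):
        # Assumption: outbound traffic is only committed to conntrack
        # when the sending port has an allow-related ACL.
        if port in self.allow_related_ports:
            self.conntrack.add((src, dst))

    def receive(self, port, src, dst, allowed_srcs=()):
        # Replies to a tracked connection bypass the default-deny ingress policy.
        if (dst, src) in self.conntrack:
            return True
        # New connections need an allow-related ACL on the receiving port.
        if port in self.allow_related_ports and src in allowed_srcs:
            self.conntrack.add((src, dst))
            return True
        return False  # ingress deny-all


def round_trip(node_a, node_b):
    """Return (request_delivered, reply_delivered) for client -> server -> client."""
    node_a.send("client", "client", "server")
    request_ok = node_b.receive("server", "client", "server", allowed_srcs={"client"})
    node_b.send("server", "server", "client")
    reply_ok = node_a.receive("client", "server", "client")
    return request_ok, reply_ok


# ACL only on the server's port group (the failing case in this bug).
broken = round_trip(Node("nodeA"), Node("nodeB", allow_related_ports={"server"}))

# ACL also applied on the client side.
fixed = round_trip(Node("nodeA", allow_related_ports={"client"}),
                   Node("nodeB", allow_related_ports={"server"}))
```

In this model `broken` shows the request delivered and the reply dropped, while `fixed` shows both directions delivered, because nodeA now has a conntrack entry to match the reply against.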
Adding links to sippy for the OVN test: [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should stop enforcing policies after they are deleted [Feature:NetworkPolicy]
As a short-term solution we are reverting the commit that introduced the regression. This will temporarily lower performance for clusters without network policy.

The correct fix should be to add ACLs for port groups on the client side, as Dumitru mentioned:

"So it should be ok to add a PG, pg_client and an acl (also applied on pg_client) with match inport == @pg_client && ip.dst == <server_ip> action allow-related"

I'll open another bug to track the allow-related performance fix.
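As a rough sketch of what the client-side ACL suggested above might look like, the helper below builds the match expression and an `ovn-nbctl acl-add` argument list for a client port group. The function name, the priority of 1001 (copied from the existing ACL), and the example address are assumptions for illustration. The quoted suggestion writes `ip.dst`; the sketch uses `ip4.dst` or `ip6.dst` depending on the address family, mirroring the `ip4.src` field in the existing ACL. It only builds the arguments and does not run anything.

```python
import ipaddress


def client_side_acl_args(client_pg, server_ip, priority=1001):
    """Build ovn-nbctl arguments for an allow-related ACL on the client port group.

    Hypothetical helper: names and defaults are illustrative, not taken
    from ovn-kubernetes.
    """
    # Pick the address-family-specific field (ip4.dst or ip6.dst).
    version = ipaddress.ip_address(server_ip).version
    match = f"inport == @{client_pg} && ip{version}.dst == {server_ip}"
    return [
        "ovn-nbctl", "--type=port-group", "acl-add",
        client_pg,          # entity the ACL is applied on
        "from-lport",       # direction for ACLs matching on inport
        str(priority),
        match,
        "allow-related",
    ]


# Example with a made-up server address.
args = client_side_acl_args("pg_client", "10.244.1.5")
```

Applying the ACL on `pg_client` with direction `from-lport` means the client's outbound connection is tracked on the client's own node, so the server's replies can be recognized there as return traffic.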
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196