Bug 2083239 - Drop ACL for EgressFirewall has priority higher than allow ACL despite being last in the chain
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.z
Assignee: Andreas Karis
QA Contact: huirwang
URL:
Whiteboard:
Depends On: 2084344
Blocks:
 
Reported: 2022-05-09 14:30 UTC by Pablo Alonso Rodriguez
Modified: 2022-10-26 01:28 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-09 06:41:53 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 1088 | 0 | None | open | [release-4.9] Bug 2083239: [CARRY][Downstream-only][4.9-only] Recreate efw ACL rules on update | 2022-05-11 23:14:03 UTC
Red Hat Product Errata | RHBA-2022:4906 | 0 | None | None | None | 2022-06-09 06:42:12 UTC

Description Pablo Alonso Rodriguez 2022-05-09 14:30:45 UTC
Description of problem:

When creating an EgressFirewall with a number of `Allow` rules followed by a final catch-all `Deny` rule, the drop ACL is created in the nbdb with a higher priority than some of the allow rules that precede it. The drop ACL should have the lowest priority when its rule is the last one in the egress firewall.

The visible effect is that some allowed destinations are unreachable.

Re-creating the egress firewall can fix the issue or trigger it for different endpoints at random. I didn't compare the details before and after doing so, but given what I found in the nbdb while a single failure was happening, the priority "distribution" is likely somewhat random.

Version-Release number of selected component (if applicable):

4.9.29

How reproducible:

Always in one specific environment, although which destinations are affected is random.

Steps to Reproduce:
1. Create or re-create EgressFirewall

Actual results:

ACLs for some `Allow` rules have a lower priority than the drop ACL generated from the last `Deny` rule.

Expected results:

The drop ACL has the lowest priority, so it doesn't take precedence over the allow ACLs, because the `Deny` rule that created it is the last one.

Additional Info:

Relevant attachments and data will be expanded in comments.

Apart from that, I did a quick source code inspection. If I understood it correctly, when the egress firewall rules are read, each rule gets an "id" based on its position in the `Spec.Egress` array of the EgressFirewall object[1]. That "id" is then subtracted from a start priority[2], so the higher the "id", the lower the priority. Assuming the order is preserved properly, I honestly don't understand how this can happen, so I must be missing something.


[1] - https://github.com/openshift/ovn-kubernetes/blob/e9e0debd04b0124ba17c18483a93497efbae19be/go-controller/pkg/ovn/egressfirewall.go#L336
[2] - https://github.com/openshift/ovn-kubernetes/blob/e9e0debd04b0124ba17c18483a93497efbae19be/go-controller/pkg/ovn/egressfirewall.go#L446
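
As a rough sketch of the scheme described above (not the actual ovn-kubernetes code), the priority derivation could look like the following. The `egressRule` type, the `priorityFor` function and the `startPriority` value are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// startPriority is an illustrative value chosen for this sketch,
// not a constant confirmed by the bug report.
const startPriority = 10000

// egressRule is a hypothetical, reduced form of one EgressFirewall rule.
type egressRule struct {
	Type string // "Allow" or "Deny"
	To   string // destination CIDR
}

// priorityFor derives an ACL priority from the rule's position ("id")
// in the rule list: the later the rule, the lower the priority.
func priorityFor(id int) int {
	return startPriority - id
}

func main() {
	rules := []egressRule{
		{Type: "Allow", To: "192.0.2.0/24"},
		{Type: "Allow", To: "198.51.100.0/24"},
		{Type: "Deny", To: "0.0.0.0/0"},
	}
	for id, r := range rules {
		fmt.Printf("id=%d type=%s to=%s priority=%d\n", id, r.Type, r.To, priorityFor(id))
	}
}
```

Under this scheme the final `Deny` rule always receives the lowest priority of the set, which is why the ordering observed in the nbdb is unexpected.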

Comment 18 errata-xmlrpc 2022-06-09 06:41:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.37 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4906

