Moving to MODIFIED, since the code in 4.9 cannot provoke the issue found in 4.7 and 4.8.

What was said for the 4.10 BZ also applies here: "with respect to the original bug found in 4.7 and 4.8 (#2011666), in 4.9 the implementation of the egress firewall feature changed and the two issues found in the code in 4.7 and 4.8 are already addressed: (1) when updating an existing egress firewall, we are no longer adding and then removing a temporary ACL with external ID egressFirewall=$NS-blockAll, blocking outgoing traffic from all pods in the namespace. (2) the syncEgressFirewall method already makes sure that all egress firewall ACLs in OVN correspond to egress firewalls in the API server."

In addition to that, syncEgressFirewall in 4.9 also takes care of cleanup after switching gateway mode from local to shared (shared to local is not currently supported): it will delete any leftover ACLs that might be carried over after an upgrade from a 4.8.z in local gw mode showing this issue to a 4.9.z in shared gw mode.

In light of this, verification can happen in the following way, similar to what was suggested for the 4.10 bug but with one more subtlety, as I explain below. In order to verify that this issue is solved in 4.9, we should make sure that all egress firewall ACLs at startup correspond to actual EgressFirewalls in the API server. So let's cover two cases.

*** case 1: no EgressFirewall, shared gw mode

- ssh into the ovn-k master and add spurious ACLs to the node logical switches (as in local gateway mode) and to the join switch (as in shared gateway mode):

ovn-nbctl --id=@acl create acl action=drop direction=to-lport priority=10000 match="1.2.3.0/24" external-ids:egressFirewall=default-blockAll -- add logical_switch ovn-worker acls @acl
ovn-nbctl --id=@acl create acl action=drop direction=to-lport priority=10000 match="1.2.3.0/24" external-ids:egressFirewall=default-blockAll -- add logical_switch ovn-worker2 acls @acl
ovn-nbctl --id=@acl create acl action=drop direction=to-lport priority=10000 match="1.2.3.0/24" external-ids:egressFirewall=default-blockAll -- add logical_switch ovn-control-plane acls @acl
ovn-nbctl --id=@acl create acl action=drop direction=to-lport priority=10000 match="1.2.3.0/24" external-ids:egressFirewall=default-blockAll -- add logical_switch join acls @acl

In the example above, I used "ovn-worker", "ovn-worker2" and "ovn-control-plane" as node switches; I added ACLs to the node logical switches too, simulating ACLs carried over from an upgrade + gateway mode switch from local to shared.

- delete the ovn-k master pod

- wait for the ovn-k master pod to be up again, then ssh into it and verify that the ACLs from above have been deleted:

ovn-nbctl list acl
ovn-nbctl acl-list $node
ovn-nbctl acl-list join

*** case 2: with an EgressFirewall, shared gw mode

- add a simple EgressFirewall, like:

$ cat ef.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to:
      cidrSelector: 8.8.8.0/24
  - type: Allow
    to:
      dnsName: github.com
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0

$ kubectl apply -f ef.yaml

- then repeat the steps from case 1 (add the spurious ACLs, delete the ovn-k master pod, wait for it to come back) and verify that only the spurious ACLs are removed, while the ACLs belonging to this EgressFirewall stay in place; see also the quick check right below.
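To make that last check for case 2 easier, ACLs can be filtered by their external IDs. This is just a sketch: I am assuming the per-namespace egress firewall ACLs carry external-ids:egressFirewall=<namespace> (here "default"), so adjust the filter to whatever "ovn-nbctl list acl" actually shows on your cluster.

- the spurious ACLs added by hand should be gone (no output expected):

ovn-nbctl --columns=_uuid,priority,match,external_ids find acl external_ids:egressFirewall=default-blockAll

- while the ACLs created for the EgressFirewall above should still be listed:

ovn-nbctl --columns=_uuid,priority,match,external_ids find acl external_ids:egressFirewall=default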
Lastly, we can repeat the steps above for local gateway mode, keeping in mind that:
- moving from shared to local is not currently allowed;
- consequently, spurious ACLs on the "join" switch cannot be carried over from shared gw mode if we are currently in local gw mode.

In this scenario, we can simply test that spurious ACLs on the node logical switches (thus excluding the join switch) get deleted after a restart:

ovn-nbctl --id=@acl create acl action=drop direction=to-lport priority=10000 match="1.2.3.0/24" external-ids:egressFirewall=default-blockAll -- add logical_switch ovn-worker acls @acl
ovn-nbctl --id=@acl create acl action=drop direction=to-lport priority=10000 match="1.2.3.0/24" external-ids:egressFirewall=default-blockAll -- add logical_switch ovn-worker2 acls @acl
ovn-nbctl --id=@acl create acl action=drop direction=to-lport priority=10000 match="1.2.3.0/24" external-ids:egressFirewall=default-blockAll -- add logical_switch ovn-control-plane acls @acl
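Then, as in case 1, delete the ovn-k master pod, wait for it to come back and verify that the spurious ACLs are gone. Besides the acl-list commands from case 1, a single find on the external ID set above is a convenient shortcut (same information, just pre-filtered); once cleanup has run it should print nothing:

ovn-nbctl --columns=_uuid,match,external_ids find acl external_ids:egressFirewall=default-blockAll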
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.11 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5003