+++ This bug was initially created as a clone of Bug #1558484 +++

Description of problem:

Every 30 minutes the egress network policies are updated/rewritten, even when no policy has changed. As part of updating a policy in a project, a drop rule is applied to the project's OpenFlow tables with maximum priority, and then the rules are rewritten. This means that for the duration of the rewrite no egress traffic is permitted from any pod in the project, and no DNS lookups are permitted either. We are seeing occasions where this rewrite takes on the order of 5 to 6 seconds, which could impact our apps.

The customer confirmed that the redhat/openshift-ovs-multitenant plugin is configured on masters and nodes, and that they followed the notes in our documentation for this specific configuration:
https://docs.openshift.com/container-platform/3.7/admin_guide/managing_networking.html#admin-guide-limit-pod-access-egress

Expected results:

A way to control when EgressNetworkPolicies are updated, and/or updates that do not cause application downtime, since traffic appears to stop while the policies are being updated.

Additional info:

OCP is using the vSphere Cloud Provider.

I've been looking at this file:
https://raw.githubusercontent.com/openshift/origin/master/api/swagger-spec/oapi-v1.json

I don't see any variable there that would help with this, nor anything we could set in the policy.json used to create the EgressNetworkPolicy object to configure the update timing or to block the update.

I also don't know whether this might be related to this issue:

"Domain name updates are polled based on the TTL (time to live) value of the domain of the local non-authoritative server, or 30 minutes if the TTL is unable to be fetched. The pod should also resolve the domain from the same local non-authoritative server when necessary, otherwise the IP addresses for the domain perceived by the egress network policy controller and the pod will be different, and the egress network policy may not be enforced as expected. In the above example, suppose www.foo.com resolved to 10.11.12.13 and has a DNS TTL of one minute, but was later changed to 20.21.22.23. OpenShift Container Platform will then take up to one minute to adapt to these changes."
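To make the quoted polling rule concrete, here is a minimal Python sketch. It is not the egress network policy controller's code: the names `next_poll_interval` and `needs_rewrite` are hypothetical, treating a non-positive TTL as "unable to be fetched" is an assumption, and the change check only illustrates the behavior requested under "Expected results" (it is not confirmed to be how the product behaves).

```python
from datetime import timedelta

# Fallback interval quoted in the documentation: 30 minutes when the TTL
# cannot be fetched.
DEFAULT_POLL = timedelta(minutes=30)


def next_poll_interval(ttl_seconds):
    """Return how long to wait before re-resolving a domain.

    ttl_seconds is None when the TTL could not be fetched. Treating a
    non-positive TTL the same way is an assumption of this sketch.
    """
    if ttl_seconds is None or ttl_seconds <= 0:
        return DEFAULT_POLL
    return timedelta(seconds=ttl_seconds)


def needs_rewrite(old_ips, new_ips):
    """Hypothetical check: rewrite flows only if the resolved set changed."""
    return set(old_ips) != set(new_ips)


if __name__ == "__main__":
    # The www.foo.com example from the quoted text: TTL of one minute.
    print(next_poll_interval(60))
    print(next_poll_interval(None))
    print(needs_rewrite(["10.11.12.13"], ["20.21.22.23"]))
```

With the www.foo.com example, a TTL of 60 seconds gives a one-minute poll, and the address change from 10.11.12.13 to 20.21.22.23 is the only case in which such a check would request a rewrite.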
https://github.com/openshift/ose/pull/1230
Verified in atomic-openshift-3.7.46-1.git.0.e81594b.el7 that the order of the OVS flow updates has been changed, as shown below:

May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: W0511 04:36:15.153777 16957 ovscontroller.go:471] Correcting CIDRSelector '0.0.0.0/32' to '0.0.0.0/0' in EgressNetworkPolicy lha:policy-test
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.153833 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=101, reg0=14428014, cookie=1, priority=65535, actions=drop
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.166007 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 del-flows br0 table=101, reg0=14428014, cookie=0/1
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.173006 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=101, reg0=14428014, priority=2, ip, nw_dst=98.137.246.7, actions=output:2
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.179517 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=101, reg0=14428014, priority=2, ip, nw_dst=98.137.246.8, actions=output:2
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.185321 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=101, reg0=14428014, priority=2, ip, nw_dst=72.30.35.10, actions=output:2
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.191714 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=101, reg0=14428014, priority=2, ip, nw_dst=72.30.35.9, actions=output:2
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.198561 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=101, reg0=14428014, priority=2, ip, nw_dst=98.138.219.231, actions=output:2
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.204981 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=101, reg0=14428014, priority=2, ip, nw_dst=98.138.219.232, actions=output:2
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.211258 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=101, reg0=14428014, priority=1, ip, actions=drop
May 11 04:36:15 host-172-16-120-136 atomic-openshift-node[16957]: I0511 04:36:15.217460 16957 ovs.go:139] Executing: ovs-ofctl -O OpenFlow13 del-flows br0 table=101, reg0=14428014, cookie=1/1

OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
kernel: Linux host-172-16-120-136 3.10.0-862.2.3.el7.x86_64 #1 SMP Mon Apr 30 12:37:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
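For readers following the log, the command sequence above can be sketched as a small Python helper that only builds the ovs-ofctl command strings and does not run them. This is an illustration reconstructed from the logged commands, not the node's actual Go code: the function name `egress_flow_commands` and its parameters are hypothetical, and the defaults (br0, table 101, output port 2) are simply the values seen in this log.

```python
def egress_flow_commands(reg0, allowed_ips, bridge="br0", table=101, out_port=2):
    """Build the ovs-ofctl commands in the order seen in the log.

    Hypothetical helper for illustration; it returns strings and never
    invokes ovs-ofctl.
    """
    base = "ovs-ofctl -O OpenFlow13"
    match = f"table={table}, reg0={reg0}"
    cmds = [
        # 1. Temporary maximum-priority drop, tagged with cookie=1.
        f"{base} add-flow {bridge} {match}, cookie=1, priority=65535, actions=drop",
        # 2. Delete the old rules: cookie=0/1 matches flows whose low
        #    cookie bit is 0, so the temporary drop (cookie=1) survives.
        f"{base} del-flows {bridge} {match}, cookie=0/1",
    ]
    # 3. Re-add one allow rule per destination IP (no cookie given).
    for ip in allowed_ips:
        cmds.append(
            f"{base} add-flow {bridge} {match}, priority=2, ip, "
            f"nw_dst={ip}, actions=output:{out_port}"
        )
    # 4. Low-priority catch-all drop for remaining IP traffic.
    cmds.append(f"{base} add-flow {bridge} {match}, priority=1, ip, actions=drop")
    # 5. Remove the temporary drop: cookie=1/1 matches only flows whose
    #    low cookie bit is 1.
    cmds.append(f"{base} del-flows {bridge} {match}, cookie=1/1")
    return cmds


if __name__ == "__main__":
    for cmd in egress_flow_commands(14428014, ["98.137.246.7", "98.137.246.8"]):
        print(cmd)
```

In the log above, the first and last Executing entries are timestamped 04:36:15.153833 and 04:36:15.217460, so the temporary drop rule was in place for roughly 64 ms on this host.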
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1576