Bug 1917605 - Deleting an exgw causes pods to no longer route to other exgws
Summary: Deleting an exgw causes pods to no longer route to other exgws
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Tim Rozet
QA Contact: Anurag Saxena
URL:
Whiteboard:
Depends On:
Blocks: 1917608 1917609
 
Reported: 2021-01-18 22:20 UTC by Tim Rozet
Modified: 2021-02-24 15:54 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: All OVN router policies for a pod were accidentally removed any time one of its external gateways was removed.
Consequence: When the external gateways serving a pod were scaled down, the pod no longer sent egress traffic to any of the external gateways that were still available. Instead, it sent its egress cluster traffic to the default gateway of the node.
Fix: When external gateways are scaled down, a pod's logical_router_policy on ovn_cluster_router is removed only when the pod has no external gateways left.
Result: Pods now work correctly with external gateways when scaling down. Egress traffic is still sent to the remaining available external gateways and not to the node's default gateway.
Clone Of:
Clones: 1917608 1917609
Environment:
Last Closed: 2021-02-24 15:54:15 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 420 0 None closed Bug 1917605: Fixes deleting exgw pod 2021-02-15 20:06:40 UTC
Github ovn-org ovn-kubernetes pull 1964 0 None closed Fixes deleting exgw pod 2021-02-15 20:06:40 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:54:41 UTC

Description Tim Rozet 2021-01-18 22:20:36 UTC
Description of problem:
Consider a scenario where multiple pods act as external gateways for a pod, such as:

ovn-worker1                     ovn-worker2  
pod A----OVN--eth0 ----------- External GW Pod1 (172.0.0.4)
                       |
                       |----- External GW Pod2 (172.0.0.5)
                       |
                       |------ cluster default gateway (172.0.0.1)
 

Pod A now has two ECMP routes, one via 172.0.0.4 and one via 172.0.0.5. Now we delete External GW Pod1. Pod A should still use 172.0.0.5 as its only remaining ECMP gateway. Instead, we see that deleting External GW Pod1 results in a delete of the ovn_cluster_router policy for pod A. This causes traffic from pod A to go via the default cluster gateway (172.0.0.1).
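
To make the expected versus observed behavior concrete, here is a minimal toy model in Python. It is only a sketch and not the ovn-kubernetes code: the ClusterRouter class, its method names, and the "podA" label are hypothetical, and it assumes that pod traffic follows the ECMP routes only while the pod's policy on ovn_cluster_router exists.

```python
from dataclasses import dataclass, field

DEFAULT_GW = "172.0.0.1"  # cluster default gateway from the diagram above


@dataclass
class ClusterRouter:
    """Toy stand-in for ovn_cluster_router state (hypothetical, not an OVN API)."""
    ecmp_routes: dict = field(default_factory=dict)  # pod name -> set of gateway IPs
    policies: set = field(default_factory=set)       # pods that have a router policy

    def add_gateway(self, pod, gw):
        # Adding an external gateway creates an ECMP route and ensures the policy exists.
        self.ecmp_routes.setdefault(pod, set()).add(gw)
        self.policies.add(pod)

    def delete_gateway_buggy(self, pod, gw):
        # Reported behavior: the pod's policy is dropped on every gateway delete.
        self.ecmp_routes.get(pod, set()).discard(gw)
        self.policies.discard(pod)

    def delete_gateway_fixed(self, pod, gw):
        # Expected behavior: drop the policy only when no gateways remain.
        gws = self.ecmp_routes.get(pod, set())
        gws.discard(gw)
        if not gws:
            self.ecmp_routes.pop(pod, None)
            self.policies.discard(pod)

    def next_hops(self, pod):
        # Assumption: without a policy, traffic falls back to the default gateway.
        if pod in self.policies and self.ecmp_routes.get(pod):
            return sorted(self.ecmp_routes[pod])
        return [DEFAULT_GW]


def run(delete_method_name):
    router = ClusterRouter()
    router.add_gateway("podA", "172.0.0.4")
    router.add_gateway("podA", "172.0.0.5")
    getattr(router, delete_method_name)("podA", "172.0.0.4")
    return router.next_hops("podA")


if __name__ == "__main__":
    print("buggy:", run("delete_gateway_buggy"))
    print("fixed:", run("delete_gateway_fixed"))
```

In this model the buggy delete leaves pod A with only the default gateway as next hop, while the fixed delete keeps 172.0.0.5, which matches the behavior described above.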

Comment 5 errata-xmlrpc 2021-02-24 15:54:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

