Bug 1917605

Summary: Deleting an exgw causes pods to no longer route to other exgws
Product: OpenShift Container Platform Reporter: Tim Rozet <trozet>
Component: Networking Assignee: Tim Rozet <trozet>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: anbhat, cpaquin, kholtz, pibanezr
Version: 4.6.z   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: All OVN router policies for a pod were accidentally removed whenever any one of its external gateways was removed.
Consequence: When external gateways were scaled down for a pod that had multiple external gateways, the pod would no longer send egress traffic to any of the still-available external gateways. Instead, it would send its egress cluster traffic to the default gateway of the node.
Fix: When external gateways are scaled down, a pod's logical_router_policy on ovn_cluster_router is removed only when the pod has no external gateways left.
Result: Pods now work correctly with external gateways when scaling down. Egress traffic is still sent to the remaining available external gateways and not to the node's default gateway.
Story Points: ---
Clone Of:
: 1917608 1917609 Environment:
Last Closed: 2021-02-24 15:54:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1917608, 1917609    

Description Tim Rozet 2021-01-18 22:20:36 UTC
Description of problem:
Consider a scenario where multiple pods act as external gateways for a pod, such as:

ovn-worker1                     ovn-worker2  
pod A----OVN--eth0 ----------- External GW Pod1 (172.0.0.4)
                       |
                       |----- External GW Pod2 (172.0.0.5)
                       |
                       |------ cluster default gateway (172.0.0.1)
 

Pod A now has two ECMP routes, via 172.0.0.4 and 172.0.0.5. Now we delete External GW Pod1. Pod A should still use 172.0.0.5 as its only remaining ECMP gateway. Instead, we see that deleting External GW Pod1 results in the deletion of the ovn_cluster_router policy for pod A. This causes traffic from pod A to go via the default cluster gateway (172.0.0.1).
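
The intended behavior can be illustrated with a minimal Go sketch. This is not the ovn-kubernetes code: the routerState type, its methods, and the pod IP 10.244.0.5 are hypothetical names and values invented for illustration, and OVN state is modeled as plain in-memory maps. It only shows the rule that a pod's policy is removed when its last external gateway goes away, not before.

```go
package main

import "fmt"

// routerState is a hypothetical in-memory stand-in for the relevant
// ovn_cluster_router state. It is NOT the real ovn-kubernetes data model.
type routerState struct {
	ecmpRoutes map[string]map[string]bool // pod IP -> set of external gateway IPs
	policies   map[string]bool            // pod IP -> logical_router_policy present
}

func newRouterState() *routerState {
	return &routerState{
		ecmpRoutes: map[string]map[string]bool{},
		policies:   map[string]bool{},
	}
}

// addGateway records an ECMP route for the pod via gwIP and makes sure
// the pod has a policy on the cluster router.
func (s *routerState) addGateway(podIP, gwIP string) {
	if s.ecmpRoutes[podIP] == nil {
		s.ecmpRoutes[podIP] = map[string]bool{}
	}
	s.ecmpRoutes[podIP][gwIP] = true
	s.policies[podIP] = true
}

// removeGateway drops the route via gwIP. The policy is deleted only when
// no external gateways remain for the pod. The buggy behavior described
// above corresponds to deleting the policy unconditionally here.
func (s *routerState) removeGateway(podIP, gwIP string) {
	gws := s.ecmpRoutes[podIP]
	delete(gws, gwIP) // deleting from a nil map is a no-op in Go
	if len(gws) == 0 {
		delete(s.ecmpRoutes, podIP)
		delete(s.policies, podIP)
	}
}

// hasPolicy reports whether the pod still has a policy on the router.
func (s *routerState) hasPolicy(podIP string) bool {
	return s.policies[podIP]
}

func main() {
	const podA = "10.244.0.5" // hypothetical IP for pod A

	s := newRouterState()
	s.addGateway(podA, "172.0.0.4") // External GW Pod1
	s.addGateway(podA, "172.0.0.5") // External GW Pod2

	s.removeGateway(podA, "172.0.0.4")
	fmt.Println(s.hasPolicy(podA)) // true: 172.0.0.5 is still in use

	s.removeGateway(podA, "172.0.0.5")
	fmt.Println(s.hasPolicy(podA)) // false: no external gateways left
}
```

In this model, pod A keeps its policy after External GW Pod1 is deleted and falls back to the default gateway only once both external gateways are gone.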

Comment 5 errata-xmlrpc 2021-02-24 15:54:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633