Bug 2034144 - [OVN AWS] ovn-kube egress IP monitoring cannot detect the failure on ovn-k8s-mp0
Summary: [OVN AWS] ovn-kube egress IP monitoring cannot detect the failure on ovn-k8s-mp0
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Ben Bennett
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-20 08:12 UTC by huirwang
Modified: 2022-03-10 16:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:35:04 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cloud-network-config-controller pull 12 | 0 | None | open | Bug 2034144: Don't enqueue CloudPrivateIPConfig on delete | 2021-12-20 21:50:18 UTC
Github | openshift ovn-kubernetes pull 917 | 0 | None | open | Bug 2039099: EgressIP fixes for 4.10 | 2022-01-19 21:14:59 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:35:21 UTC

Description huirwang 2021-12-20 08:12:57 UTC
Description of problem:
An earlier OVN EgressIP bug, https://bugzilla.redhat.com/show_bug.cgi?id=2002657, was fixed by PR https://github.com/ovn-org/ovn-kubernetes/pull/2495

According to the PR's comments, ovn-k uses the IP address assigned to ovn-k8s-mp0 as the liveness-detection IP for EgressIP.
Following the verification steps from that bug, the failure was not detected on an AWS OVN cluster.
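
The reproduction below drops TCP traffic to port 9 (discard) on ovn-k8s-mp0, which suggests the liveness check is a TCP connection attempt to that port on the management-port address. A minimal Python sketch of such a probe follows. It assumes that a refused connection still counts as "alive" (the node's network stack answered) and that only a timeout or another network error counts as a failure. The function name `probe_discard_port`, its defaults, and this success rule are illustrative assumptions, not the controller's actual Go code.

```python
import socket


def probe_discard_port(ip: str, port: int = 9, timeout: float = 1.0) -> bool:
    """Return True if the host answered the TCP probe, False otherwise.

    Hypothetical helper: the name, defaults, and success rule are assumptions.
    """
    try:
        # If something is listening, the connection succeeds.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # Assumption: a TCP reset proves the node's network stack replied,
        # so the node is treated as reachable.
        return True
    except OSError:
        # Covers timeouts (packets silently dropped, as an iptables DROP
        # rule would cause) and errors such as "no route to host".
        return False
```

Under these assumptions, a DROP rule on the probed interface should make the probe time out and return False. If the probe reaches the node over a different interface or address, the rule has no effect on the result.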


Version-Release number of selected component (if applicable):
4.10.0-0.ci-2021-12-19-184945

How reproducible:
Always

Steps to Reproduce:
1. Label one node as an egress node: ip-10-0-73-231.us-east-2.compute.internal
2. Create an EgressIP object:
$ oc get egressip
NAME        EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip1   10.0.73.235   ip-10-0-73-231.us-east-2.compute.internal   10.0.73.235
3. On node ip-10-0-73-231.us-east-2.compute.internal, add an iptables rule that drops TCP port 9 on ovn-k8s-mp0:
iptables -A INPUT -i ovn-k8s-mp0 -p tcp --destination-port 9 -j DROP

$ oc debug node/ip-10-0-73-231.us-east-2.compute.internal
Starting pod/ip-10-0-73-231us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.73.231
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# 
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-FIREWALL  all  --  anywhere             anywhere            
sh-4.4# 
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-FIREWALL  all  --  anywhere             anywhere            
sh-4.4# iptables -A INPUT -i ovn-k8s-mp0 -p tcp --destination-port 9 -j DROP
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-FIREWALL  all  --  anywhere             anywhere            
2    DROP       tcp  --  anywhere             anywhere             tcp dpt:discard

4. Check the EgressIP object.

Actual results:
The controller did not detect the failure; the EgressIP was still assigned to that node.
$ oc get egressip
NAME        EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip1   10.0.73.235   ip-10-0-73-231.us-east-2.compute.internal   10.0.73.235

Expected results:
The controller should detect the failure and reassign the EgressIP.
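
To script the check in step 4, the `oc get egressip` table can be parsed and the assigned node compared before and after the iptables rule is added. The sketch below is one way to do that in Python. It assumes the four-column layout shown in this report, with one row per EgressIP object and empty trailing columns when nothing is assigned. The helper names `parse_egressip_table` and `still_on_node` are hypothetical.

```python
from typing import Dict, Optional


def parse_egressip_table(text: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Parse `oc get egressip` output into {name: {egress_ip, node, assigned_ip}}."""
    rows: Dict[str, Dict[str, Optional[str]]] = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # not a data row in the assumed layout
        rows[fields[0]] = {
            "egress_ip": fields[1],
            # The last two columns are empty while the IP is unassigned.
            "node": fields[2] if len(fields) > 2 else None,
            "assigned_ip": fields[3] if len(fields) > 3 else None,
        }
    return rows


def still_on_node(text: str, name: str, node: str) -> bool:
    """True if EgressIP `name` is still assigned to `node`."""
    row = parse_egressip_table(text).get(name)
    return row is not None and row["node"] == node
```

With the output from "Actual results", `still_on_node(output, "egressip1", "ip-10-0-73-231.us-east-2.compute.internal")` returns True, which is the failing condition this bug describes. After a fix, the same call should return False once the controller has moved or unassigned the EgressIP.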

Additional info:

Comment 2 Alexander Constantinescu 2022-01-10 11:59:48 UTC
*** Bug 2038840 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2022-03-10 16:35:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

