Bug 2034144

Summary: [OVN AWS] ovn-kube egress IP monitoring cannot detect the failure on ovn-k8s-mp0
Product: OpenShift Container Platform
Component: Networking
Sub component: ovn-kubernetes
Assignee: Ben Bennett <bbennett>
Reporter: huirwang
QA Contact: huirwang
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: dbrahane, jechen, surya
Version: 4.10
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-10 16:35:04 UTC
Type: Bug

Description huirwang 2021-12-20 08:12:57 UTC
Description of problem:
We previously had a bug for OVN egress IP, https://bugzilla.redhat.com/show_bug.cgi?id=2002657, which was fixed by PR https://github.com/ovn-org/ovn-kubernetes/pull/2495.

According to the PR's comments, ovn-k uses the IP address assigned to ovn-k8s-mp0 as the liveness-detection IP for EgressIP.
Following the verification steps from that bug, the detection did not work on an AWS OVN cluster.


Version-Release number of selected component (if applicable):
4.10.0-0.ci-2021-12-19-184945

How reproducible:
Always

Steps to Reproduce:
1. Label one node, ip-10-0-73-231.us-east-2.compute.internal, as an egress node.
2. Create an EgressIP object:
 oc get egressip 
NAME        EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip1   10.0.73.235   ip-10-0-73-231.us-east-2.compute.internal   10.0.73.235
3. On node ip-10-0-73-231.us-east-2.compute.internal, add an iptables rule that drops the health-check traffic on ovn-k8s-mp0:
iptables -A INPUT -i ovn-k8s-mp0 -p tcp --destination-port 9 -j DROP

 oc debug node/ip-10-0-73-231.us-east-2.compute.internal
Starting pod/ip-10-0-73-231us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.73.231
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# 
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-FIREWALL  all  --  anywhere             anywhere            
sh-4.4# 
sh-4.4# iptables -A INPUT -i ovn-k8s-mp0 -p tcp --destination-port 9 -j DROP
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-FIREWALL  all  --  anywhere             anywhere            
2    DROP       tcp  --  anywhere             anywhere             tcp dpt:discard

4. Check the EgressIP object.

Actual results:
The controller did not detect the failure; the EgressIP was still assigned to that node.
$ oc get egressip
NAME        EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip1   10.0.73.235   ip-10-0-73-231.us-east-2.compute.internal   10.0.73.235

Expected results:
The controller should detect the failure and reassign the EgressIP to another node.

Additional info:

Comment 2 Alexander Constantinescu 2022-01-10 11:59:48 UTC
*** Bug 2038840 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2022-03-10 16:35:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056