Bug 2034144

Summary: [OVN AWS] ovn-kube egress IP monitoring cannot detect the failure on ovn-k8s-mp0
Product: OpenShift Container Platform
Component: Networking
Sub component: ovn-kubernetes
Assignee: Ben Bennett <bbennett>
Reporter: huirwang
QA Contact: huirwang
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: dbrahane, jechen, surya
Version: 4.10
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-10 16:35:04 UTC
Type: Bug

Description huirwang 2021-12-20 08:12:57 UTC
Description of problem:
We previously had a bug for OVN egress IP, https://bugzilla.redhat.com/show_bug.cgi?id=2002657, which was fixed by PR https://github.com/ovn-org/ovn-kubernetes/pull/2495.

According to the PR's comments, ovn-k uses the IP address assigned to ovn-k8s-mp0 as the liveness-detection IP for EgressIP.
Following the verification steps from that bug, the detection did not work on an AWS OVN cluster.


Version-Release number of selected component (if applicable):
4.10.0-0.ci-2021-12-19-184945

How reproducible:
Always

Steps to Reproduce:
1. Label one node, ip-10-0-73-231.us-east-2.compute.internal, as an egress node.
2. Create an EgressIP object:
 oc get egressip 
NAME        EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip1   10.0.73.235   ip-10-0-73-231.us-east-2.compute.internal   10.0.73.235
3. On node ip-10-0-73-231.us-east-2.compute.internal, add an iptables rule that drops the health-check traffic on ovn-k8s-mp0:
iptables -A INPUT -i ovn-k8s-mp0 -p tcp --destination-port 9 -j DROP

 oc debug node/ip-10-0-73-231.us-east-2.compute.internal
Starting pod/ip-10-0-73-231us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.73.231
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# 
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-FIREWALL  all  --  anywhere             anywhere            
sh-4.4# 
sh-4.4# iptables -A INPUT -i ovn-k8s-mp0 -p tcp --destination-port 9 -j DROP
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-FIREWALL  all  --  anywhere             anywhere            
2    DROP       tcp  --  anywhere             anywhere             tcp dpt:discard

4. Check the EgressIP object.

Actual results:
The controller did not detect the failure; the EgressIP was still assigned to that node.
$ oc get egressip
NAME        EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip1   10.0.73.235   ip-10-0-73-231.us-east-2.compute.internal   10.0.73.235

Expected results:
The controller should detect the failure and reassign the EgressIP to another node.

Additional info:

Comment 2 Alexander Constantinescu 2022-01-10 11:59:48 UTC
*** Bug 2038840 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2022-03-10 16:35:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056