Bug 1972481 - [OVN] ARP Response from multiple Nodes for single EgressIP
Summary: [OVN] ARP Response from multiple Nodes for single EgressIP
Keywords:
Status: CLOSED DUPLICATE of bug 1976215
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Alexander Constantinescu
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-16 03:10 UTC by Michael Washer
Modified: 2021-08-06 05:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-02 14:11:54 UTC
Target Upstream Version:
Embargoed:



Description Michael Washer 2021-06-16 03:10:57 UTC
Description of problem:
We have two egress IPs assigned to a project and three nodes in the cluster labelled with the egress-assignable label. We expect the MAC address for any given egress IP to be that of the node it is assigned to. However, we have observed multiple nodes responding to ARP requests for the same egress IP address. This is causing ‘flapping’ in the ARP tables.

Version-Release number of selected component (if applicable):
OpenShift 4.6.30
The OCP cluster is UPI, with VMware VMs provisioned as the OCP nodes. The cluster is using OVN-Kubernetes.
The VMware version is 6.7 and the network layer is NSX-T.
NSX-T is a tenant of a Cisco ACI (version 4.2(3j)) environment.

How reproducible:
The problem occurs intermittently. We have noted that this happens more frequently after a node crashes.

Steps to Reproduce:
1) Install a cluster with OVN-Kubernetes matching the environment described above
2) Create a number of Pods and allocate EgressIPs according to the description
3) Crash a Node
4) Inspect the Northbound DB: it contains excess EgressIP rules that do not align with the OpenShift state

Actual results:
Multiple nodes are responding to ARP requests

Expected results:
Only the node that currently owns a given EgressIP should respond to ARP requests for that IP

Additional info:
We can see the following entries in the NBDB dump, where logical_port shows attachment to two different logical routers for the same external_ip. This was reproduced in a lab environment.
```
NAT table
_uuid                                external_ids          external_ip      external_mac        external_port_range logical_ip    logical_port               options             type
------------------------------------ --------------------- ---------------- ------------------- ------------------- ------------- -------------------------- ------------------- -------------
fabe9b46-672c-48ee-ab36-f2e612710290 {name=egressips-prod} "172.21.104.123" []                  ""                  "10.128.2.22" k8s-uat-tjp8f-worker-9bh8w {}                  snat
3a09e5bc-f0cd-4587-a432-99316cc813d9 {name=egressips-prod} "172.21.104.123" []                  ""                  "10.128.2.5"  k8s-uat-tjp8f-worker-r4qbl {}                  snat
```
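
The check we did by eye on the dump above can be sketched in Python. This is only an illustration, not part of OVN-Kubernetes: the function names are hypothetical, the rows are pasted from the table above, and it assumes every column value is free of whitespace (true for these two rows, not guaranteed for an arbitrary dump). It also assumes EgressIP entries can be recognised by a non-empty external_ids column.
```python
from collections import defaultdict

# Column order as shown in the NAT table dump above.
COLUMNS = [
    "_uuid", "external_ids", "external_ip", "external_mac",
    "external_port_range", "logical_ip", "logical_port", "options", "type",
]

# Data rows copied from the dump (header and separator lines omitted).
NAT_ROWS = """\
fabe9b46-672c-48ee-ab36-f2e612710290 {name=egressips-prod} "172.21.104.123" [] "" "10.128.2.22" k8s-uat-tjp8f-worker-9bh8w {} snat
3a09e5bc-f0cd-4587-a432-99316cc813d9 {name=egressips-prod} "172.21.104.123" [] "" "10.128.2.5" k8s-uat-tjp8f-worker-r4qbl {} snat
"""


def parse_rows(text):
    """Split each row on whitespace; assumes no value contains spaces."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        rows.append({col: val.strip('"') for col, val in zip(COLUMNS, fields)})
    return rows


def conflicting_egress_ips(rows):
    """Return external IPs whose SNAT entries span more than one logical_port."""
    ports_by_ip = defaultdict(set)
    for row in rows:
        # Assumption: EgressIP entries carry a non-empty external_ids map.
        if row["type"] == "snat" and row["external_ids"] != "{}":
            ports_by_ip[row["external_ip"]].add(row["logical_port"])
    return {ip: sorted(p) for ip, p in ports_by_ip.items() if len(p) > 1}


conflicts = conflicting_egress_ips(parse_rows(NAT_ROWS))
for ip, ports in conflicts.items():
    print(f"{ip} has SNAT entries on multiple logical ports: {', '.join(ports)}")
```
For the two rows above this reports 172.21.104.123 against both worker logical ports, which matches the duplicate ARP responders we observed.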

Comment 8 Alexander Constantinescu 2021-07-02 14:11:54 UTC

*** This bug has been marked as a duplicate of bug 1976215 ***

