Bug 2111878 - Azure EgressIP gives up reconciling with No matching nodes found when updating the same egressip consecutively
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.12.0
Assignee: Periyasamy Palanisamy
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-28 11:37 UTC by Andreas Karis
Modified: 2023-01-17 19:54 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:53:39 UTC
Target Upstream Version:
Embargoed:


Links
- GitHub openshift/cloud-network-config-controller pull 52 (open): "Bug 2111878: Make azure operations to be in sequence", last updated 2022-08-03 10:31:11 UTC
- GitHub openshift/ovn-kubernetes pull 1253 (Merged): "[DownstreamMerge] 8-25-2022", last updated 2022-08-29 08:54:45 UTC
- GitHub ovn-org/ovn-kubernetes pull 3105 (open): "Delete stale egress ip before assigning new ip", last updated 2022-08-03 10:29:43 UTC
- Red Hat Product Errata RHSA-2022:7399, last updated 2023-01-17 19:54:27 UTC

Description Andreas Karis 2022-07-28 11:37:53 UTC
Description of problem:

This is easy to reproduce on Azure, because EgressIP assignment and detachment are very slow there.

a) Assign an EgressIP to a node.
b) Wait until everything settles.
c) Edit the EgressIP object and change its IP address.
d) Watch the EgressIP object and the CloudPrivateIPConfig objects. The old address is detached and its CloudPrivateIPConfig is removed.
e) Wait for the exact moment when "assigned egressIPs" is empty in the EgressIP status (the old CloudPrivateIPConfig has been deleted and the cloud-network-config-controller (CNCC) is already trying to create the new IP address).
f) While the CNCC is still attaching the new IP and "assigned egressIPs" is still empty in the EgressIP status, quickly edit the EgressIP object again and change its IP address to another new value.

You will see the following:

0728 11:17:56.114041       1 client.go:781]  "msg"="transacting operations"  "database"="OVN_Northbound" "operations"="[{Op:insert Table:NAT Row:map[external_ids:{GoMap:map[name:egressip]} external_ip:10.0.129.10 logical_ip:10.129.2.14 logical_port:{GoSet:[k8s-ci-ln-nlnwi9k-1d09d-w5kh9-worker-centralus1-dszz9]} options:{GoMap:map[stateless:false]} type:snat] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996597} {Op:mutate Table:Logical_Router Row:map[] Rows:[] Columns:[] Mutations:[{Column:nat Mutator:insert Value:{GoSet:[{GoUUID:u2596996597}]}}] Timeout:<nil> Where:[where column _uuid == {a846c4e9-9c8a-41fe-befb-62c864bc1e88}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
I0728 11:17:56.117685       1 egressip.go:1423] Patching status on EgressIP egressip: [{ci-ln-nlnwi9k-1d09d-w5kh9-worker-centralus1-dszz9 10.0.129.10}]
I0728 11:19:12.419395       1 obj_retry.go:1514] Creating *factory.egressIPPod openshift-marketplace/certified-operators-wq55n took: 28.9µs
I0728 11:20:08.422899       1 egressip.go:1423] Patching status on EgressIP egressip: []
I0728 11:20:08.430849       1 egressip.go:1539] Successful assignment of egress IP: 10.0.129.11 on node: &{egressIPConfig:0xc0030af9e0 mgmtIPs:[[10 129 2 2]] allocations:map[] isReady:true isReachable:true isEgressAssignable:true name:ci-ln-nlnwi9k-1d09d-w5kh9-worker-centralus1-dszz9}
E0728 11:20:15.158012       1 egressip.go:1551] No matching host found for EgressIP: egressip
I0728 11:20:15.158113       1 event.go:285] Event(v1.ObjectReference{Kind:"EgressIP", Namespace:"", Name:"egressip", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'NoMatchingNodeFound' No matching nodes found, which can host any of the egress IPs: [10.0.129.12] for object EgressIP: egressip
I0728 11:21:05.770252       1 egressip.go:1423] Patching status on EgressIP egressip: []
I0728 11:23:31.500358       1 egressip.go:1539] Successful assignment of egress IP: 10.0.129.13 on node: &{egressIPConfig:0xc0030af9e0 mgmtIPs:[[10 129 2 2]] allocations:map[] isReady:true isReachable:true isEgressAssignable:true name:ci-ln-nlnwi9k-1d09d-w5kh9-worker-centralus1-dszz9}
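One reading of the log above, which is our assumption and not stated in the report: after the first edit, the controller's in-memory cache already holds 10.0.129.11 for the node, but because the EgressIP status is still empty, that pending allocation is never treated as stale. The node therefore looks as if it already hosts an IP for this EgressIP object, and 10.0.129.12 cannot be placed. The toy Python model below contrasts that behavior with the approach named in ovn-org/ovn-kubernetes pull 3105, "Delete stale egress ip before assigning new ip". It is a sketch, not ovn-kubernetes code: `Node`, `reconcile`, `replay`, and the node name `worker` are all hypothetical, and it also assumes that a node hosts at most one egress IP per EgressIP object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a node's in-memory egress IP cache."""
    name: str
    # egress IP -> name of the EgressIP object it is allocated for
    allocations: Dict[str, str] = field(default_factory=dict)

    def hosts_ip_for(self, eip_name: str) -> bool:
        # Assumption: a node hosts at most one IP per EgressIP object.
        return eip_name in self.allocations.values()


def reconcile(eip_name: str, spec_ips: List[str], status_ips: Set[str],
              nodes: List[Node], delete_stale_first: bool) -> Dict[str, Optional[str]]:
    """Return {ip: node name, or None when no node can host it}."""
    for node in nodes:
        for ip, owner in list(node.allocations.items()):
            if owner != eip_name or ip in spec_ips:
                continue
            # Buggy model: only IPs already reported in the status are
            # cleaned up, so a still-pending assignment is left behind.
            # Fixed model: every allocation missing from the spec is stale.
            if delete_stale_first or ip in status_ips:
                del node.allocations[ip]
    result: Dict[str, Optional[str]] = {}
    for ip in spec_ips:
        # Keep an existing allocation of this IP, otherwise pick a free node.
        holder = next((n for n in nodes if n.allocations.get(ip) == eip_name), None)
        if holder is None:
            holder = next((n for n in nodes if not n.hosts_ip_for(eip_name)), None)
            if holder is not None:
                holder.allocations[ip] = eip_name
        result[ip] = holder.name if holder is not None else None
    return result


def replay(delete_stale_first: bool) -> Dict[str, Optional[str]]:
    """Replay the two quick edits from the report against a single worker."""
    node = Node("worker", {"10.0.129.10": "egressip"})
    # First edit: 10.0.129.10 -> 10.0.129.11 while the status still lists .10.
    reconcile("egressip", ["10.0.129.11"], {"10.0.129.10"}, [node], delete_stale_first)
    # Second edit before the cloud confirms .11, so the status is empty.
    return reconcile("egressip", ["10.0.129.12"], set(), [node], delete_stale_first)


print(replay(delete_stale_first=False))  # {'10.0.129.12': None}
print(replay(delete_stale_first=True))   # {'10.0.129.12': 'worker'}
```

With `delete_stale_first=False` the second edit ends with no node for the new IP, mirroring the `NoMatchingNodeFound` event; with `True` the new IP lands on the only node. The model ignores the cloud side entirely, so it says nothing about the ordering of Azure calls that the CNCC change (pull 52, "Make azure operations to be in sequence") addresses.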




Comment 8 huirwang 2022-10-08 05:11:28 UTC
Verified in 4.12.0-0.nightly-2022-10-05-053337:
% oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE                                       ASSIGNED EGRESSIPS
egressip-2   10.0.128.11   huirwang-1008b-hv42m-worker-southcentralus1-xxl55   10.0.128.10

% oc edit egressip
egressip.k8s.ovn.org/egressip-2 edited
% oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-2   10.0.128.12

Edited it again with a new IP; the EgressIP was eventually assigned correctly.
% oc edit egressip
egressip.k8s.ovn.org/egressip-2 edited
% oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-2   10.0.128.13

% oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE                                       ASSIGNED EGRESSIPS
egressip-2   10.0.128.13   huirwang-1008b-hv42m-worker-southcentralus1-xxl55   10.0.128.13

Comment 11 errata-xmlrpc 2023-01-17 19:53:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

