Bug 2111878

Summary: Azure EgressIP gives up reconciling with No matching nodes found when updating the same egressip consecutively
Product: OpenShift Container Platform Reporter: Andreas Karis <akaris>
Component: Networking Assignee: Periyasamy Palanisamy <pepalani>
Networking sub component: ovn-kubernetes QA Contact: huirwang
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: low CC: huirwang
Version: 4.12   
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-17 19:53:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andreas Karis 2022-07-28 11:37:53 UTC
Description of problem:

This can easily be reproduced on Azure, where EgressIP assignment and detachment are very slow.

a) Assign an EgressIP to a node. 
b) Wait until everything settles.
c) Edit the egressIP object and change its IP address.
d) Watch the egressip object and cloudprivateipconfigs. The old address will be detached and the cloudprivateipconfig will be removed.
e) Wait for the exact moment when "assigned egressIPs" is empty in the egressip status (the old CNCC IP was deleted and the CNCC is already trying to create the new IP address).
f) While the CNCC tries to attach the new IP, and while "assigned egressIPs" is still empty in the egressip's status, quickly edit the egressip object again and change its IP address to a new value.
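
Steps c) through f) can be scripted roughly as follows. This is a minimal sketch, not part of the original reproduction: it assumes the EgressIP object is named "egressip" (as in the logs), that it holds a single address, and that both new addresses are free in the node's subnet. The helper function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Usage (hypothetical): ./repro.sh 10.0.129.11 10.0.129.12
set -euo pipefail

# Assumption: object name taken from the logs; override with EIP_NAME if needed.
EIP_NAME="${EIP_NAME:-egressip}"

# Build a JSON merge patch that replaces spec.egressIPs with one address.
egressip_patch() {
  printf '{"spec":{"egressIPs":["%s"]}}' "$1"
}

# Change the EgressIP object's address (steps c and f).
set_egressip() {
  oc patch egressip "$EIP_NAME" --type=merge -p "$(egressip_patch "$1")"
}

# Print status.items; empty output or "[]" means nothing is assigned.
assigned_items() {
  oc get egressip "$EIP_NAME" -o jsonpath='{.status.items}'
}

# Poll until the status no longer lists an assigned egress IP (step e).
wait_until_unassigned() {
  local items
  while true; do
    items="$(assigned_items)"
    if [ -z "$items" ] || [ "$items" = "[]" ]; then
      return 0
    fi
    sleep 1
  done
}

# Run the sequence only when two addresses are given on the command line.
if [ "$#" -eq 2 ]; then
  set_egressip "$1"        # step c: first address change
  wait_until_unassigned    # steps d and e: old IP detached, status empty
  set_egressip "$2"        # step f: second change while the first is pending
fi
```

The one-second poll interval is arbitrary; because attachment on Azure is slow, the window between the empty status and the new assignment should be wide enough for the second patch to land inside it.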

You will see the following:

0728 11:17:56.114041       1 client.go:781]  "msg"="transacting operations"  "database"="OVN_Northbound" "operations"="[{Op:insert Table:NAT Row:map[external_ids:{GoMap:map[name:egressip]} external_ip:10.0.129.10 logical_ip:10.129.2.14 logical_port:{GoSet:[k8s-ci-ln-nlnwi9k-1d09d-w5kh9-worker-centralus1-dszz9]} options:{GoMap:map[stateless:false]} type:snat] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996597} {Op:mutate Table:Logical_Router Row:map[] Rows:[] Columns:[] Mutations:[{Column:nat Mutator:insert Value:{GoSet:[{GoUUID:u2596996597}]}}] Timeout:<nil> Where:[where column _uuid == {a846c4e9-9c8a-41fe-befb-62c864bc1e88}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
I0728 11:17:56.117685       1 egressip.go:1423] Patching status on EgressIP egressip: [{ci-ln-nlnwi9k-1d09d-w5kh9-worker-centralus1-dszz9 10.0.129.10}]
I0728 11:19:12.419395       1 obj_retry.go:1514] Creating *factory.egressIPPod openshift-marketplace/certified-operators-wq55n took: 28.9µs
I0728 11:20:08.422899       1 egressip.go:1423] Patching status on EgressIP egressip: []
I0728 11:20:08.430849       1 egressip.go:1539] Successful assignment of egress IP: 10.0.129.11 on node: &{egressIPConfig:0xc0030af9e0 mgmtIPs:[[10 129 2 2]] allocations:map[] isReady:true isReachable:true isEgressAssignable:true name:ci-ln-nlnwi9k-1d09d-w5kh9-worker-centralus1-dszz9}
E0728 11:20:15.158012       1 egressip.go:1551] No matching host found for EgressIP: egressip
I0728 11:20:15.158113       1 event.go:285] Event(v1.ObjectReference{Kind:"EgressIP", Namespace:"", Name:"egressip", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'NoMatchingNodeFound' No matching nodes found, which can host any of the egress IPs: [10.0.129.12] for object EgressIP: egressip
I0728 11:21:05.770252       1 egressip.go:1423] Patching status on EgressIP egressip: []
I0728 11:23:31.500358       1 egressip.go:1539] Successful assignment of egress IP: 10.0.129.13 on node: &{egressIPConfig:0xc0030af9e0 mgmtIPs:[[10 129 2 2]] allocations:map[] isReady:true isReachable:true isEgressAssignable:true name:ci-ln-nlnwi9k-1d09d-w5kh9-worker-centralus1-dszz9}





Comment 8 huirwang 2022-10-08 05:11:28 UTC
Verified in 4.12.0-0.nightly-2022-10-05-053337:
% oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE                                       ASSIGNED EGRESSIPS
egressip-2   10.0.128.11   huirwang-1008b-hv42m-worker-southcentralus1-xxl55   10.0.128.10

% oc edit egressip
egressip.k8s.ovn.org/egressip-2 edited
% oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-2   10.0.128.12 

Edit it again with a new IP; the EgressIP is eventually assigned correctly.
% oc edit egressip
egressip.k8s.ovn.org/egressip-2 edited
% oc get egressip 
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-2   10.0.128.13                   

% oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE                                       ASSIGNED EGRESSIPS
egressip-2   10.0.128.13   huirwang-1008b-hv42m-worker-southcentralus1-xxl55   10.0.128.13
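
The final check above (requested address equals assigned address) could be automated along these lines. This is only a sketch: the function name is hypothetical, and it assumes the object holds a single egress IP, since with several addresses the order in the status may differ from the order in the spec.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds when the requested and the assigned address of an EgressIP match.
# Assumption: the object carries exactly one egress IP.
egressip_converged() {
  local name="$1" want got
  want="$(oc get egressip "$name" -o jsonpath='{.spec.egressIPs[*]}')"
  got="$(oc get egressip "$name" -o jsonpath='{.status.items[*].egressIP}')"
  [ -n "$want" ] && [ "$want" = "$got" ]
}
```

For example, `until egressip_converged egressip-2; do sleep 5; done` would block until the assignment settles.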

Comment 11 errata-xmlrpc 2023-01-17 19:53:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399