Bug 2104784 - Some EgressIP was not correctly assigned to the egress node under some condition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Andreas Karis
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-07 06:40 UTC by huirwang
Modified: 2023-01-17 19:52 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:51:47 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cloud-network-config-controller pull 50 0 None open Bug 2104784: AWS: Fix race in IP address assignment code 2022-07-21 10:13:34 UTC
Red Hat Product Errata RHSA-2022:7399 0 None None None 2023-01-17 19:52:07 UTC

Description huirwang 2022-07-07 06:40:32 UTC
Description of problem:
Tested on AWS. The egress node has an IPv4 capacity of 14. I created 15 EgressIP objects, each with one egress IP, and expected 14 egress IPs to be assigned normally. However, one or more of them were not assigned correctly. This happens frequently in automated test runs.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-07-06-145812

How reproducible:


Steps to Reproduce:
1. Label one node as the egress node. The node can hold at most 14 egress IPs:
[{"interface":"eni-0dd3af220dd298c34","ifaddr":{"ipv4":"10.0.48.0/20"},"capacity":{"ipv4":14,"ipv6":15}}]
2. Create 15 EgressIP objects, each with one egress IP.


Actual results:
Only 13 EgressIPs were assigned correctly.
$ oc get egressip
NAME                EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip-47208-0    10.0.54.111   ip-10-0-51-130.us-east-2.compute.internal   10.0.54.111
egressip-47208-1    10.0.61.10    ip-10-0-51-130.us-east-2.compute.internal   10.0.61.10
egressip-47208-10   10.0.57.144   ip-10-0-51-130.us-east-2.compute.internal   10.0.57.144
egressip-47208-11   10.0.55.193   ip-10-0-51-130.us-east-2.compute.internal   10.0.55.193
egressip-47208-12   10.0.58.151   ip-10-0-51-130.us-east-2.compute.internal   10.0.58.151
egressip-47208-13   10.0.56.170                                               
egressip-47208-14   10.0.63.18                                                
egressip-47208-2    10.0.53.128   ip-10-0-51-130.us-east-2.compute.internal   10.0.53.128
egressip-47208-3    10.0.62.10    ip-10-0-51-130.us-east-2.compute.internal   10.0.62.10
egressip-47208-4    10.0.62.36    ip-10-0-51-130.us-east-2.compute.internal   10.0.62.36
egressip-47208-5    10.0.63.127   ip-10-0-51-130.us-east-2.compute.internal   10.0.63.127
egressip-47208-6    10.0.57.55    ip-10-0-51-130.us-east-2.compute.internal   10.0.57.55
egressip-47208-7    10.0.54.221   ip-10-0-51-130.us-east-2.compute.internal   10.0.54.221
egressip-47208-8    10.0.55.101   ip-10-0-51-130.us-east-2.compute.internal   10.0.55.101
egressip-47208-9    10.0.51.134   ip-10-0-51-130.us-east-2.compute.internal   10.0.51.134

$ oc get cloudprivateipconfig     // 14 IPs, one of which is in an error state
NAME          AGE
10.0.51.134   126m
10.0.53.128   127m
10.0.54.111   128m
10.0.54.221   127m
10.0.55.101   126m
10.0.55.193   126m
10.0.56.170   125m
10.0.57.144   126m
10.0.57.55    127m
10.0.58.151   126m
10.0.61.10    128m
10.0.62.10    127m
10.0.62.36    127m
10.0.63.127   127m

$ oc get  cloudprivateipconfig 10.0.56.170 -o yaml
apiVersion: cloud.network.openshift.io/v1
kind: CloudPrivateIPConfig
metadata:
  annotations:
    k8s.ovn.org/egressip-owner-ref: egressip-47208-13
  creationTimestamp: "2022-07-07T04:16:52Z"
  finalizers:
  - cloudprivateipconfig.cloud.network.openshift.io/finalizer
  generation: 1
  name: 10.0.56.170
  resourceVersion: "83557"
  uid: b6865ae9-0720-4be8-9cf4-2e57a5185ce3
spec:
  node: ip-10-0-51-130.us-east-2.compute.internal
status:
  conditions:
  - lastTransitionTime: "2022-07-07T04:22:26Z"
    message: 'Error processing cloud assignment request, err: <nil>'
    observedGeneration: 1
    reason: CloudResponseError
    status: "False"
    type: Assigned
  node: ip-10-0-51-130.us-east-2.compute.internal


Expected results:
No EgressIP should end up in the error state shown above. With 15 EgressIP objects and a node capacity of 14, 14 EgressIPs should be assigned correctly.

Additional info:

Comment 3 Andreas Karis 2022-07-11 14:13:00 UTC
From the logs:

2022-07-07T04:16:43.231256399Z I0707 04:16:43.231215       1 controller.go:160] Dropping key '10.0.58.151' from the cloud-private-ip-config workqueue
2022-07-07T04:16:52.761114635Z I0707 04:16:52.761065       1 controller.go:182] Assigning key: 10.0.56.170 to cloud-private-ip-config workqueue
2022-07-07T04:16:52.764943395Z I0707 04:16:52.764914       1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.56.170" will be added to node: "ip-10-0-51-130.us-east-2.compute.internal"
2022-07-07T04:16:52.772128989Z I0707 04:16:52.772096       1 cloudprivateipconfig_controller.go:295] Adding finalizer to CloudPrivateIPConfig: "10.0.56.170"
2022-07-07T04:16:52.772128989Z I0707 04:16:52.772117       1 controller.go:182] Assigning key: 10.0.56.170 to cloud-private-ip-config workqueue
2022-07-07T04:16:53.223669341Z E0707 04:16:53.223624       1 controller.go:165] error syncing '10.0.56.170': error assigning CloudPrivateIPConfig: "10.0.56.170" to node: "ip-10-0-51-130.us-east-2.compute.internal", err: PrivateIpAddressLimitExceeded: Number of private addresses will exceed limit.
2022-07-07T04:16:53.223669341Z 	status code: 400, request id: 0110a9ad-57f4-485a-a467-24866c3bdc59, requeuing in cloud-private-ip-config workqueue
2022-07-07T04:16:53.226454229Z I0707 04:16:53.226427       1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.56.170" will be added to node: "ip-10-0-51-130.us-east-2.compute.internal"
2022-07-07T04:16:53.639710866Z E0707 04:16:53.639672       1 controller.go:165] error syncing '10.0.56.170': error assigning CloudPrivateIPConfig: "10.0.56.170" to node: "ip-10-0-51-130.us-east-2.compute.internal", err: PrivateIpAddressLimitExceeded: Number of private addresses will exceed limit.
(...)
2022-07-07T04:18:19.657285557Z 	status code: 400, request id: 372b6951-6f2c-468b-8ec7-03c069a818e0, requeuing in cloud-private-ip-config workqueue
2022-07-07T04:19:41.583437183Z I0707 04:19:41.583392       1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.56.170" will be added to node: "ip-10-0-51-130.us-east-2.compute.internal"
2022-07-07T04:19:42.039827916Z E0707 04:19:42.039785       1 controller.go:165] error syncing '10.0.56.170': error assigning CloudPrivateIPConfig: "10.0.56.170" to node: "ip-10-0-51-130.us-east-2.compute.internal", err: PrivateIpAddressLimitExceeded: Number of private addresses will exceed limit.
2022-07-07T04:19:42.039827916Z 	status code: 400, request id: 9c765988-b831-4b81-9e0f-e06f425ad784, requeuing in cloud-private-ip-config workqueue
2022-07-07T04:22:25.885270338Z I0707 04:22:25.885205       1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.56.170" will be added to node: "ip-10-0-51-130.us-east-2.compute.internal"
2022-07-07T04:22:26.394088658Z I0707 04:22:26.394056       1 controller.go:160] Dropping key '10.0.56.170' from the cloud-private-ip-config workqueue

[akaris@linux must-gather.local.4851834340043511279]$

Comment 4 Andreas Karis 2022-07-11 14:48:27 UTC
Ok, according to the CloudProviderIntf:
https://github.com/openshift/cloud-network-config-controller/blob/7a3c3c9ea200d11db5128663deade53f20105ac9/pkg/cloudprovider/cloudprovider.go#L51
~~~
   51     // GetNodeEgressIPConfiguration retrieves the egress IP configuration for
   52     // the node, following the convention the cloud uses. This means
   53     // specifically that: the IP capacity can be either hard-coded and global
   54     // for all instance types and IP families (GCP, Azure) or variable per
   55     // instance and IP family (AWS), also: the interface is either keyed by name
   56     // (GCP) or ID (Azure, AWS). Note: this function should only be called when
   57     // no egress IPs have been added to the node, it will return an incorrect
   58     // "egress IP capacity" otherwise
~~~

Having a look at the AWS doc:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI
~~~
The following table lists the maximum number of network interfaces per instance type, and the maximum number of private IPv4 addresses and IPv6 addresses per network interface. The limit for IPv6 addresses is separate from the limit for private IPv4 addresses per network interface. Not all instance types support IPv6 addressing.
~~~

So, for xlarge instances, the per-interface limit is 15 private IPv4 addresses. Subtracting the number of IP addresses already assigned (1 in this case) leaves a capacity of 14.
The code to determine this is here:
https://github.com/openshift/cloud-network-config-controller/blob/7a3c3c9ea200d11db5128663deade53f20105ac9/pkg/cloudprovider/aws.go#L278

The node reports the correct capacity here with 14 for IPv4
~~~
[{"interface":"eni-0dd3af220dd298c34","ifaddr":{"ipv4":"10.0.48.0/20"},"capacity":{"ipv4":14,"ipv6":15}}]
~~~
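
The capacity arithmetic described above can be sketched in Go as follows. This is only an illustration of the subtraction, not the controller's actual code: the function name `egressIPCapacity` and its parameters are hypothetical, and the values 15 and 1 are taken from this comment.

```go
package main

import "fmt"

// egressIPCapacity returns how many more private IPv4 addresses fit on an
// interface: the per-interface limit for the instance type minus the number
// of addresses already assigned. Hypothetical helper for illustration only.
func egressIPCapacity(limitPerInterface, alreadyAssigned int) int {
	if alreadyAssigned >= limitPerInterface {
		return 0
	}
	return limitPerInterface - alreadyAssigned
}

func main() {
	// 15 addresses per interface, 1 already assigned to the node.
	fmt.Println(egressIPCapacity(15, 1)) // prints 14
}
```

As the interface comment quoted earlier warns, a calculation like this only gives the right answer if it runs before any egress IPs have been added to the node.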

The question now is why this already failed after 13 successful assignments ...

Comment 5 Andreas Karis 2022-07-11 17:24:29 UTC
I can't reproduce this. Can you build with the following PR and reproduce it? https://github.com/openshift/cloud-network-config-controller/pull/50
The PR will print a list of all IPs that are assigned to the interface. Hopefully this will shed some more light on the issue.

Comment 6 Andreas Karis 2022-07-11 17:26:26 UTC
* The PR will print a list of all IPs that are assigned to the interface. When the assignment fails, the logs of the cloud-network-config-controller will show something like:

E0711 17:12:20.527460       1 controller.go:165] error syncing '10.0.129.24': error assigning CloudPrivateIPConfig: "10.0.129.24" to node: "ip-10-0-138-80.us-west-1.compute.internal", err: PrivateIpAddressLimitExceeded: Number of private addresses will exceed limit.
	status code: 400, request id: 534845ac-7069-4b8a-8e5e-422ff279c1fa Tried to assign the following 16 IP addresses to the interface ( 10.0.138.80 10.0.129.12 10.0.129.13 10.0.129.30 10.0.129.11 10.0.129.14 10.0.129.15 10.0.129.16 10.0.129.17 10.0.129.18 10.0.129.19 10.0.129.20 10.0.129.21 10.0.129.22 10.0.129.23 10.0.129.24), requeuing in cloud-private-ip-config workqueue
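
A rough sketch of how such a message could be built in Go is shown below. The helper `describeAssignFailure` and the variable `errLimit` are hypothetical names chosen for illustration. The wording is modelled on the log line above and is not copied from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errLimit stands in for the error returned by the cloud API (assumption).
var errLimit = errors.New("PrivateIpAddressLimitExceeded: Number of private addresses will exceed limit.")

// describeAssignFailure wraps err with the full list of IPs that were part of
// the failed request, so the log shows how many addresses were attempted.
func describeAssignFailure(err error, ips []string) error {
	return fmt.Errorf("%w Tried to assign the following %d IP addresses to the interface ( %s)",
		err, len(ips), strings.Join(ips, " "))
}

func main() {
	err := describeAssignFailure(errLimit, []string{"10.0.138.80", "10.0.129.12"})
	fmt.Println(err)
	// The original error stays reachable through the wrapper.
	fmt.Println(errors.Is(err, errLimit))
}
```

In the log above the count is 16, which is one more than the per-interface limit of 15 discussed in comment 4.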

Comment 9 Andreas Karis 2022-07-12 10:10:24 UTC
We definitely have issues with the removal of CloudPrivateIPs:
~~~
[akaris@linux 2104784]$ omg logs -n openshift-cloud-network-config-controller         cloud-network-config-controller-868f8459dd-4pxwd | grep 10.0.134.71
2022-07-12T02:58:05.694389392Z I0712 02:58:05.694338       1 controller.go:182] Assigning key: 10.0.134.71 to cloud-private-ip-config workqueue
2022-07-12T02:58:05.699152680Z I0712 02:58:05.699098       1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.134.71" will be added to node: "ip-10-0-131-179.us-west-1.compute.internal"
2022-07-12T02:58:05.705947222Z I0712 02:58:05.705924       1 cloudprivateipconfig_controller.go:295] Adding finalizer to CloudPrivateIPConfig: "10.0.134.71"
2022-07-12T02:58:05.706031699Z I0712 02:58:05.705929       1 controller.go:182] Assigning key: 10.0.134.71 to cloud-private-ip-config workqueue
2022-07-12T02:58:06.242678494Z I0712 02:58:06.242642       1 cloudprivateipconfig_controller.go:353] Added IP address to node: "ip-10-0-131-179.us-west-1.compute.internal" for CloudPrivateIPConfig: "10.0.134.71"
2022-07-12T02:58:06.250567807Z I0712 02:58:06.250516       1 controller.go:160] Dropping key '10.0.134.71' from the cloud-private-ip-config workqueue
2022-07-12T02:58:06.255387829Z I0712 02:58:06.255363       1 controller.go:160] Dropping key '10.0.134.71' from the cloud-private-ip-config workqueue
2022-07-12T03:03:04.746097253Z I0712 03:03:04.746059       1 controller.go:182] Assigning key: 10.0.134.71 to cloud-private-ip-config workqueue
2022-07-12T03:03:04.751467833Z I0712 03:03:04.751437       1 cloudprivateipconfig_controller.go:187] CloudPrivateIPConfig: "10.0.134.71" will be deleted from node: "ip-10-0-131-179.us-west-1.compute.internal"
2022-07-12T03:03:04.759119459Z I0712 03:03:04.759089       1 controller.go:182] Assigning key: 10.0.134.71 to cloud-private-ip-config workqueue
2022-07-12T03:03:05.346106239Z I0712 03:03:05.346085       1 cloudprivateipconfig_controller.go:242] CloudPrivateIPConfig: 10.0.134.71 object has been marked for complete deletion
2022-07-12T03:03:05.346106239Z I0712 03:03:05.346099       1 cloudprivateipconfig_controller.go:249] Cleaning up IP address and finalizer for CloudPrivateIPConfig: "10.0.134.71", deleting it completely
2022-07-12T03:03:05.356521574Z I0712 03:03:05.356500       1 controller.go:160] Dropping key '10.0.134.71' from the cloud-private-ip-config workqueue
2022-07-12T03:03:05.356705007Z I0712 03:03:05.356677       1 controller.go:182] Assigning key: 10.0.134.71 to cloud-private-ip-config workqueue
2022-07-12T03:03:05.414990428Z I0712 03:03:05.414954       1 cloudprivateipconfig_controller.go:421] CloudPrivateIPConfig: "10.0.134.71" in work queue no longer exists
2022-07-12T03:03:05.414990428Z I0712 03:03:05.414980       1 controller.go:160] Dropping key '10.0.134.71' from the cloud-private-ip-config workqueue
2022-07-12T03:03:05.614967217Z I0712 03:03:05.614929       1 cloudprivateipconfig_controller.go:421] CloudPrivateIPConfig: "10.0.134.71" in work queue no longer exists
2022-07-12T03:03:05.614967217Z I0712 03:03:05.614949       1 controller.go:160] Dropping key '10.0.134.71' from the cloud-private-ip-config workqueue
2022-07-12T03:10:41.982974676Z 	status code: 400, request id: 518e3246-1789-4c3a-9b87-02e6898d4775 Tried to assign the following 16 IP addresses to the interface ( 10.0.131.179 10.0.134.71 10.0.147.253 10.0.178.90 10.0.173.236 10.0.163.109 10.0.188.72 10.0.177.40 10.0.146.203 10.0.185.124 10.0.128.168 10.0.162.26 10.0.132.65 10.0.190.63 10.0.156.49 10.0.161.24), requeuing in cloud-private-ip-config workqueue
2022-07-12T03:10:42.451270498Z 	status code: 400, request id: 061931b4-3618-4143-afa3-afa6cf11c55e Tried to assign the following 16 IP addresses to the interface ( 10.0.131.179 10.0.134.71 10.0.147.253 10.0.178.90 10.0.173.236 10.0.163.109 10.0.188.72 10.0.177.40 10.0.146.203 10.0.185.124 10.0.128.168 10.0.162.26 10.0.132.65 10.0.190.63 10.0.156.49 10.0.161.24), requeuing in cloud-private-ip-config workqueue
(...)
2022-07-12T03:40:31.860759774Z 	status code: 400, request id: 2dd4d823-1616-4aa2-a090-ef13c56629ca Tried to assign the following 16 IP addresses to the interface ( 10.0.131.179 10.0.134.71 10.0.132.65 10.0.131.110 10.0.131.11 10.0.131.111 10.0.131.112 10.0.131.113 10.0.131.115 10.0.131.17 10.0.131.114 10.0.131.15 10.0.131.18 10.0.131.16 10.0.131.12 10.0.131.13), requeuing in cloud-private-ip-config workqueue
~~~

Comment 10 Andreas Karis 2022-07-12 10:40:00 UTC
There's a clear race between deletion and assignment. Deletion releases a single IP address from the interface, whereas the current assignment code takes all IPs on the interface and tries to reassign them. This led to a race, for example here: the release operation for 10.0.134.71 was reported as successful, but the concurrent add operation for IP 10.0.186.7 reverted that delete operation:
~~~
2022-07-12T03:03:04.746097253Z I0712 03:03:04.746059       1 controller.go:182] Assigning key: 10.0.134.71 to cloud-private-ip-config workqueue
2022-07-12T03:03:04.751467833Z I0712 03:03:04.751437       1 cloudprivateipconfig_controller.go:187] CloudPrivateIPConfig: "10.0.134.71" will be deleted from node: "ip-10-0-131-179.us-west-1.compute.internal"
2022-07-12T03:03:04.759119459Z I0712 03:03:04.759089       1 controller.go:182] Assigning key: 10.0.134.71 to cloud-private-ip-config workqueue
2022-07-12T03:03:04.917985954Z I0712 03:03:04.917933       1 controller.go:182] Assigning key: 10.0.173.53 to cloud-private-ip-config workqueue
2022-07-12T03:03:04.924395572Z I0712 03:03:04.924345       1 cloudprivateipconfig_controller.go:187] CloudPrivateIPConfig: "10.0.173.53" will be deleted from node: "ip-10-0-131-179.us-west-1.compute.internal"
2022-07-12T03:03:04.926813866Z I0712 03:03:04.926786       1 cloudprivateipconfig_controller.go:242] CloudPrivateIPConfig: 10.0.172.68 object has been marked for complete deletion
2022-07-12T03:03:04.926813866Z I0712 03:03:04.926802       1 cloudprivateipconfig_controller.go:249] Cleaning up IP address and finalizer for CloudPrivateIPConfig: "10.0.172.68", deleting it completely
2022-07-12T03:03:04.931928216Z I0712 03:03:04.931905       1 controller.go:182] Assigning key: 10.0.173.53 to cloud-private-ip-config workqueue
2022-07-12T03:03:04.936263208Z I0712 03:03:04.936245       1 controller.go:182] Assigning key: 10.0.172.68 to cloud-private-ip-config workqueue
2022-07-12T03:03:04.936382862Z I0712 03:03:04.936367       1 controller.go:160] Dropping key '10.0.172.68' from the cloud-private-ip-config workqueue
2022-07-12T03:03:04.941801797Z I0712 03:03:04.941775       1 cloudprivateipconfig_controller.go:421] CloudPrivateIPConfig: "10.0.172.68" in work queue no longer exists
2022-07-12T03:03:04.941801797Z I0712 03:03:04.941794       1 controller.go:160] Dropping key '10.0.172.68' from the cloud-private-ip-config workqueue
2022-07-12T03:03:04.952650783Z I0712 03:03:04.952021       1 controller.go:182] Assigning key: 10.0.186.7 to cloud-private-ip-config workqueue
2022-07-12T03:03:04.956496344Z I0712 03:03:04.956475       1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.186.7" will be added to node: "ip-10-0-131-179.us-west-1.compute.internal"
2022-07-12T03:03:04.962999877Z I0712 03:03:04.962978       1 cloudprivateipconfig_controller.go:295] Adding finalizer to CloudPrivateIPConfig: "10.0.186.7"
2022-07-12T03:03:04.963024237Z I0712 03:03:04.963009       1 controller.go:182] Assigning key: 10.0.186.7 to cloud-private-ip-config workqueue
2022-07-12T03:03:05.344672951Z I0712 03:03:05.344635       1 controller.go:182] Assigning key: 10.0.144.60 to cloud-private-ip-config workqueue
2022-07-12T03:03:05.346106239Z I0712 03:03:05.346085       1 cloudprivateipconfig_controller.go:242] CloudPrivateIPConfig: 10.0.134.71 object has been marked for complete deletion
2022-07-12T03:03:05.346106239Z I0712 03:03:05.346099       1 cloudprivateipconfig_controller.go:249] Cleaning up IP address and finalizer for CloudPrivateIPConfig: "10.0.134.71", deleting it completely
2022-07-12T03:03:05.349547648Z I0712 03:03:05.349506       1 cloudprivateipconfig_controller.go:187] CloudPrivateIPConfig: "10.0.144.60" will be deleted from node: "ip-10-0-131-179.us-west-1.compute.internal"
2022-07-12T03:03:05.356521574Z I0712 03:03:05.356500       1 controller.go:160] Dropping key '10.0.134.71' from the cloud-private-ip-config workqueue
2022-07-12T03:03:05.356705007Z I0712 03:03:05.356677       1 controller.go:182] Assigning key: 10.0.134.71 to cloud-private-ip-config workqueue
2022-07-12T03:03:05.356860074Z I0712 03:03:05.356840       1 controller.go:182] Assigning key: 10.0.144.60 to cloud-private-ip-config workqueue
2022-07-12T03:03:05.414990428Z I0712 03:03:05.414954       1 cloudprivateipconfig_controller.go:421] CloudPrivateIPConfig: "10.0.134.71" in work queue no longer exists
2022-07-12T03:03:05.414990428Z I0712 03:03:05.414980       1 controller.go:160] Dropping key '10.0.134.71' from the cloud-private-ip-config workqueue
2022-07-12T03:03:05.500962957Z I0712 03:03:05.500923       1 cloudprivateipconfig_controller.go:353] Added IP address to node: "ip-10-0-131-179.us-west-1.compute.internal" for CloudPrivateIPConfig: "10.0.186.7"
2022-07-12T03:03:05.614967217Z I0712 03:03:05.614929       1 cloudprivateipconfig_controller.go:421] CloudPrivateIPConfig: "10.0.134.71" in work queue no longer exists
2022-07-12T03:03:05.614967217Z I0712 03:03:05.614949       1 controller.go:160] Dropping key '10.0.134.71' from the cloud-private-ip-config workqueue
2022-07-12T03:03:05.631178307Z I0712 03:03:05.631133       1 cloudprivateipconfig_controller.go:242] CloudPrivateIPConfig: 10.0.173.53 object has been marked for complete deletion
2022-07-12T03:03:05.631178307Z I0712 03:03:05.631161       1 cloudprivateipconfig_controller.go:249] Cleaning up IP address and finalizer for CloudPrivateIPConfig: "10.0.173.53", deleting it completely
2022-07-12T03:03:05.682396046Z I0712 03:03:05.682349       1 controller.go:182] Assigning key: 10.0.152.209 to cloud-private-ip-config workqueue
2022-07-12T03:03:05.818649332Z I0712 03:03:05.818610       1 controller.go:160] Dropping key '10.0.186.7' from the cloud-private-ip-config workqueue
~~~
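
The interleaving described in this comment can be replayed deterministically with a small Go model. Everything here is an assumption made for illustration: `fakeNIC` is an in-memory stand-in for the cloud interface, and its `assign` method simply adds whichever listed IPs are missing. The second function shows one possible way to avoid the problem (requesting only the new IP), which is not necessarily what the PR does.

```go
package main

import (
	"fmt"
	"sort"
)

// fakeNIC is a hypothetical in-memory stand-in for a cloud network interface.
type fakeNIC struct{ ips map[string]bool }

// list returns the IPs currently on the interface, sorted for stable output.
func (n *fakeNIC) list() []string {
	out := make([]string, 0, len(n.ips))
	for ip := range n.ips {
		out = append(out, ip)
	}
	sort.Strings(out)
	return out
}

// assign puts every listed IP on the interface (assumed semantics).
func (n *fakeNIC) assign(ips ...string) {
	for _, ip := range ips {
		n.ips[ip] = true
	}
}

// release removes a single IP from the interface.
func (n *fakeNIC) release(ip string) { delete(n.ips, ip) }

func newNIC() *fakeNIC {
	return &fakeNIC{ips: map[string]bool{"10.0.131.179": true, "10.0.134.71": true}}
}

// staleListAdd replays the ordering from the logs: the add reads the
// interface, the delete completes, then the add re-sends the stale list.
func staleListAdd() []string {
	nic := newNIC()
	seen := nic.list()                        // add of 10.0.186.7 reads the interface
	nic.release("10.0.134.71")                // concurrent delete reports success
	nic.assign(append(seen, "10.0.186.7")...) // stale list brings 10.0.134.71 back
	return nic.list()
}

// singleIPAdd uses the same ordering, but the add requests only the new IP.
func singleIPAdd() []string {
	nic := newNIC()
	nic.release("10.0.134.71")
	nic.assign("10.0.186.7")
	return nic.list()
}

func main() {
	fmt.Println(staleListAdd()) // [10.0.131.179 10.0.134.71 10.0.186.7]
	fmt.Println(singleIPAdd())  // [10.0.131.179 10.0.186.7]
}
```

In the first case the released address 10.0.134.71 is back on the interface even though its CloudPrivateIPConfig is gone, which matches the later "Tried to assign the following 16 IP addresses" errors that still list 10.0.134.71.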

Comment 11 Andreas Karis 2022-07-12 12:39:46 UTC
I added some debug logging [0]

Then, I ran:
~~~
[akaris@linux 2104784]$ while true; echo ===================; echo setup;  do oc apply -f egressip11.yaml ;  oc apply -f egressip12.yaml ; sleep 10 ; echo test ; oc apply -f egressip13.yaml & sleep 0.3;  oc delete -f egressip12.yaml & sleep 10; echo teardown;   oc delete -f egressip11.yaml ; oc delete -f egressip13.yaml; sleep 10; done
===================
setup
egressip.k8s.ovn.org/egressip11 created
egressip.k8s.ovn.org/egressip12 created
test
[1] 210367
[2] 210380
egressip.k8s.ovn.org "egressip12" deleted
egressip.k8s.ovn.org/egressip13 created
[1]-  Done                    oc apply -f egressip13.yaml
[2]+  Done                    oc delete -f egressip12.yaml
teardown
egressip.k8s.ovn.org "egressip11" deleted
egressip.k8s.ovn.org "egressip13" deleted
^C
~~~

And I can reproduce the problem (output after the last iteration):
~~~
W0712 12:38:47.594980       1 aws.go:243] akaris --------- Polling interface for node ip-10-0-132-121.us-west-1.compute.internal, assigned IPs: [10.0.132.121 10.0.129.12]
~~~
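
To make the leftover address in that polling line explicit, here is a minimal Go sketch that compares what is on the interface with what should be there. The helper `leakedIPs` is hypothetical. It assumes that 10.0.132.121 is the node's primary address (inferred from the node name) and that all EgressIP objects were deleted during teardown, so no CloudPrivateIPConfig should remain.

```go
package main

import "fmt"

// leakedIPs returns the addresses found on the interface that are neither the
// node's primary IP nor backed by an existing CloudPrivateIPConfig.
// Hypothetical helper for illustration only.
func leakedIPs(assigned []string, primary string, wanted map[string]bool) []string {
	var leaked []string
	for _, ip := range assigned {
		if ip != primary && !wanted[ip] {
			leaked = append(leaked, ip)
		}
	}
	return leaked
}

func main() {
	// Values taken from the polling log line above.
	assigned := []string{"10.0.132.121", "10.0.129.12"}
	// Assumption: every EgressIP object was deleted at teardown.
	wanted := map[string]bool{}
	fmt.Println(leakedIPs(assigned, "10.0.132.121", wanted)) // [10.0.129.12]
}
```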

======================

[0]
~~~
diff --git a/pkg/cloudprovider/aws.go b/pkg/cloudprovider/aws.go
index 2f031ac..0e5bc1a 100644
--- a/pkg/cloudprovider/aws.go
+++ b/pkg/cloudprovider/aws.go
@@ -59,6 +59,7 @@ func (a *AWS) initCredentials() error {
 // AWS API is separated per family). If the IP is already existing: it returns an
 // AlreadyExistingIPError.
 func (a *AWS) AssignPrivateIP(ip net.IP, node *corev1.Node) error {
+       klog.Warningf("akaris --------- Adding IP %s to node %s", ip, node.Name)
        instance, err := a.getInstance(node)
        if err != nil {
                return err
@@ -106,6 +107,7 @@ func (a *AWS) AssignPrivateIP(ip net.IP, node *corev1.Node) error {
        } else {
                for _, assignedIPv4 := range networkInterface.PrivateIpAddresses {
                        if assignedIP := net.ParseIP(*assignedIPv4.PrivateIpAddress); assignedIP != nil && assignedIP.Equal(ip) {
+                               klog.Warningf("akaris --------- Already existing IP %s on node %s", ip, node.Name)
                                return AlreadyExistingIPError
                        }
                        keepIPs = append(keepIPs, assignedIPv4.PrivateIpAddress)
@@ -116,6 +118,7 @@ func (a *AWS) AssignPrivateIP(ip net.IP, node *corev1.Node) error {
                        PrivateIpAddresses: newIPs,
                }
                _, err = a.client.AssignPrivateIpAddresses(&inputV4)
+               klog.Warningf("akaris --------- Assigning IP %s on node %s, inputV4: %v", ip, node.Name, inputV4)
                if err != nil {
                        var newIPsStr string
                        var keepIPsStr string
@@ -140,6 +143,7 @@ func (a *AWS) AssignPrivateIP(ip net.IP, node *corev1.Node) error {
 // per-IP-family basis (since the AWS API is separated per family).  If the IP
 // is non-existant: it returns an NonExistingIPError.
 func (a *AWS) ReleasePrivateIP(ip net.IP, node *corev1.Node) error {
+       klog.Warningf("akaris --------- Removing IP %s from node %s", ip, node.Name)
        instance, err := a.getInstance(node)
        if err != nil {
                return err
@@ -177,6 +181,7 @@ func (a *AWS) ReleasePrivateIP(ip net.IP, node *corev1.Node) error {
                        }
                }
                if len(deleteIPs) == 0 {
+                       klog.Warningf("akaris --------- Non existing IP %s on node %s", ip, node.Name)
                        return NonExistingIPError
                }
                inputV4 := ec2.UnassignPrivateIpAddressesInput{
@@ -185,8 +190,10 @@ func (a *AWS) ReleasePrivateIP(ip net.IP, node *corev1.Node) error {
                }
                _, err = a.client.UnassignPrivateIpAddresses(&inputV4)
                if err != nil {
+                       klog.Warningf("akaris --------- Error in unassigning IP %s on node %s", ip, node.Name)
                        return err
                }
+               klog.Warningf("akaris --------- Unassigning IP %s on node %s, inputV4: %v", ip, node.Name, inputV4)
                return a.waitForCompletion(node, awsapi.StringValueSlice(deleteIPs), true)
        }
 }
@@ -255,6 +262,7 @@ func (a *AWS) waitForCompletion(node *corev1.Node, ips []string, deleteOp bool)
                                }
                        }
                }
+               klog.Warningf("akaris --------- Polling interface for node %s, assigned IPs: %v", node.Name, assignedIPs)
                if deleteOp {
                        return !sets.NewString(assignedIPs...).HasAny(ips...), nil
                } else {
~~~

Comment 19 errata-xmlrpc 2023-01-17 19:51:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

