Description of problem:

failed egressIP assignment - cloud-network-config-controller does not delete failed cloudprivateipconfig

When someone creates an EgressIP resource with a valid and an invalid IP, such as the gateway IP on GCP (in this example, 10.0.128.5 is valid and 10.0.128.1 is not), the cloud-network-config-controller (cncc) reports:
~~~
[akaris@linux ipi-us-east-1]$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller | grep 10.0.128.1
I0408 10:53:24.036920 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "akaris-5-2w6fk-worker-c-8tbhc.c.openshift-gce-devel.internal"
E0408 10:53:25.504961 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "akaris-5-2w6fk-worker-c-8tbhc.c.openshift-gce-devel.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0408 10:54:47.433572 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "akaris-5-2w6fk-worker-c-8tbhc.c.openshift-gce-devel.internal"
E0408 10:54:48.915587 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "akaris-5-2w6fk-worker-c-8tbhc.c.openshift-gce-devel.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0408 10:57:32.760968 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "akaris-5-2w6fk-worker-c-8tbhc.c.openshift-gce-devel.internal"
I0408 10:57:34.295677 1 controller.go:160] Dropping key '10.0.128.1' from the cloud-private-ip-config workqueue
~~~
When we then delete the EgressIP resource, only the valid cloudprivateipconfig is ever cleaned up.
The invalid cloudprivateipconfig is never removed:
~~~
[akaris@linux origin (egressip-tests-option3)]$ oc get cloudprivateipconfig
NAME         AGE
10.0.128.1   14m
[akaris@linux origin (egressip-tests-option3)]$ oc get egressip
No resources found
~~~
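The retry pattern in the controller log above can be condensed into a per-key count. The following is a minimal sketch, not part of the controller or of `oc`: `summarize_key` is a hypothetical helper name, and the two message formats it looks for are taken only from the log lines quoted in this report. It matches the quoted key exactly, whereas `grep 10.0.128.1` treats each dot as a wildcard and would also match keys such as 10.0.128.10.

```shell
# summarize_key KEY
# Hypothetical helper: reads cloud-network-config-controller log lines on
# stdin and prints "errors=<n> dropped=<m>" for the given workqueue key.
# The matched substrings are assumptions based on the log excerpts in this
# report: "error syncing '<key>'" and "Dropping key '<key>'".
summarize_key() {
    awk -v key="$1" '
        # \047 is the octal escape for a single quote inside the awk string.
        index($0, "error syncing \047" key "\047") { errors++ }
        index($0, "Dropping key \047" key "\047")  { dropped++ }
        END { printf "errors=%d dropped=%d\n", errors, dropped }
    '
}
```

On a cluster it could be fed from the same command used in this report, for example `oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller --tail=-1 | summarize_key 10.0.128.1`. The `--tail=-1` is there because log requests with a label selector otherwise return only the last few lines per pod.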
@pepalani I am not sure this bug is fixed. With three worker nodes, I first marked only one node as egress-assignable and created an EgressIP object with 10.0.128.1 and 10.0.128.5 as egress IPs.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-18-171831   True        False         100m    Cluster version is 4.11.0-0.nightly-2022-05-18-171831

$ oc get node
NAME                                                        STATUS   ROLES    AGE   VERSION
jechen-0519c-9ng2l-master-0.c.openshift-qe.internal         Ready    master   73m   v1.23.3+69213f8
jechen-0519c-9ng2l-master-1.c.openshift-qe.internal         Ready    master   73m   v1.23.3+69213f8
jechen-0519c-9ng2l-master-2.c.openshift-qe.internal         Ready    master   73m   v1.23.3+69213f8
jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal   Ready    worker   64m   v1.23.3+69213f8
jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal   Ready    worker   64m   v1.23.3+69213f8
jechen-0519c-9ng2l-worker-c-hctm8.c.openshift-qe.internal   Ready    worker   64m   v1.23.3+69213f8

$ oc label node jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal labeled

$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 10.0.128.1
  - 10.0.128.5
  namespaceSelector:
    matchLabels:
      team: red

$ oc get egressips.k8s.ovn.org
NAME        EGRESSIPS    ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.1   jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal   10.0.128.5

# there is no error for 10.0.128.1 in cloud-network-config-controller
$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller | grep 10.0.128.1
$

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller
I0519 16:31:35.906740 1 node_controller.go:106] Setting annotation: 'cloud.network.openshift.io/egress-ipconfig: [{"interface":"nic0","ifaddr":{"ipv4":"10.0.128.0/17"},"capacity":{"ip":10}}]' on node: jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal
I0519 16:31:35.925541 1 controller.go:160] Dropping key 'jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal' from the node workqueue
I0519 16:31:36.028513 1 controller.go:160] Dropping key 'jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal' from the node workqueue
I0519 17:32:23.706146 1 controller.go:182] Assigning key: 10.0.128.5 to cloud-private-ip-config workqueue
I0519 17:32:23.711977 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.5" will be added to node: "jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal"
I0519 17:32:23.725078 1 controller.go:182] Assigning key: 10.0.128.5 to cloud-private-ip-config workqueue
I0519 17:32:23.725541 1 cloudprivateipconfig_controller.go:295] Adding finalizer to CloudPrivateIPConfig: "10.0.128.5"
I0519 17:32:26.598123 1 cloudprivateipconfig_controller.go:353] Added IP address to node: "jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal" for CloudPrivateIPConfig: "10.0.128.5"
I0519 17:32:26.607988 1 controller.go:160] Dropping key '10.0.128.5' from the cloud-private-ip-config workqueue
I0519 17:32:26.612368 1 controller.go:160] Dropping key '10.0.128.5' from the cloud-private-ip-config workqueue

$ oc delete egressips.k8s.ovn.org egressip1
egressip.k8s.ovn.org "egressip1" deleted

$ oc get egressips.k8s.ovn.org
No resources found

$ oc get cloudprivateipconfig
No resources found

However, if I mark a second node as egress-assignable:

$ oc label node jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal labeled

# create the egressip object again
$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressips.k8s.ovn.org
NAME        EGRESSIPS    ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.1   jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal   10.0.128.5

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller
I0519 17:35:57.747989 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:35:58.999335 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:35:59.004272 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:36:00.129698 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:36:00.135605 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:36:01.179277 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:36:01.184189 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:36:02.370066 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:36:02.471125 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:36:03.487702 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller | grep 10.0.128.1
I0519 17:38:47.333425 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:38:48.980315 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:41:32.832625 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
I0519 17:41:34.008509 1 controller.go:160] Dropping key '10.0.128.1' from the cloud-private-ip-config workqueue

$ oc delete egressips.k8s.ovn.org egressip1
egressip.k8s.ovn.org "egressip1" deleted

$ oc get egressips.k8s.ovn.org
No resources found

$ oc get cloudprivateipconfig
NAME         AGE
10.0.128.1   2m25s

It seems that once cloud-network-config-controller reports an error for 10.0.128.1, the cloudprivateipconfig is not cleaned up, even after the egressip object is deleted.
Verified with pre-merged image 4.11.0-0.ci.test-2022-06-21-191124-ci-ln-zy5x7wb-latest, built with openshift/ovn-kubernetes#1114.

$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-06-21-191124-ci-ln-zy5x7wb-latest   True        False         10m     Cluster version is 4.11.0-0.ci.test-2022-06-21-191124-ci-ln-zy5x7wb-latest

$ oc get node
NAME                                                        STATUS   ROLES    AGE   VERSION
jechen-0621b-7jwwv-master-0.c.openshift-qe.internal         Ready    master   28m   v1.24.0+284d62a
jechen-0621b-7jwwv-master-1.c.openshift-qe.internal         Ready    master   28m   v1.24.0+284d62a
jechen-0621b-7jwwv-master-2.c.openshift-qe.internal         Ready    master   28m   v1.24.0+284d62a
jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal   Ready    worker   18m   v1.24.0+284d62a
jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal   Ready    worker   18m   v1.24.0+284d62a
jechen-0621b-7jwwv-worker-c-9kttc.c.openshift-qe.internal   Ready    worker   18m   v1.24.0+284d62a

$ oc label node jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal labeled

$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 10.0.128.1
  - 10.0.128.5
  namespaceSelector:
    matchLabels:
      team: red

$ oc create -f config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressips.k8s.ovn.org
NAME        EGRESSIPS    ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip1   10.0.128.1

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller
I0621 21:25:52.748540 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:25:53.994707 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:25:54.001155 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:25:55.130798 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:25:55.282148 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:25:56.448478 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:26:01.577361 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:26:02.808717 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:26:13.057495 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:26:14.391725 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue

$ oc delete egressips.k8s.ovn.org egressip1
egressip.k8s.ovn.org "egressip1" deleted

$ oc get egressips.k8s.ovn.org
No resources found

$ oc get cloudprivateipconfig
No resources found

# mark a second node as egress-assignable
$ oc label node jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal labeled

$ oc create -f config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressips.k8s.ovn.org
NAME        EGRESSIPS    ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.1   jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal   10.0.128.5

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller | grep 10.0.128.1
I0621 21:28:32.351812 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:28:34.048118 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:28:34.054633 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:28:35.804242 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:28:36.615143 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:28:38.212123 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:28:48.463551 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:28:49.552451 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:29:10.050868 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:29:11.275126 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue

$ oc delete egressips.k8s.ovn.org egressip1
egressip.k8s.ovn.org "egressip1" deleted

$ oc get egressips.k8s.ovn.org
No resources found

$ oc get cloudprivateipconfig
No resources found

==> For a failed egressIP assignment, the failed cloudprivateipconfig is now deleted correctly by cloud-network-config-controller.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
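To check whether a cluster still shows the symptom from this report after updating, the manual comparison of `oc get cloudprivateipconfig` against `oc get egressips.k8s.ovn.org` can be scripted. This is a minimal sketch under stated assumptions: it handles IPv4 egress IPs only, where each CloudPrivateIPConfig is named after its IP as in the output earlier in this report, and `find_orphans` and `list_orphans_from_cluster` are hypothetical helper names, not part of `oc`.

```shell
# find_orphans CPIC_NAMES EGRESS_IPS
# Hypothetical helper: prints each entry of CPIC_NAMES (whitespace-separated)
# that does not appear in EGRESS_IPS (whitespace-separated), one per line.
find_orphans() {
    cpic_names=$1
    egress_ips=$2
    for name in $cpic_names; do
        found=no
        for ip in $egress_ips; do
            if [ "$name" = "$ip" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Hypothetical wrapper: collects both lists from the cluster with oc.
# It is only defined here, not invoked; call it manually while logged in.
# Assumes IPv4, so that object names and spec.egressIPs entries compare equal.
list_orphans_from_cluster() {
    cpics=$(oc get cloudprivateipconfig -o jsonpath='{.items[*].metadata.name}')
    eips=$(oc get egressips.k8s.ovn.org -o jsonpath='{.items[*].spec.egressIPs[*]}')
    find_orphans "$cpics" "$eips"
}
```

In the failing scenario from this report, where the EgressIP object is already deleted and 10.0.128.1 remains, `find_orphans "10.0.128.1" ""` prints `10.0.128.1`. An empty result from `list_orphans_from_cluster` would suggest that no cloudprivateipconfig was left behind.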