Bug 2073378
Summary: | failed egressIP assignment - cloud-network-config-controller does not delete failed cloudprivateipconfig | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Andreas Karis <akaris> |
Component: | Networking | Assignee: | Periyasamy Palanisamy <pepalani> |
Networking sub component: | ovn-kubernetes | QA Contact: | jechen <jechen> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | | |
Priority: | low | CC: | ffernand, jechen, pepalani |
Version: | 4.11 | | |
Target Milestone: | --- | | |
Target Release: | 4.11.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-08-10 11:05:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Andreas Karis
2022-04-08 11:14:37 UTC
@pepalani I am not sure if this bug is fixed. Out of three worker nodes, I marked only one node as egress-assignable and created an EgressIP object with 10.0.128.1 and 10.0.128.5 as egress IPs.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-18-171831   True        False         100m    Cluster version is 4.11.0-0.nightly-2022-05-18-171831

$ oc get node
NAME                                                        STATUS   ROLES    AGE   VERSION
jechen-0519c-9ng2l-master-0.c.openshift-qe.internal         Ready    master   73m   v1.23.3+69213f8
jechen-0519c-9ng2l-master-1.c.openshift-qe.internal         Ready    master   73m   v1.23.3+69213f8
jechen-0519c-9ng2l-master-2.c.openshift-qe.internal         Ready    master   73m   v1.23.3+69213f8
jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal   Ready    worker   64m   v1.23.3+69213f8
jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal   Ready    worker   64m   v1.23.3+69213f8
jechen-0519c-9ng2l-worker-c-hctm8.c.openshift-qe.internal   Ready    worker   64m   v1.23.3+69213f8

$ oc label node jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal labeled

$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 10.0.128.1
  - 10.0.128.5
  namespaceSelector:
    matchLabels:
      team: red

$ oc get egressips.k8s.ovn.org
NAME        EGRESSIPS    ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.1   jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal   10.0.128.5

# there is no error for 10.0.128.1 in cloud-network-config-controller
$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller | grep 10.0.128.1
$

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller
I0519 16:31:35.906740 1 node_controller.go:106] Setting annotation: 'cloud.network.openshift.io/egress-ipconfig: [{"interface":"nic0","ifaddr":{"ipv4":"10.0.128.0/17"},"capacity":{"ip":10}}]' on node: jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal
I0519 16:31:35.925541 1 controller.go:160] Dropping key 'jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal' from the node workqueue
I0519 16:31:36.028513 1 controller.go:160] Dropping key 'jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal' from the node workqueue
I0519 17:32:23.706146 1 controller.go:182] Assigning key: 10.0.128.5 to cloud-private-ip-config workqueue
I0519 17:32:23.711977 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.5" will be added to node: "jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal"
I0519 17:32:23.725078 1 controller.go:182] Assigning key: 10.0.128.5 to cloud-private-ip-config workqueue
I0519 17:32:23.725541 1 cloudprivateipconfig_controller.go:295] Adding finalizer to CloudPrivateIPConfig: "10.0.128.5"
I0519 17:32:26.598123 1 cloudprivateipconfig_controller.go:353] Added IP address to node: "jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal" for CloudPrivateIPConfig: "10.0.128.5"
I0519 17:32:26.607988 1 controller.go:160] Dropping key '10.0.128.5' from the cloud-private-ip-config workqueue
I0519 17:32:26.612368 1 controller.go:160] Dropping key '10.0.128.5' from the cloud-private-ip-config workqueue

$ oc delete egressips.k8s.ovn.org egressip1
egressip.k8s.ovn.org "egressip1" deleted

$ oc get egressips.k8s.ovn.org
No resources found

$ oc get cloudprivateipconfig
No resources found

However, if I mark a second node as egress-assignable:

$ oc label node jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal labeled

# create the egressip object back
$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressips.k8s.ovn.org
NAME        EGRESSIPS    ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.1   jechen-0519c-9ng2l-worker-a-sh8zv.c.openshift-qe.internal   10.0.128.5

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller
I0519 17:35:57.747989 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:35:58.999335 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:35:59.004272 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:36:00.129698 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:36:00.135605 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:36:01.179277 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:36:01.184189 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:36:02.370066 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:36:02.471125 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:36:03.487702 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller | grep 10.0.128.1
I0519 17:38:47.333425 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
E0519 17:38:48.980315 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0519 17:41:32.832625 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0519c-9ng2l-worker-b-gvqf6.c.openshift-qe.internal"
I0519 17:41:34.008509 1 controller.go:160] Dropping key '10.0.128.1' from the cloud-private-ip-config workqueue

$ oc delete egressips.k8s.ovn.org egressip1
egressip.k8s.ovn.org "egressip1" deleted

$ oc get egressips.k8s.ovn.org
No resources found

$ oc get cloudprivateipconfig
NAME         AGE
10.0.128.1   2m25s

It seems that once cloud-network-config-controller logs an error for 10.0.128.1, the cloudprivateipconfig is not cleaned up, even after the egressip object is deleted.

Verified with pre-merged image 4.11.0-0.ci.test-2022-06-21-191124-ci-ln-zy5x7wb-latest built with openshift/ovn-kubernetes#1114

$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-06-21-191124-ci-ln-zy5x7wb-latest   True        False         10m     Cluster version is 4.11.0-0.ci.test-2022-06-21-191124-ci-ln-zy5x7wb-latest

$ oc get node
NAME                                                        STATUS   ROLES    AGE   VERSION
jechen-0621b-7jwwv-master-0.c.openshift-qe.internal         Ready    master   28m   v1.24.0+284d62a
jechen-0621b-7jwwv-master-1.c.openshift-qe.internal         Ready    master   28m   v1.24.0+284d62a
jechen-0621b-7jwwv-master-2.c.openshift-qe.internal         Ready    master   28m   v1.24.0+284d62a
jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal   Ready    worker   18m   v1.24.0+284d62a
jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal   Ready    worker   18m   v1.24.0+284d62a
jechen-0621b-7jwwv-worker-c-9kttc.c.openshift-qe.internal   Ready    worker   18m   v1.24.0+284d62a

$ oc label node jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal labeled

$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 10.0.128.1
  - 10.0.128.5
  namespaceSelector:
    matchLabels:
      team: red

$ oc create -f config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressips.k8s.ovn.org
NAME        EGRESSIPS    ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip1   10.0.128.1

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller
I0621 21:25:52.748540 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:25:53.994707 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:25:54.001155 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:25:55.130798 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:25:55.282148 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:25:56.448478 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:26:01.577361 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:26:02.808717 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:26:13.057495 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal"
E0621 21:26:14.391725 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue

$ oc delete egressips.k8s.ovn.org egressip1
egressip.k8s.ovn.org "egressip1" deleted

$ oc get egressips.k8s.ovn.org
No resources found

$ oc get cloudprivateipconfig
No resources found

# mark a second node as egress-assignable
$ oc label node jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal labeled

$ oc create -f config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressips.k8s.ovn.org
NAME        EGRESSIPS    ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.1   jechen-0621b-7jwwv-worker-a-lrszj.c.openshift-qe.internal   10.0.128.5

$ oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller | grep 10.0.128.1
I0621 21:28:32.351812 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:28:34.048118 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:28:34.054633 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:28:35.804242 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:28:36.615143 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:28:38.212123 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:28:48.463551 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:28:49.552451 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue
I0621 21:29:10.050868 1 cloudprivateipconfig_controller.go:271] CloudPrivateIPConfig: "10.0.128.1" will be added to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal"
E0621 21:29:11.275126 1 controller.go:165] error syncing '10.0.128.1': error assigning CloudPrivateIPConfig: "10.0.128.1" to node: "jechen-0621b-7jwwv-worker-b-jd27w.c.openshift-qe.internal", err: {"errors":[{"code":"IP_IN_USE_BY_ANOTHER_RESOURCE","message":"IP '10.0.128.1/32' is already being used by another resource. "}]}, requeuing in cloud-private-ip-config workqueue

$ oc delete egressips.k8s.ovn.org egressip1
egressip.k8s.ovn.org "egressip1" deleted

$ oc get egressips.k8s.ovn.org
No resources found

$ oc get cloudprivateipconfig
No resources found

==> For a failed egressIP assignment, the failed cloudprivateipconfig is now deleted correctly by cloud-network-config-controller.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
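
The log checks in the comments above were done by eye with grep. As a rough aid, the following sketch counts the "error syncing" entries per CloudPrivateIPConfig key in controller log text read from stdin. The function name count_sync_errors is hypothetical and not part of any OpenShift tooling, and the parsing assumes the log line layout shown in this report.

```shell
# Hypothetical helper: count "error syncing '<key>'" log entries per key.
# Reads cloud-network-config-controller log text on stdin and prints
# one "<key> <count>" line per key, sorted by key.
count_sync_errors() {
  awk -v q="'" '
    BEGIN {
      # Matches: error syncing <quote>key<quote>
      re = "error syncing " q "[^" q "]+" q
    }
    match($0, re) {
      # Skip the 15-character prefix (error syncing + space + quote)
      # and drop the closing quote to isolate the key.
      key = substr($0, RSTART + 15, RLENGTH - 16)
      errors[key]++
    }
    END {
      for (key in errors) printf "%s %d\n", key, errors[key]
    }
  ' | sort
}
```

It could be fed from the same command used in this report, for example `oc logs -n openshift-cloud-network-config-controller -l app=cloud-network-config-controller | count_sync_errors`. Note that when a label selector is given, `oc logs` returns only the last 10 lines per pod by default, so adding `--tail=-1` is likely needed for a complete count. A non-empty `oc get cloudprivateipconfig` after the EgressIP has been deleted remains the actual symptom to check for.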