Description of problem:
Tested on an AWS SDN cluster. After adding and removing EgressIPs from the namespace many times, the cloud-network-config-controller pod went into CrashLoopBackOff.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-09-195852

How reproducible:
Not sure

Steps to Reproduce:
1. Before the cloud-network-config-controller pod crashed, I observed that some unused cloudprivateipconfigs were not removed, in this environment 10.0.51.100 and 10.0.67.21. I then re-added egressip 10.0.51.100 to the hostsubnet and netnamespace, but 10.0.51.100 always stayed in an incorrect status.

$ oc get cloudprivateipconfigs
NAME          AGE
10.0.51.100   7h31m
10.0.67.21    37m
10.0.73.100   7m33s

$ oc get cloudprivateipconfigs 10.0.51.100 -o yaml
apiVersion: cloud.network.openshift.io/v1
kind: CloudPrivateIPConfig
metadata:
  creationTimestamp: "2022-01-10T01:52:09Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-01-10T02:14:17Z"
  finalizers:
  - cloudprivateipconfig.cloud.network.openshift.io/finalizer
  generation: 2
  name: 10.0.51.100
  resourceVersion: "44970"
  uid: 77aa9143-2e1c-4976-ab26-fa759866f14c
spec:
  node: ip-10-0-51-186.us-east-2.compute.internal
status:
  conditions:
  - lastTransitionTime: "2022-01-10T02:14:17Z"
    message: Deleting IP address
    observedGeneration: 2
    reason: CloudResponsePending
    status: Unknown
    type: Assigned
  node: ""

2. Then I tried to reproduce the issue: patch 3 egressips to different hosts, create a new namespace test3, patch the 3 egressips to test3, then remove all the egressips from test3.

$ oc get hostsubnet
NAME                                        HOST                                        HOST IP       SUBNET          EGRESS CIDRS   EGRESS IPS
ip-10-0-51-186.us-east-2.compute.internal   ip-10-0-51-186.us-east-2.compute.internal   10.0.51.186   10.129.0.0/23                  ["10.0.51.100"]
ip-10-0-57-103.us-east-2.compute.internal   ip-10-0-57-103.us-east-2.compute.internal   10.0.57.103   10.129.2.0/23                  ["10.0.57.50"]
ip-10-0-57-202.us-east-2.compute.internal   ip-10-0-57-202.us-east-2.compute.internal   10.0.57.202   10.128.2.0/23
ip-10-0-67-247.us-east-2.compute.internal   ip-10-0-67-247.us-east-2.compute.internal   10.0.67.247   10.128.0.0/23                  ["10.0.67.50"]
ip-10-0-71-99.us-east-2.compute.internal    ip-10-0-71-99.us-east-2.compute.internal    10.0.71.99    10.131.0.0/23                  []
ip-10-0-73-87.us-east-2.compute.internal    ip-10-0-73-87.us-east-2.compute.internal    10.0.73.87    10.130.0.0/23                  ["10.0.73.50"]

$ oc patch netnamespace test3 --type=merge -p '{"egressIPs": ["10.0.73.50","10.0.57.50","10.0.67.50"]}'
netnamespace.network.openshift.io/test3 patched

$ oc get cloudprivateipconfigs
NAME          AGE
10.0.51.100   7h30m
10.0.57.50    6s
10.0.67.21    36m
10.0.67.50    6s
10.0.73.100   7m33s
10.0.73.50    6s

$ oc patch netnamespace test3 --type=merge -p '{"egressIPs": []}'
netnamespace.network.openshift.io/test3 patched

10.0.67.50 and 10.0.57.50 were left behind.

$ oc get cloudprivateipconfigs
NAME          AGE
10.0.51.100   7h44m
10.0.57.50    13m
10.0.67.21    50m
10.0.67.50    13m
10.0.73.100   21m

$ oc get pods -n openshift-cloud-network-config-controller
NAME                                              READY   STATUS             RESTARTS         AGE
cloud-network-config-controller-6999cd7db-l8bjh   0/1     CrashLoopBackOff   29 (2m48s ago)   8h

$ oc logs cloud-network-config-controller-6999cd7db-l8bjh -n openshift-cloud-network-config-controller
W0110 09:22:46.563873 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0110 09:22:46.564492 1 leaderelection.go:248] attempting to acquire leader lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock...
I0110 09:22:46.578052 1 leaderelection.go:258] successfully acquired lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock
I0110 09:22:46.578575 1 controller.go:88] Starting node controller
I0110 09:22:46.578585 1 controller.go:91] Waiting for informer caches to sync for node workqueue
I0110 09:22:46.578622 1 controller.go:88] Starting cloud-private-ip-config controller
I0110 09:22:46.578669 1 controller.go:91] Waiting for informer caches to sync for cloud-private-ip-config workqueue
I0110 09:22:46.578623 1 controller.go:88] Starting secret controller
I0110 09:22:46.578720 1 controller.go:91] Waiting for informer caches to sync for secret workqueue
I0110 09:22:46.581035 1 controller.go:182] Assigning key: 10.0.67.21 to cloud-private-ip-config workqueue
I0110 09:22:46.581054 1 controller.go:182] Assigning key: 10.0.73.100 to cloud-private-ip-config workqueue
I0110 09:22:46.581060 1 controller.go:182] Assigning key: 10.0.51.100 to cloud-private-ip-config workqueue
I0110 09:22:46.583295 1 controller.go:182] Assigning key: ip-10-0-51-186.us-east-2.compute.internal to node workqueue
I0110 09:22:46.583311 1 controller.go:182] Assigning key: ip-10-0-57-103.us-east-2.compute.internal to node workqueue
I0110 09:22:46.583316 1 controller.go:182] Assigning key: ip-10-0-57-202.us-east-2.compute.internal to node workqueue
I0110 09:22:46.583318 1 controller.go:182] Assigning key: ip-10-0-67-247.us-east-2.compute.internal to node workqueue
I0110 09:22:46.583322 1 controller.go:182] Assigning key: ip-10-0-71-99.us-east-2.compute.internal to node workqueue
I0110 09:22:46.583325 1 controller.go:182] Assigning key: ip-10-0-73-87.us-east-2.compute.internal to node workqueue
I0110 09:22:46.678714 1 controller.go:96] Starting node workers
I0110 09:22:46.678747 1 controller.go:102] Started node workers
I0110 09:22:46.678783 1 controller.go:160] Dropping key 'ip-10-0-51-186.us-east-2.compute.internal' from the node workqueue
I0110 09:22:46.678788 1 controller.go:160] Dropping key 'ip-10-0-57-103.us-east-2.compute.internal' from the node workqueue
I0110 09:22:46.678803 1 controller.go:160] Dropping key 'ip-10-0-67-247.us-east-2.compute.internal' from the node workqueue
I0110 09:22:46.678806 1 controller.go:160] Dropping key 'ip-10-0-71-99.us-east-2.compute.internal' from the node workqueue
I0110 09:22:46.678809 1 controller.go:160] Dropping key 'ip-10-0-73-87.us-east-2.compute.internal' from the node workqueue
I0110 09:22:46.678818 1 controller.go:160] Dropping key 'ip-10-0-57-202.us-east-2.compute.internal' from the node workqueue
I0110 09:22:46.678842 1 controller.go:96] Starting secret workers
I0110 09:22:46.678852 1 controller.go:102] Started secret workers
I0110 09:22:46.678859 1 controller.go:96] Starting cloud-private-ip-config workers
I0110 09:22:46.678877 1 controller.go:102] Started cloud-private-ip-config workers
I0110 09:22:46.681301 1 controller.go:160] Dropping key '10.0.51.100' from the cloud-private-ip-config workqueue
I0110 09:22:46.681302 1 controller.go:160] Dropping key '10.0.73.100' from the cloud-private-ip-config workqueue
I0110 09:22:46.682331 1 controller.go:160] Dropping key '10.0.67.21' from the cloud-private-ip-config workqueue
I0110 09:22:56.395401 1 controller.go:182] Assigning key: 10.0.57.50 to cloud-private-ip-config workqueue
I0110 09:22:56.395446 1 controller.go:182] Assigning key: 10.0.67.50 to cloud-private-ip-config workqueue
I0110 09:22:56.398059 1 cloudprivateipconfig_controller.go:257] CloudPrivateIPConfig: "10.0.67.50" will be added to node: "ip-10-0-67-247.us-east-2.compute.internal"
I0110 09:22:56.398304 1 cloudprivateipconfig_controller.go:257] CloudPrivateIPConfig: "10.0.57.50" will be added to node: "ip-10-0-57-103.us-east-2.compute.internal"
I0110 09:22:56.399266 1 controller.go:182] Assigning key: 10.0.73.50 to cloud-private-ip-config workqueue
I0110 09:22:56.401326 1 cloudprivateipconfig_controller.go:257] CloudPrivateIPConfig: "10.0.73.50" will be added to node: "ip-10-0-73-87.us-east-2.compute.internal"
I0110 09:22:56.406525 1 cloudprivateipconfig_controller.go:281] Adding finalizer to CloudPrivateIPConfig: "10.0.67.50"
I0110 09:22:56.408671 1 cloudprivateipconfig_controller.go:281] Adding finalizer to CloudPrivateIPConfig: "10.0.57.50"
I0110 09:22:56.408884 1 cloudprivateipconfig_controller.go:281] Adding finalizer to CloudPrivateIPConfig: "10.0.73.50"
I0110 09:22:57.607979 1 cloudprivateipconfig_controller.go:338] Added IP address to node: "ip-10-0-67-247.us-east-2.compute.internal" for CloudPrivateIPConfig: "10.0.67.50"
I0110 09:22:57.738806 1 cloudprivateipconfig_controller.go:338] Added IP address to node: "ip-10-0-57-103.us-east-2.compute.internal" for CloudPrivateIPConfig: "10.0.57.50"
I0110 09:22:57.799929 1 controller.go:160] Dropping key '10.0.67.50' from the cloud-private-ip-config workqueue
I0110 09:22:58.002283 1 cloudprivateipconfig_controller.go:338] Added IP address to node: "ip-10-0-73-87.us-east-2.compute.internal" for CloudPrivateIPConfig: "10.0.73.50"
I0110 09:22:58.202889 1 controller.go:160] Dropping key '10.0.57.50' from the cloud-private-ip-config workqueue
I0110 09:22:58.601868 1 controller.go:160] Dropping key '10.0.73.50' from the cloud-private-ip-config workqueue
I0110 09:23:20.643913 1 controller.go:182] Assigning key: 10.0.73.50 to cloud-private-ip-config workqueue
I0110 09:23:20.702755 1 controller.go:182] Assigning key: 10.0.57.50 to cloud-private-ip-config workqueue
I0110 09:23:20.702780 1 controller.go:182] Assigning key: 10.0.67.50 to cloud-private-ip-config workqueue
I0110 09:23:20.703333 1 cloudprivateipconfig_controller.go:174] CloudPrivateIPConfig: "10.0.73.50" will be deleted from node: "ip-10-0-73-87.us-east-2.compute.internal"
I0110 09:23:20.704984 1 cloudprivateipconfig_controller.go:174] CloudPrivateIPConfig: "10.0.57.50" will be deleted from node: "ip-10-0-57-103.us-east-2.compute.internal"
I0110 09:23:20.706223 1 cloudprivateipconfig_controller.go:174] CloudPrivateIPConfig: "10.0.67.50" will be deleted from node: "ip-10-0-67-247.us-east-2.compute.internal"
I0110 09:23:20.711438 1 controller.go:182] Assigning key: 10.0.73.50 to cloud-private-ip-config workqueue
I0110 09:23:20.711461 1 controller.go:182] Assigning key: 10.0.73.50 to cloud-private-ip-config workqueue
I0110 09:23:20.712551 1 controller.go:182] Assigning key: 10.0.57.50 to cloud-private-ip-config workqueue
I0110 09:23:20.712600 1 controller.go:182] Assigning key: 10.0.57.50 to cloud-private-ip-config workqueue
I0110 09:23:20.715140 1 controller.go:182] Assigning key: 10.0.67.50 to cloud-private-ip-config workqueue
I0110 09:23:20.715158 1 controller.go:182] Assigning key: 10.0.67.50 to cloud-private-ip-config workqueue
I0110 09:23:21.214815 1 cloudprivateipconfig_controller.go:228] CloudPrivateIPConfig: 10.0.73.50 object has been marked for complete deletion
I0110 09:23:21.214833 1 cloudprivateipconfig_controller.go:235] Cleaning up IP address and finalizer for CloudPrivateIPConfig: "10.0.73.50", deleting it completely
I0110 09:23:21.224238 1 controller.go:160] Dropping key '10.0.73.50' from the cloud-private-ip-config workqueue
I0110 09:23:21.224288 1 controller.go:182] Assigning key: 10.0.73.50 to cloud-private-ip-config workqueue
I0110 09:23:21.225985 1 cloudprivateipconfig_controller.go:405] CloudPrivateIPConfig: "10.0.73.50" in work queue no longer exists
E0110 09:23:21.226056 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 130 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1f7dca0, 0x3913df0})
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000973e10})
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x1f7dca0, 0x3913df0})
	/usr/lib/golang/src/runtime/panic.go:1038 +0x215
github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc0002a6580, {0xc00059a916, 0xa})
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:165 +0x57
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc0002a51a0, {0x1db1440, 0xc000973e10})
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x126
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc0002a51a0)
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(...)
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f66fdf6dff8)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x26b4260, 0xc0004c2f60}, 0x1, 0xc00009c120)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x398
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1b744b7]

goroutine 130 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000973e10})
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x1f7dca0, 0x3913df0})
	/usr/lib/golang/src/runtime/panic.go:1038 +0x215
github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc0002a6580, {0xc00059a916, 0xa})
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:165 +0x57
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc0002a51a0, {0x1db1440, 0xc000973e10})
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x126
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc0002a51a0)
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(...)
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f66fdf6dff8)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x26b4260, 0xc0004c2f60}, 0x1, 0xc00009c120)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x398

Actual results:

Expected results:
cloud-network-config-controller should not crash, and the corresponding cloudprivateipconfigs should be removed once their IPs are removed from the namespace.

Additional info:
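The log shows "in work queue no longer exists" for 10.0.73.50 immediately before the nil pointer dereference in SyncHandler. One plausible reading, which this report does not confirm, is that the handler receives a nil object for a key that was re-queued after deletion and then dereferences it. The sketch below only illustrates that general pattern and a guard against it. It is not the controller's code: `CloudPrivateIPConfig` here is a trimmed-down stand-in, and `store`, `get`, and `syncHandler` are hypothetical names.

```go
package main

import "fmt"

// CloudPrivateIPConfig is a hypothetical, trimmed-down stand-in for the real API type.
type CloudPrivateIPConfig struct {
	Name string
	Node string
}

// store is a hypothetical in-memory replacement for the informer cache.
type store map[string]*CloudPrivateIPConfig

// get mimics a getter that treats "not found" as a non-error and returns
// (nil, nil). A caller that only checks err will then dereference nil.
func (s store) get(key string) (*CloudPrivateIPConfig, error) {
	cfg, ok := s[key]
	if !ok {
		return nil, nil
	}
	return cfg, nil
}

// syncHandler checks both the error and the object before using the object.
func syncHandler(s store, key string) (string, error) {
	cfg, err := s.get(key)
	if err != nil {
		return "", err
	}
	if cfg == nil {
		// The object was deleted while its key was still queued: nothing to sync.
		return "skipped", nil
	}
	return "synced to " + cfg.Node, nil
}

func main() {
	s := store{
		"10.0.57.50": &CloudPrivateIPConfig{
			Name: "10.0.57.50",
			Node: "ip-10-0-57-103.us-east-2.compute.internal",
		},
	}
	// 10.0.73.50 is absent from the store, like the key that was re-queued
	// after the object had already been deleted.
	for _, key := range []string{"10.0.57.50", "10.0.73.50"} {
		result, err := syncHandler(s, key)
		fmt.Println(key, result, err)
	}
}
```

Without the `cfg == nil` check, the second key would panic on `cfg.Node`, which is the same class of failure as the trace above.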
*** This bug has been marked as a duplicate of bug 2034144 ***
Re-opening. The bug I referenced is partially caused by the same problem, but having this bug capture only this specific issue makes things clearer.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056