Description of problem:
[SDN AWS] sdn-controller crashed after re-configuring EgressIPs

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-21-130047

How reproducible:

Steps to Reproduce:
1. Patch two nodes as EgressCIDRs nodes.

2. Create 20 namespaces and patch one egress IP to each namespace:

$ for i in {1..10};do oc create ns p$i;sleep 1;done
namespace/p1 created
namespace/p2 created
namespace/p3 created
namespace/p4 created
namespace/p5 created
namespace/p6 created
namespace/p7 created
namespace/p8 created
namespace/p9 created
namespace/p10 created
$ for i in {11..20};do oc create ns p$i;sleep 1;done
namespace/p11 created
namespace/p12 created
namespace/p13 created
namespace/p14 created
namespace/p15 created
namespace/p16 created
namespace/p17 created
namespace/p18 created
namespace/p19 created
namespace/p20 created

$ for i in {1..10};do oc patch netnamespace p$i -p "{\"egressIPs\":[\"10.0.59.$i\"]}" --type=merge ;sleep 1;done
netnamespace.network.openshift.io/p1 patched (no change)
netnamespace.network.openshift.io/p2 patched
netnamespace.network.openshift.io/p3 patched
netnamespace.network.openshift.io/p4 patched
netnamespace.network.openshift.io/p5 patched
netnamespace.network.openshift.io/p6 patched
netnamespace.network.openshift.io/p7 patched
netnamespace.network.openshift.io/p8 patched
netnamespace.network.openshift.io/p9 patched
netnamespace.network.openshift.io/p10 patched
$ for i in {11..20};do oc patch netnamespace p$i -p "{\"egressIPs\":[\"10.0.59.$i\"]}" --type=merge ;sleep 1;done
netnamespace.network.openshift.io/p11 patched
netnamespace.network.openshift.io/p12 patched
netnamespace.network.openshift.io/p13 patched
netnamespace.network.openshift.io/p14 patched
netnamespace.network.openshift.io/p15 patched
netnamespace.network.openshift.io/p16 patched
netnamespace.network.openshift.io/p17 patched
netnamespace.network.openshift.io/p18 patched
netnamespace.network.openshift.io/p19 patched
netnamespace.network.openshift.io/p20 patched

3. Check the hostsubnets: each egress node was assigned 9 egress IPs, since the egress IP capacity of each node is 9.

$ oc get hostsubnet
NAME                                        HOST                                        HOST IP       SUBNET          EGRESS CIDRS       EGRESS IPS
ip-10-0-50-111.us-east-2.compute.internal   ip-10-0-50-111.us-east-2.compute.internal   10.0.50.111   10.131.0.0/23   ["10.0.48.0/20"]   ["10.0.59.13","10.0.59.7","10.0.59.11","10.0.59.100","10.0.59.1","10.0.59.5","10.0.59.15","10.0.59.9","10.0.59.17"]
ip-10-0-52-192.us-east-2.compute.internal   ip-10-0-52-192.us-east-2.compute.internal   10.0.52.192   10.130.0.0/23
ip-10-0-57-18.us-east-2.compute.internal    ip-10-0-57-18.us-east-2.compute.internal    10.0.57.18    10.129.0.0/23
ip-10-0-59-247.us-east-2.compute.internal   ip-10-0-59-247.us-east-2.compute.internal   10.0.59.247   10.129.2.0/23   ["10.0.48.0/20"]   ["10.0.59.12","10.0.59.10","10.0.59.14","10.0.59.2","10.0.59.3","10.0.59.4","10.0.59.6","10.0.59.8","10.0.59.16"]
ip-10-0-65-70.us-east-2.compute.internal    ip-10-0-65-70.us-east-2.compute.internal    10.0.65.70    10.128.0.0/23
ip-10-0-77-149.us-east-2.compute.internal   ip-10-0-77-149.us-east-2.compute.internal   10.0.77.149   10.128.2.0/23

4. Delete all the namespaces above and repeat step 2.

Actual results:
Only a few EgressIPs were configured.
$ oc get hostsubnet
NAME                                        HOST                                        HOST IP       SUBNET          EGRESS CIDRS       EGRESS IPS
ip-10-0-50-111.us-east-2.compute.internal   ip-10-0-50-111.us-east-2.compute.internal   10.0.50.111   10.131.0.0/23   ["10.0.48.0/20"]   ["10.0.59.2","10.0.59.100","10.0.59.4"]
ip-10-0-52-192.us-east-2.compute.internal   ip-10-0-52-192.us-east-2.compute.internal   10.0.52.192   10.130.0.0/23
ip-10-0-57-18.us-east-2.compute.internal    ip-10-0-57-18.us-east-2.compute.internal    10.0.57.18    10.129.0.0/23
ip-10-0-59-247.us-east-2.compute.internal   ip-10-0-59-247.us-east-2.compute.internal   10.0.59.247   10.129.2.0/23   ["10.0.48.0/20"]   ["10.0.59.1","10.0.59.21","10.0.59.3"]
ip-10-0-65-70.us-east-2.compute.internal    ip-10-0-65-70.us-east-2.compute.internal    10.0.65.70    10.128.0.0/23
ip-10-0-77-149.us-east-2.compute.internal   ip-10-0-77-149.us-east-2.compute.internal   10.0.77.149   10.128.2.0/23

And one sdn-controller pod was in CrashLoopBackOff:

$ oc get pods -n openshift-sdn -o wide
NAME                   READY   STATUS             RESTARTS      AGE     IP            NODE                                        NOMINATED NODE   READINESS GATES
sdn-bscb4              2/2     Running            0             4h54m   10.0.57.18    ip-10-0-57-18.us-east-2.compute.internal    <none>           <none>
sdn-controller-949mj   1/1     Running            0             4h54m   10.0.52.192   ip-10-0-52-192.us-east-2.compute.internal   <none>           <none>
sdn-controller-cg8gd   0/1     CrashLoopBackOff   5 (41s ago)   4h54m   10.0.57.18    ip-10-0-57-18.us-east-2.compute.internal    <none>           <none>
sdn-controller-fpvfg   1/1     Running            0             4h54m   10.0.65.70    ip-10-0-65-70.us-east-2.compute.internal    <none>           <none>
sdn-hcvcr              2/2     Running            0             4h41m   10.0.59.247   ip-10-0-59-247.us-east-2.compute.internal   <none>           <none>
sdn-k8kk9              2/2     Running            0             4h54m   10.0.52.192   ip-10-0-52-192.us-east-2.compute.internal   <none>           <none>
sdn-mck5t              2/2     Running            0             4h46m   10.0.77.149   ip-10-0-77-149.us-east-2.compute.internal   <none>           <none>
sdn-tpb68              2/2     Running            0             4h46m   10.0.50.111   ip-10-0-50-111.us-east-2.compute.internal   <none>           <none>
sdn-ztlm8              2/2     Running            0             4h54m   10.0.65.70    ip-10-0-65-70.us-east-2.compute.internal    <none>           <none>

$ oc describe pod sdn-controller-cg8gd -n openshift-sdn
Name:                 sdn-controller-cg8gd
Namespace:            openshift-sdn
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ip-10-0-57-18.us-east-2.compute.internal/10.0.57.18
Start Time:           Thu, 23 Dec 2021 09:36:59 +0800
Labels:               app=sdn-controller
                      controller-revision-hash=58747c9748
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   10.0.57.18
IPs:
  IP:  10.0.57.18
Controlled By:  DaemonSet/sdn-controller
Containers:
  sdn-controller:
    Container ID:  cri-o://16369d3d891e93bc037ce46d868d64734eaf0375ef99298f98c7d777f90eaff8
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:39c6b94542b3297130a345d82db7620e79950ee58b6c692d3c933fe426f2e0de
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:39c6b94542b3297130a345d82db7620e79950ee58b6c692d3c933fe426f2e0de
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      if [[ -f /env/_master ]]; then
        set -o allexport
        source /env/_master
        set +o allexport
      fi
      exec openshift-sdn-controller \
        --platform-type AWS \
        --v=${OPENSHIFT_SDN_LOG_LEVEL:-2}
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   0600d20, 0x30)
        github.com/openshift/sdn/pkg/network/common/egressip.go:625 +0x471
        github.com/openshift/sdn/pkg/network/common.(*EgressIPTracker).syncEgressIPs(0xc000600460)
        github.com/openshift/sdn/pkg/network/common/egressip.go:600 +0xeb
        github.com/openshift/sdn/pkg/network/common.(*EgressIPTracker).UpdateHostSubnetEgress(0xc000600460, 0xc0004aa6f0)
        github.com/openshift/sdn/pkg/network/common/egressip.go:373 +0xab0
        github.com/openshift/sdn/pkg/network/common.(*EgressIPTracker).handleAddOrUpdateHostSubnet(0xc000600460, {0x195b4a0, 0xc0004aa6f0}, {0x100000000000000, 0x0}, {0x19841b8, 0x5})
        github.com/openshift/sdn/pkg/network/common/egressip.go:254 +0x6fa
        github.com/openshift/sdn/pkg/network/common.InformerFuncs.func1({0x195b4a0, 0xc0004aa6f0})
        github.com/openshift/sdn/pkg/network/common/informers.go:19 +0x39
        k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
        k8s.io/client-go.0-rc.0/tools/cache/controller.go:231
        k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        k8s.io/client-go.0-rc.0/tools/cache/shared_informer.go:777 +0x9f
        k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f8a88061da0)
        k8s.io/apimachinery.0-rc.0/pkg/util/wait/wait.go:155 +0x67
        k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000147738, {0x1ba1cc0, 0xc0000ae240}, 0x1, 0xc0005ec8a0)
        k8s.io/apimachinery.0-rc.0/pkg/util/wait/wait.go:156 +0xb6
        k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000138000, 0x3b9aca00, 0x0, 0xc0, 0xc0001477b0)
        k8s.io/apimachinery.0-rc.0/pkg/util/wait/wait.go:133 +0x89
        k8s.io/apimachinery/pkg/util/wait.Until(...)
        k8s.io/apimachinery.0-rc.0/pkg/util/wait/wait.go:90
        k8s.io/client-go/tools/cache.(*processorListener).run(0xc000478680)
        k8s.io/client-go.0-rc.0/tools/cache/shared_informer.go:771 +0x6b
        k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
        k8s.io/apimachinery.0-rc.0/pkg/util/wait/wait.go:73 +0x5a
        created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
        k8s.io/apimachinery.0-rc.0/pkg/util/wait/wait.go:71 +0x88
      Exit Code:  2
      Started:    Thu, 23 Dec 2021 14:31:14 +0800
      Finished:   Thu, 23 Dec 2021 14:31:14 +0800
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      KUBERNETES_SERVICE_PORT:  6443
      KUBERNETES_SERVICE_HOST:  api-int.huirwang-23a.qe.devcluster.openshift.com
    Mounts:
      /env from env-overrides (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gnjdw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  env-overrides:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      env-overrides
    Optional:  true
  kube-api-access-gnjdw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:
  node-role.kubernetes.io/master=
Tolerations:
  node-role.kubernetes.io/master:NoSchedule op=Exists
  node.kubernetes.io/disk-pressure:NoSchedule op=Exists
  node.kubernetes.io/memory-pressure:NoSchedule op=Exists
  node.kubernetes.io/network-unavailable:NoSchedule op=Exists
  node.kubernetes.io/not-ready:NoSchedule op=Exists
  node.kubernetes.io/not-ready:NoExecute op=Exists
  node.kubernetes.io/pid-pressure:NoSchedule op=Exists
  node.kubernetes.io/unreachable:NoExecute op=Exists
  node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   Created  2m25s (x5 over 4h55m)  kubelet  Created container sdn-controller
  Normal   Started  2m25s (x5 over 4h55m)  kubelet  Started container sdn-controller
  Warning  BackOff  78s (x12 over 3m45s)   kubelet  Back-off restarting failed container
  Normal   Pulled   63s (x5 over 3m46s)    kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:39c6b94542b3297130a345d82db7620e79950ee58b6c692d3c933fe426f2e0de" already present on machine

Expected results:
No sdn-controller crashes, and re-configuring EgressIPs works.

Additional info:
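As a quick way to quantify the symptom, the EGRESS IPS column of the hostsubnet output can be counted per node. The helper below is a hypothetical sketch (not part of the original report); `count_egress_ips` is an illustrative name, and the sample strings are copied from the outputs above. On a live cluster the column could be pulled with something like `oc get hostsubnet -o jsonpath='{range .items[*]}{.host}{"\t"}{.egressIPs}{"\n"}{end}'`.

```shell
# Hypothetical helper: count how many egress IPs are present in a
# hostsubnet EGRESS IPS value such as ["10.0.59.2","10.0.59.100","10.0.59.4"].
count_egress_ips() {
  # Each IP appears as one quoted token; count the quoted tokens.
  grep -o '"[0-9.]*"' <<<"$1" | grep -c .
}

# Values copied from the report: node ip-10-0-50-111 before and after step 4.
before='["10.0.59.13","10.0.59.7","10.0.59.11","10.0.59.100","10.0.59.1","10.0.59.5","10.0.59.15","10.0.59.9","10.0.59.17"]'
after='["10.0.59.2","10.0.59.100","10.0.59.4"]'

echo "before: $(count_egress_ips "$before") IPs"   # 9, matching the per-node capacity
echo "after:  $(count_egress_ips "$after") IPs"    # only 3 after re-configuring
```

The drop from 9 assigned IPs per node to 3 is the "only a few EgressIPs configured" state described in the actual results.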
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056