Bug 2026461

Summary: Completed pods in an OpenShift cluster do not release their IP addresses, resulting in "err: range is full" unless the pods are manually deleted
Product: OpenShift Container Platform Reporter: Mayur Deore <mdeore>
Component: NetworkingAssignee: Tim Rozet <trozet>
Networking sub component: ovn-kubernetes QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: ableisch, agogala, alchan, anusaxen, ben.taljaard, bpickard, bsmitley, calfonso, cgaynor, danw, fgleizes, jkaur, me, rbrattai, satripat, skudupud, swasthan, trozet, vpickard, zzhao
Version: 4.8   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2091157 (view as bug list) Environment:
Last Closed: 2022-08-10 10:39:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2091157    

Description Mayur Deore 2021-11-24 17:35:02 UTC
Description of problem:
Completed pods do not release the IP addresses that were allocated while they were running, and new pods cannot reuse those addresses.

Version-Release number of selected component (if applicable):
RHOCP 4.9 with OVNKubernetes.

How reproducible:
- Spawn many pods that run to completion, enough to fill the host subnet on a specific node.
- Once the subnet is full, new pods fail with the error below.

Actual results:
- New pods can't be created with the following error:
   Warning  ErrorAddingLogicalPort  3m32s (x52 over 53m)  control plane  failed to assign pod addresses for pod aaa_pod453 on node: master, err: the range is full

Expected results:
- Completed pods should not hold an IP address, since they no longer perform any network communication.
- New pods must be able to reuse the IP addresses of completed pods and start without errors.

Additional info:
This issue is not present with OpenShiftSDN, which is able to reuse IPs from completed pods.
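The expected behavior can be modeled as a small allocator sketch (a hypothetical illustration, not the actual ovn-kubernetes code): when the range is exhausted, IPs held by pods in a terminal phase (Succeeded/Failed) should be reclaimed before the allocator reports "range is full".

```python
# Hypothetical sketch of the desired allocation behavior; class and
# method names are illustrative, not taken from ovn-kubernetes.
class PodIPAllocator:
    def __init__(self, ips):
        self.free = list(ips)   # unallocated IPs in the host subnet
        self.assigned = {}      # pod name -> (ip, phase)

    def set_phase(self, pod, phase):
        ip, _ = self.assigned[pod]
        self.assigned[pod] = (ip, phase)

    def allocate(self, pod):
        if not self.free:
            # The fix: reclaim an IP from a pod in a terminal phase
            # instead of immediately failing with "range is full".
            for name, (ip, phase) in list(self.assigned.items()):
                if phase in ("Succeeded", "Failed"):
                    del self.assigned[name]
                    self.free.append(ip)
                    break
        if not self.free:
            raise RuntimeError("range is full")
        ip = self.free.pop()
        self.assigned[pod] = (ip, "Running")
        return ip
```

Without the reclaim step, a full subnet of Completed pods blocks every new allocation, which is exactly the reported symptom.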

Comment 14 zhaozhanqi 2022-04-21 03:39:35 UTC
I can reproduce this issue on 4.10, version 4.10.0-0.nightly-2022-04-19-145842.

1. Set the node's max pods to 520:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 520
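Note that this KubeletConfig only applies to MachineConfigPools carrying the matching label; the targeted pool (shown here as an assumed worker pool, not taken from this report) would need something like:

```yaml
# Assumed MachineConfigPool labeling so the machineConfigPoolSelector
# above matches; the pool name "worker" is an example.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
  labels:
    custom-kubelet: large-pods
```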

2. Create about 500 pods on the node with an RC:

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "v1",
      "kind": "ReplicationController",
      "metadata": {
        "labels": {
          "name": "max-pods"
        },
        "name": "max-pods"
      },
      "spec": {
        "replicas": 500,
        "template": {
          "metadata": {
            "labels":  {
              "name": "max-pods"
            }
          },
          "spec": {
            "containers": [
              {
              "name": "max-pod",
                "image": "quay.io/openshifttest/nonexist"
              }
            ],
          "nodeName": "node-name"
          }
        }
      }
    }
  ]
}

3. There will be many pods stuck in 'OutOfpods' status:

max-pods-zxz2w   0/1     OutOfpods           0          95m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zz5hr   0/1     OutOfpods           0          61m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zz86q   0/1     OutOfpods           0          105m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzdhg   0/1     OutOfpods           0          106m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzftf   0/1     OutOfpods           0          99m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzgl2   0/1     OutOfpods           0          100m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzjlv   0/1     OutOfpods           0          52m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzlpj   0/1     OutOfpods           0          82m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzm9h   0/1     OutOfpods           0          83m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzn7h   0/1     OutOfpods           0          97m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzpvk   0/1     OutOfpods           0          107m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzqgr   0/1     OutOfpods           0          94m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzqrc   0/1     OutOfpods           0          97m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzsfv   0/1     OutOfpods           0          108m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzzsb   0/1     OutOfpods           0          64m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>

$ oc get pod | grep OutOfpods | wc -l
9785
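Tallies like the one above can be cross-checked with a quick status count over the `oc get pod --no-headers` output; a sketch (the sample lines below are illustrative, not real cluster output):

```python
from collections import Counter

# Illustrative sample of `oc get pod --no-headers` output.
sample = """\
max-pods-zz5hr   0/1   OutOfpods           0   61m
max-pods-zz86q   0/1   OutOfpods           0   105m
max-pods-4bg4d   0/1   ContainerCreating   0   13m
"""

# Tally pods by the STATUS column (third whitespace-separated field).
counts = Counter(line.split()[2] for line in sample.splitlines())
print(counts["OutOfpods"], counts["ContainerCreating"])  # 2 1
```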

 
$ oc get pod | grep -v OutOfpods 
NAME             READY   STATUS              RESTARTS   AGE
max-pods-4bg4d   0/1     ContainerCreating   0          13m


$ oc describe pod max-pods-4bg4d
Name:           max-pods-4bg4d
Namespace:      g3ami
Priority:       0
Node:           openshift-qe-028.lab.eng.rdu2.redhat.com/10.8.1.181
Start Time:     Wed, 20 Apr 2022 20:17:07 +0800
Labels:         name=max-pods
Annotations:    openshift.io/scc: restricted
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicationController/max-pods
Containers:
  max-pod:
    Container ID:   
    Image:          quay.io/openshifttest/nonexist
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rnfdc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-rnfdc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From     Message
  ----     ------                  ----  ----     -------
  Warning  FailedCreatePodSandBox  12m   kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40] [g3ami/max-pods-4bg4d ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  FailedCreatePodSandBox  10m  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81] [g3ami/max-pods-4bg4d db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  FailedCreatePodSandBox  8m5s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9] [g3ami/max-pods-4bg4d e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  ErrorAddingLogicalPort  6m30s (x8 over 14m)  controlplane  failed to assign pod addresses for pod g3ami_max-pods-4bg4d on node: openshift-qe-028.lab.eng.rdu2.redhat.com, err: range is full
  Warning  FailedCreatePodSandBox  5m52s                kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d 0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d] [g3ami/max-pods-4bg4d 0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  FailedCreatePodSandBox  3m40s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4] [g3ami/max-pods-4bg4d deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  ErrorAddingLogicalPort  119s  controlplane  failed to assign pod addresses for pod g3ami_max-pods-4bg4d on node: openshift-qe-028.lab.eng.rdu2.redhat.com, err: range is full
  Warning  FailedCreatePodSandBox  86s   kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d 6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db] [g3ami/max-pods-4bg4d 6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded

Comment 16 zhaozhanqi 2022-04-21 07:21:03 UTC
Verified this bug on 4.11.0-0.nightly-2022-04-15-153812


1. There are some Completed pods in the openshift-operator-lifecycle-manager namespace:

openshift-operator-lifecycle-manager               collect-profiles-27508695-prrwb                                           0/1     Completed               0                 38m     10.131.1.162   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
openshift-operator-lifecycle-manager               collect-profiles-27508710-2nllh                                           0/1     Completed               0                 23m     10.131.1.25    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
openshift-operator-lifecycle-manager               collect-profiles-27508725-67gcz                                           0/1     Completed               0                 8m45s   10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>

2. Update the node's max pods to 520 (as in comment 14).

3. Apply the pod YAML below, then scale the RC up in steps: 100 -> 150 -> 200 -> 300 -> 400 -> 493. (The `/bin/true` command makes each container exit immediately, so the pods run to completion.)

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "v1",
      "kind": "ReplicationController",
      "metadata": {
        "labels": {
          "name": "max-pods"
        },
        "name": "max-pods"
      },
      "spec": {
        "replicas": 50,
        "template": {
          "metadata": {
            "labels":  {
              "name": "max-pods"
            }
          },
          "spec": {
            "containers": [
              {
              "command": [ "/bin/true" ],
              "name": "max-pod",
                "image": "quay.io/openshifttest/hello-sdn@sha256:2af5b5ec480f05fda7e9b278023ba04724a3dd53a296afcd8c13f220dec52197"
              }
            ],
          "nodeName": "openshift-qe-028.lab.eng.rdu2.redhat.com"
          }
        }
      }
    }
  ]
}

4. Confirm that all 510 IPs in the node's host subnet are in use:

$ oc get pod -A -o wide | grep 10.131 | wc -l
510
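The 510 figure is consistent with a /23 per-node host subnet (an assumption here; the actual hostPrefix depends on the install config, and this back-of-the-envelope count ignores any addresses OVN reserves for itself):

```python
import ipaddress

# Assuming a /23 host subnet per node; 10.131.0.0/23 matches the
# 10.131.x.x pod IPs seen above (the exact network is an assumption).
subnet = ipaddress.ip_network("10.131.0.0/23")
total = subnet.num_addresses   # 512 addresses in a /23
usable = total - 2             # minus network and broadcast addresses
print(total, usable)           # 512 510
```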

5. Create one normal test pod on the node:

oc get pod -n z1 -o wide
z1                                                 test-rc-xwvm8                                                             1/1     Running                 0                 13m     10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>

We can see that the new pod's IP is the same as the Completed pod's IP from step 1:


$ oc get pod -A -o wide | grep openshift-qe-028.lab.eng.rdu2.redhat.com | grep 10.131.1.39
openshift-operator-lifecycle-manager               collect-profiles-27508725-67gcz                                           0/1     Completed               0                 31m     10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
z1                                                 test-rc-xwvm8                                                             1/1     Running                 0                 13m     10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>

6. Check that the pod is working correctly:

$ oc rsh -n z1 test-rc-mmqgt
~ $ curl 10.131.1.39:8080
Hello OpenShift!
~ $ 

Moving this to Verified.

Comment 17 zhaozhanqi 2022-04-21 07:28:20 UTC
@trozet Please ignore my question in comment 15. The IP is released after the pods are deleted.

Comment 24 Ben Taljaard 2022-06-03 10:56:05 UTC
Are there plans to backport this to 4.10?

Comment 33 errata-xmlrpc 2022-08-10 10:39:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 35 Red Hat Bugzilla 2023-09-18 04:28:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days