Bug 2026461 - Completed pods in OpenShift cluster do not release IP addresses, resulting in "err: range is full" unless they are manually deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.11.0
Assignee: Tim Rozet
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 2091157
 
Reported: 2021-11-24 17:35 UTC by Mayur Deore
Modified: 2023-09-18 04:28 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2091157
Environment:
Last Closed: 2022-08-10 10:39:48 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
Github openshift/ovn-kubernetes pull 1010 (Merged): [DownstreamMerge] Bug 2026461: 4-4-22 merge (last updated 2022-07-18 21:55:16 UTC)
Github ovn-org/ovn-kubernetes pull 1121 (Merged): Scalability: Delete logical ports for completed pods (last updated 2022-07-18 21:55:26 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:40:25 UTC)

Description Mayur Deore 2021-11-24 17:35:02 UTC
Description of problem:
Completed pods do not release, and new pods cannot reuse, the IP addresses that were allocated to them while they were running.

Version-Release number of selected component (if applicable):
RHOCP 4.9 with OVNKubernetes.

How reproducible:
- Spawn many pods that run to completion so that they fill the host subnet on a specific node.
- When the subnet is full, new pod creation fails with the following error.

Actual results:
- New pods cannot be created; they fail with the following error:
   Warning  ErrorAddingLogicalPort  3m32s (x52 over 53m)  control plane  failed to assign pod addresses for pod aaa_pod453 on node: master, err: the range is full

Expected results:
- Completed pods should not hold an IP address, since they no longer need any network communication.
- New pods must be able to reuse the IP addresses of completed pods and run without any error.

Additional info:
This issue is not present with OpenShiftSDN, which is able to reuse IPs from completed pods.
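
A quick way to see which Completed (Succeeded) pods still show an IP on the affected node, together with the manual cleanup the title refers to (a hedged sketch; <node-name> and <namespace> are placeholders):

# List Succeeded (Completed) pods on the node, including the pod IP column
oc get pods -A -o wide --field-selector spec.nodeName=<node-name>,status.phase=Succeeded

# Workaround before the fix: manually delete the completed pods so their addresses are released
oc delete pods -n <namespace> --field-selector status.phase=Succeeded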

Comment 14 zhaozhanqi 2022-04-21 03:39:35 UTC
It sounds like I can reproduce this issue on 4.10 (4.10.0-0.nightly-2022-04-19-145842).

1. Set the node's max pods to 520 with a KubeletConfig (the apply commands are sketched after the YAML):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 520
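
For completeness, a sketch of how the KubeletConfig above is rolled out (assuming the worker MachineConfigPool is the target; the label must match the machineConfigPoolSelector above, and the file name is a placeholder):

# Label the MachineConfigPool so it is selected by the KubeletConfig
oc label machineconfigpool worker custom-kubelet=large-pods

# Apply the KubeletConfig and watch the pool roll out
oc apply -f set-max-pods.yaml
oc get machineconfigpool worker -w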

2. Create about 500 pods on the node with a ReplicationController (the create command is sketched after the JSON):

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "v1",
      "kind": "ReplicationController",
      "metadata": {
        "labels": {
          "name": "max-pods"
        },
        "name": "max-pods"
      },
      "spec": {
        "replicas": 500,
        "template": {
          "metadata": {
            "labels":  {
              "name": "max-pods"
            }
          },
          "spec": {
            "containers": [
              {
              "name": "max-pod",
                "image": "quay.io/openshifttest/nonexist"
              }
            ],
          "nodeName": "node-name"
          }
        }
      }
    }
  ]
}
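
A sketch of creating the RC and watching the pods land on the node (the file name and namespace are placeholders):

# Create the ReplicationController from the JSON above
oc create -f max-pods-rc.json -n <test-namespace>

# Watch the pods being scheduled onto the node
oc get pods -n <test-namespace> -o wide -w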

3. There will be a lot of pods in 'OutOfpods' status:

max-pods-zxz2w   0/1     OutOfpods           0          95m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zz5hr   0/1     OutOfpods           0          61m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zz86q   0/1     OutOfpods           0          105m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzdhg   0/1     OutOfpods           0          106m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzftf   0/1     OutOfpods           0          99m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzgl2   0/1     OutOfpods           0          100m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzjlv   0/1     OutOfpods           0          52m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzlpj   0/1     OutOfpods           0          82m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzm9h   0/1     OutOfpods           0          83m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzn7h   0/1     OutOfpods           0          97m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzpvk   0/1     OutOfpods           0          107m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzqgr   0/1     OutOfpods           0          94m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzqrc   0/1     OutOfpods           0          97m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzsfv   0/1     OutOfpods           0          108m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
max-pods-zzzsb   0/1     OutOfpods           0          64m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>

$ oc get pod | grep OutOfpods | wc -l
9785

 
$ oc get pod | grep -v OutOfpods 
NAME             READY   STATUS              RESTARTS   AGE
max-pods-4bg4d   0/1     ContainerCreating   0          13m


$ oc describe pod max-pods-4bg4d
Name:           max-pods-4bg4d
Namespace:      g3ami
Priority:       0
Node:           openshift-qe-028.lab.eng.rdu2.redhat.com/10.8.1.181
Start Time:     Wed, 20 Apr 2022 20:17:07 +0800
Labels:         name=max-pods
Annotations:    openshift.io/scc: restricted
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicationController/max-pods
Containers:
  max-pod:
    Container ID:   
    Image:          quay.io/openshifttest/nonexist
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rnfdc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-rnfdc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From     Message
  ----     ------                  ----  ----     -------
  Warning  FailedCreatePodSandBox  12m   kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40] [g3ami/max-pods-4bg4d ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  FailedCreatePodSandBox  10m  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81] [g3ami/max-pods-4bg4d db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  FailedCreatePodSandBox  8m5s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9] [g3ami/max-pods-4bg4d e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  ErrorAddingLogicalPort  6m30s (x8 over 14m)  controlplane  failed to assign pod addresses for pod g3ami_max-pods-4bg4d on node: openshift-qe-028.lab.eng.rdu2.redhat.com, err: range is full
  Warning  FailedCreatePodSandBox  5m52s                kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d 0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d] [g3ami/max-pods-4bg4d 0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  FailedCreatePodSandBox  3m40s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4] [g3ami/max-pods-4bg4d deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
  Warning  ErrorAddingLogicalPort  119s  controlplane  failed to assign pod addresses for pod g3ami_max-pods-4bg4d on node: openshift-qe-028.lab.eng.rdu2.redhat.com, err: range is full
  Warning  FailedCreatePodSandBox  86s   kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d 6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db] [g3ami/max-pods-4bg4d 6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
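
Since the linked upstream fix (ovn-org/ovn-kubernetes pull 1121, "Scalability: Delete logical ports for completed pods") works by deleting the logical switch ports of completed pods, one rough way to see the leaked allocations is to count the ports OVN still holds for the node. This is only a sketch: the ovnkube-master pod name is a placeholder, and the nbdb container name and the convention that the node's logical switch is named after the node are assumptions that may vary by release.

# Count logical switch ports on the node's switch (placeholder pod name; assumed container and switch naming)
oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c nbdb -- \
  ovn-nbctl --no-leader-only lsp-list openshift-qe-028.lab.eng.rdu2.redhat.com | wc -l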

Comment 16 zhaozhanqi 2022-04-21 07:21:03 UTC
Verified this bug on 4.11.0-0.nightly-2022-04-15-153812


1. There are some completed pods in openshift-operator-lifecycle-manager:

openshift-operator-lifecycle-manager               collect-profiles-27508695-prrwb                                           0/1     Completed               0                 38m     10.131.1.162   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
openshift-operator-lifecycle-manager               collect-profiles-27508710-2nllh                                           0/1     Completed               0                 23m     10.131.1.25    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
openshift-operator-lifecycle-manager               collect-profiles-27508725-67gcz                                           0/1     Completed               0                 8m45s   10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>

2. Update the max pods to 520 on the node.

3. Then apply the RC manifest below and scale it up: 100 -> 150 -> 200 -> 300 -> 400 -> 493 (the scale commands are sketched after the JSON).

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "v1",
      "kind": "ReplicationController",
      "metadata": {
        "labels": {
          "name": "max-pods"
        },
        "name": "max-pods"
      },
      "spec": {
        "replicas": 50,
        "template": {
          "metadata": {
            "labels":  {
              "name": "max-pods"
            }
          },
          "spec": {
            "containers": [
              {
              "command": [ "/bin/true" ],
              "name": "max-pod",
                "image": "quay.io/openshifttest/hello-sdn@sha256:2af5b5ec480f05fda7e9b278023ba04724a3dd53a296afcd8c13f220dec52197"
              }
            ],
          "nodeName": "openshift-qe-028.lab.eng.rdu2.redhat.com"
          }
        }
      }
    }
  ]
}
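
A sketch of the scale-up sequence from step 3 (the namespace is a placeholder):

# Scale the RC up in stages to fill the node's subnet
oc scale rc max-pods --replicas=150 -n <test-namespace>
# ...repeat with 200, 300 and 400, then finally:
oc scale rc max-pods --replicas=493 -n <test-namespace>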

5. Make sure all 510 IPs on the node are used (a check of the node's subnet is sketched after the count):

$ oc get pod -A -o wide | grep 10.131 | wc -l
510
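
To confirm the subnet size behind the 510 figure, the pod subnet that OVN-Kubernetes assigned to the node can be inspected (a hedged sketch; with the default /23 host prefix, roughly 510 pod IPs are usable once reserved addresses are excluded):

# Show the pod subnet annotation on the node
oc describe node openshift-qe-028.lab.eng.rdu2.redhat.com | grep k8s.ovn.org/node-subnets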

6. Then create a normal test pod on the node:

$ oc get pod -n z1 -o wide
z1                                                 test-rc-xwvm8                                                             1/1     Running                 0                 13m     10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>

We can see that the pod IP is the same as the completed pod's IP from step 1:


$ oc get pod -A -o wide | grep openshift-qe-028.lab.eng.rdu2.redhat.com | grep 10.131.1.39
openshift-operator-lifecycle-manager               collect-profiles-27508725-67gcz                                           0/1     Completed               0                 31m     10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>
z1                                                 test-rc-xwvm8                                                             1/1     Running                 0                 13m     10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>           <none>

7. Check that the pod is working well:

$ oc rsh -n z1 test-rc-mmqgt
~ $ curl 10.131.1.39:8080
Hello OpenShift!
~ $ 

Moving this to VERIFIED.

Comment 17 zhaozhanqi 2022-04-21 07:28:20 UTC
@trozet Please ignore my question in comment 15; the IP will be released after deleting the pods.

Comment 24 Ben Taljaard 2022-06-03 10:56:05 UTC
Are there plans to backport this to 4.10?

Comment 33 errata-xmlrpc 2022-08-10 10:39:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 35 Red Hat Bugzilla 2023-09-18 04:28:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

