Bug 1735538 - Pods stuck in container creating - Failed to run CNI IPAM ADD: failed to allocate for range 0
Summary: Pods stuck in container creating - Failed to run CNI IPAM ADD: failed to allocate for range 0
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.2.0
Assignee: Alexander Constantinescu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1688955
Depends On:
Blocks: 1743587
 
Reported: 2019-08-01 04:48 UTC by Itzik Brown
Modified: 2019-10-16 06:34 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1743587
Environment:
Last Closed: 2019-10-16 06:34:15 UTC
Target Upstream Version:


Attachments


Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift cluster-network-operator pull 291 | 0 | None | closed | Bug 1735538: Adding cni-version to multus daemonset yaml | 2021-02-15 11:07:05 UTC
Red Hat Product Errata RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:34:31 UTC

Description Itzik Brown 2019-08-01 04:48:53 UTC
Description of problem:
After running some tests, pods are stuck in ContainerCreating.
I see the following error when running oc describe:
  Warning  FailedCreatePodSandBox  89s (x2174 over 11h)  kubelet, master-1  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubevirt-hyperconverged-cluster-jobbrw79-jczfc_kubevirt-hyperconverged_8482d21d-b3b3-11e9-bf45-00b6291ae442_0(ff39fece449a153d88bf540602a749d69fb8141ecb642d2445c234fca3c8feec): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": CNI request failed with status 400: 'failed to run IPAM for ff39fece449a153d88bf540602a749d69fb8141ecb642d2445c234fca3c8feec: failed to run CNI IPAM ADD: failed to allocate for range 0: no IP addresses available in range set: 10.129.0.1-10.129.1.254

When listing all container addresses that start with 10.129, I get only 34.

On the node I get the following:
[core@master-1 openshift-sdn]$ ls -l /var/lib/cni/networks/openshift-sdn/10.129.* |wc -l
509
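For anyone debugging this, a rough cross-check (a sketch, not from the report) is to compare those reservation files against the pod sandboxes CRI-O still knows about. It assumes each file under /var/lib/cni/networks/openshift-sdn is named after a reserved IP, its first line holds the owning sandbox ID, and crictl is available on the node:

#!/bin/bash
# Sketch: count stale host-local IPAM reservations on the affected node.
# Assumption: each file in the IPAM dir is named after a reserved IP and its
# first line is the ID of the pod sandbox that holds the reservation.
IPAM_DIR=/var/lib/cni/networks/openshift-sdn
live_ids=$(crictl pods -q)          # sandbox IDs CRI-O still tracks
stale=0
for f in "$IPAM_DIR"/10.*; do
    id=$(head -n1 "$f")
    if ! grep -q "$id" <<<"$live_ids"; then
        echo "stale reservation: $(basename "$f") -> $id"
        stale=$((stale + 1))
    fi
done
echo "total: $(ls "$IPAM_DIR"/10.* | wc -l)  stale: $stale"

If the stale count is close to the total (509 here) while only a few dozen pods are running, the reservations leaked on pod deletion.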

Version-Release number of selected component (if applicable):
$ oc version
Client Version: version.Info{Major:"4", Minor:"2+", GitVersion:"v4.2.0", GitCommit:"2e9d4a117", GitTreeState:"clean", BuildDate:"2019-07-28T17:15:26Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+743bc2c", GitCommit:"743bc2c", GitTreeState:"clean", BuildDate:"2019-07-21T21:17:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}


How reproducible:


Steps to Reproduce:
1. Create and delete many pods (more than the addresses available in the node's range); see the sketch after this list.
2.
3.
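A rough way to drive that churn; the namespace, deployment name, and pause image below are illustrative assumptions, not from the report:

#!/bin/bash
# Sketch: cycle through more pods than the node's host-local range holds
# (10.129.0.1-10.129.1.254 is roughly 510 addresses), several times over,
# to surface leaked reservations.
oc new-project ipam-churn 2>/dev/null || oc project ipam-churn
for round in 1 2 3; do
    echo "round ${round}"
    oc create deployment churn --image=registry.k8s.io/pause:3.9
    oc scale deployment/churn --replicas=200
    oc wait --for=condition=Available deployment/churn --timeout=15m
    oc delete deployment/churn --wait=true
done
# On an affected build, later rounds leave pods stuck in ContainerCreating
# with "no IP addresses available in range set".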

Actual results:


Expected results:


Additional info:

Comment 1 Casey Callendrello 2019-08-06 12:16:27 UTC
Jacob, please work with Itzik to reproduce this issue.

We've had several reports of problems with this. This is very urgent. I would argue it's a release blocker.

Comment 2 Veer Muchandi 2019-08-07 21:13:01 UTC
This comes up frequently on my cluster. Even rebooting the node doesn't help.

I have captured CRIO logs here https://pastebin.com/y2jga9Vw

I have to manually clean up stale IPs at /var/lib/cni/networks/openshift-sdn/
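In practice that cleanup amounts to something like the following sketch (a workaround only, not the fix; the crictl-based staleness check is an assumption, so verify before deleting anything):

#!/bin/bash
# Workaround sketch: delete reservation files whose owning sandbox no longer
# exists in CRI-O. Run on the affected node (oc debug node/<node>, chroot /host).
IPAM_DIR=/var/lib/cni/networks/openshift-sdn
live_ids=$(crictl pods -q)
for f in "$IPAM_DIR"/10.*; do
    id=$(head -n1 "$f")
    if ! grep -q "$id" <<<"$live_ids"; then
        echo "removing stale reservation $(basename "$f") (sandbox $id gone)"
        rm -f "$f"
    fi
done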

Comment 5 Weibin Liang 2019-08-19 19:44:15 UTC
Tested and verified in v4.2.0-0.ci-2019-08-19-054234:

1. Created many pods (more than the addresses available); all pods were created and consumed all available IP addresses.
2. Deleted those pods; on re-checking, both the pods and their IP addresses were gone.
3. Recreated many pods (more than the addresses available); all pods were created and consumed all available IP addresses.
4. Re-deleted those pods; on re-checking, both the pods and their IP addresses were gone.

Comment 6 Casey Callendrello 2019-08-26 17:26:01 UTC
*** Bug 1688955 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2019-10-16 06:34:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

