Bug 1735538

Summary: Pods stuck in container creating - Failed to run CNI IPAM ADD: failed to allocate for range 0
Product: OpenShift Container Platform
Component: Networking
Version: 4.2.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: Itzik Brown <itbrown>
Assignee: Alexander Constantinescu <aconstan>
QA Contact: zhaozhanqi <zzhao>
CC: aconstan, akrzos, aos-bugs, cdc, eparis, nagrawal, veer, weliang
Type: Bug
Bug Blocks: 1743587 (view as bug list)
Last Closed: 2019-10-16 06:34:15 UTC

Description Itzik Brown 2019-08-01 04:48:53 UTC
Description of problem:
After running some tests, pods are stuck in ContainerCreating.
I see the following error when running oc describe:
  Warning  FailedCreatePodSandBox  89s (x2174 over 11h)  kubelet, master-1  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubevirt-hyperconverged-cluster-jobbrw79-jczfc_kubevirt-hyperconverged_8482d21d-b3b3-11e9-bf45-00b6291ae442_0(ff39fece449a153d88bf540602a749d69fb8141ecb642d2445c234fca3c8feec): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": CNI request failed with status 400: 'failed to run IPAM for ff39fece449a153d88bf540602a749d69fb8141ecb642d2445c234fca3c8feec: failed to run CNI IPAM ADD: failed to allocate for range 0: no IP addresses available in range set: 10.129.0.1-10.129.1.254

When listing the addresses of all containers starting with 10.129, I get only 34.

On the node I get the following:
[core@master-1 openshift-sdn]$ ls -l /var/lib/cni/networks/openshift-sdn/10.129.* |wc -l
509
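The mismatch above (509 reservation files vs. 34 pods actually holding addresses) is the leak. A minimal sketch of that diagnostic follows; the directory and counts are simulated with a temp dir, since on a real node the host-local IPAM reservation files live under /var/lib/cni/networks/openshift-sdn, one file per reserved IP:

```shell
#!/bin/sh
# Simulate the node's IPAM state dir (hypothetical data, not a real cluster).
DEMO_DIR=$(mktemp -d)

# host-local IPAM keeps one file per reserved IP; simulate 509 reservations.
# (File names here are placeholders, not valid last octets.)
for i in $(seq 1 509); do
  touch "$DEMO_DIR/10.129.0.$i"
done

RESERVED=$(ls "$DEMO_DIR" | wc -l | tr -d ' ')   # files on disk
LIVE=34                                          # pods actually using 10.129.x
LEAKED=$((RESERVED - LIVE))

echo "reserved=$RESERVED live=$LIVE leaked=$LEAKED"
rm -rf "$DEMO_DIR"
```

With the numbers reported in this bug, the sketch prints reserved=509 live=34 leaked=475: reservations that were never released on pod deletion, which eventually exhausts the 10.129.0.1-10.129.1.254 range.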

Version-Release number of selected component (if applicable):
$ oc version
Client Version: version.Info{Major:"4", Minor:"2+", GitVersion:"v4.2.0", GitCommit:"2e9d4a117", GitTreeState:"clean", BuildDate:"2019-07-28T17:15:26Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+743bc2c", GitCommit:"743bc2c", GitTreeState:"clean", BuildDate:"2019-07-21T21:17:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}


How reproducible:


Steps to Reproduce:
1. Create and delete many pods (more than the number of addresses available)

Actual results:


Expected results:


Additional info:

Comment 1 Casey Callendrello 2019-08-06 12:16:27 UTC
Jacob, please work with Itzik to reproduce this issue.

We've had several reports of this problem. It's very urgent; I would argue it's a release blocker.

Comment 2 Veer Muchandi 2019-08-07 21:13:01 UTC
This is frequently coming up on my cluster. Even rebooting the node doesn't help.

I have captured CRIO logs here https://pastebin.com/y2jga9Vw

I have to manually clean up stale IPs at /var/lib/cni/networks/openshift-sdn/
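A minimal sketch of that manual cleanup, with the directory contents and container IDs simulated in a temp dir: each host-local IPAM reservation file is named after the IP and contains the owning container ID, so files whose ID has no running container can be removed to free the address. On a real node the directory would be /var/lib/cni/networks/openshift-sdn and the live IDs would come from something like `crictl ps -q`; all names below are hypothetical.

```shell
#!/bin/sh
# Stand-in for /var/lib/cni/networks/openshift-sdn (simulated).
CNI_DIR=$(mktemp -d)
echo "cid-live"  > "$CNI_DIR/10.129.0.10"   # container still running
echo "cid-stale" > "$CNI_DIR/10.129.0.11"   # container long gone

# In practice: LIVE_IDS=$(crictl ps -q). Simulated here.
LIVE_IDS="cid-live"

for f in "$CNI_DIR"/10.129.*; do
  id=$(head -n1 "$f" | tr -d '\r')
  case " $LIVE_IDS " in
    *" $id "*) ;;                            # ID is live; keep the reservation
    *) echo "removing stale reservation $f ($id)"
       rm -f "$f" ;;
  esac
done

REMAINING=$(ls "$CNI_DIR")                   # only the live reservation is left
echo "remaining: $REMAINING"
rm -rf "$CNI_DIR"
```

In this simulation only 10.129.0.10 survives; the stale file is deleted and its address returns to the pool. This is a workaround only; the actual fix is for the CNI DEL path to release the reservation, as delivered in the errata below.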

Comment 5 Weibin Liang 2019-08-19 19:44:15 UTC
Tested and verified in v4.2.0-0.ci-2019-08-19-054234:

1. Created many pods (more than the number of available addresses); all pods were created and used all IP addresses.
2. Deleted those pods; checking again, both the pods and their IP addresses were released.
3. Recreated many pods (more than the number of available addresses); all pods were created and used all IP addresses.
4. Re-deleted those pods; checking again, both the pods and their IP addresses were released.

Comment 6 Casey Callendrello 2019-08-26 17:26:01 UTC
*** Bug 1688955 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2019-10-16 06:34:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922