Description of problem:
After running some tests, pods are stuck in ContainerCreating. I see the following error when running oc describe:

Warning FailedCreatePodSandBox 89s (x2174 over 11h) kubelet, master-1 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubevirt-hyperconverged-cluster-jobbrw79-jczfc_kubevirt-hyperconverged_8482d21d-b3b3-11e9-bf45-00b6291ae442_0(ff39fece449a153d88bf540602a749d69fb8141ecb642d2445c234fca3c8feec): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": CNI request failed with status 400: 'failed to run IPAM for ff39fece449a153d88bf540602a749d69fb8141ecb642d2445c234fca3c8feec: failed to run CNI IPAM ADD: failed to allocate for range 0: no IP addresses available in range set: 10.129.0.1-10.129.1.254

When I list the addresses of all containers starting with 10.129, I get only 34. On the node, however, I get the following:

[core@master-1 openshift-sdn]$ ls -l /var/lib/cni/networks/openshift-sdn/10.129.* |wc -l
509

Version-Release number of selected component (if applicable):
$ oc version
Client Version: version.Info{Major:"4", Minor:"2+", GitVersion:"v4.2.0", GitCommit:"2e9d4a117", GitTreeState:"clean", BuildDate:"2019-07-28T17:15:26Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+743bc2c", GitCommit:"743bc2c", GitTreeState:"clean", BuildDate:"2019-07-21T21:17:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

Steps to Reproduce:
1. Create and delete many pods (more than the number of addresses you have)
2.
3.

Actual results:

Expected results:

Additional info:
Jacob, please work with Itzik to reproduce this issue. We've had several reports of problems with this. This is very urgent. I would argue it's a release blocker.
This comes up frequently on my cluster. Even rebooting the node doesn't help. I have captured the CRI-O logs at https://pastebin.com/y2jga9Vw and have to manually clean up stale IPs in /var/lib/cni/networks/openshift-sdn/
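For anyone doing the same manual cleanup, here is a rough sketch of how the stale allocation files could be listed before removing anything. It assumes that each IP-named file under /var/lib/cni/networks/openshift-sdn/ records the ID of the owning pod sandbox on its first line, and that those IDs can be compared against a list of live sandbox IDs (for example, one produced with crictl pods -q). Both are assumptions, not something confirmed in this bug. The function name list_stale_ips is hypothetical. The sketch only prints candidates and deletes nothing.

```shell
#!/usr/bin/env bash
# list_stale_ips DIR LIVE_IDS_FILE  (hypothetical helper, not part of any tool)
# Prints every IP-named allocation file in DIR whose recorded ID does not
# appear in LIVE_IDS_FILE (one ID per line). Nothing is deleted.
list_stale_ips() {
    local dir="$1" live_ids="$2" f id
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Only look at files named with digits and dots (IPv4 addresses);
        # this skips bookkeeping files such as "lock".
        case "$(basename "$f")" in
            *[!0-9.]*) continue ;;
        esac
        # Assumption: the first line of the file holds the owning sandbox ID.
        id="$(head -n 1 "$f" | tr -d '\r\n')"
        # Report the file if that ID is not an exact line in the live list.
        if ! grep -qxF -- "$id" "$live_ids"; then
            printf '%s\n' "$f"
        fi
    done
}
```

One possible way to use it on a node: write the live sandbox IDs to a file with crictl pods -q > /tmp/live_ids, then run list_stale_ips /var/lib/cni/networks/openshift-sdn /tmp/live_ids and review the output by hand before removing any file.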
Tested and verified in v4.2.0-0.ci-2019-08-19-054234:
1. Create many pods (more than the number of addresses you have). All pods got created and used all the IP addresses.
2. Delete those pods and check again. Both the pods and the pods' IP addresses got deleted.
3. Recreate many pods (more than the number of addresses you have). All pods got created and used all the IP addresses.
4. Delete those pods again and check again. Both the pods and the pods' IP addresses got deleted.
*** Bug 1688955 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922