Bug 2081842

Summary: Pod stuck in container creating - failed to run CNI IPAM ADD: failed to allocate for range 0: no IP addresses available in range set
Product: OpenShift Container Platform
Reporter: Sanjay Tripathi <satripat>
Component: Node
Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED NOTABUG
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: aos-bugs
Version: 4.7
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-04 19:54:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sanjay Tripathi 2022-05-04 19:31:33 UTC
Description of problem:
The customer is facing the issue described in the KCS article [1].

Below are the event logs from the customer environment:
~~~
11s        Warning  FailedCreatePodSandBox  pod/startup-test-app-5                       (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_startup-test-app-5_cto-xxxxxxxx-osetest-170600_65869ab1-1202-4300-8f52-73909d12fada_0(1efab1b72bc631bef54bd1b41470cda02431195204b6e3333cef3e7adebac7d6): error adding pod cto-citicloud-osetest-170600_startup-test-app-5 to CNI network "multus-cni-network": [cto-xxxxxxxx-osetest-170600/startup-test-app-5:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'failed to run IPAM for 1efab1b72bc631bef54bd1b41470cda02431195204b6e3333cef3e7adebac7d6: failed to run CNI IPAM ADD: failed to allocate for range 0: no IP addresses available in range set: 10.xxx.x.x-10.xxx.x.xxx
~~~
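
For context, the "no IP addresses available in range set" error indicates that the host-local IPAM used for openshift-sdn has exhausted the node's pod CIDR, which is typically the result of per-IP reservation files outliving their pod sandboxes. The sketch below (not from the customer environment) is a minimal, hypothetical way to count such stale reservations on a node. It assumes that reservations are stored as one file per allocated IP under /var/lib/cni/networks/openshift-sdn with the owning sandbox ID on the first line, and that `crictl pods -q` lists the sandbox IDs still alive on the node; both assumptions should be verified on the affected node before acting on the output.

~~~
#!/usr/bin/env python3
# Hypothetical diagnostic sketch -- verify the assumptions below before relying on it.
# Assumption: openshift-sdn keeps one host-local IPAM reservation file per allocated
# pod IP under /var/lib/cni/networks/openshift-sdn, with the owning sandbox ID on the
# file's first line; `crictl pods -q` lists the sandbox IDs still alive on the node.
import ipaddress
import pathlib
import subprocess

IPAM_DIR = pathlib.Path("/var/lib/cni/networks/openshift-sdn")  # assumed location


def live_sandbox_ids():
    """Return the set of pod sandbox IDs currently known to CRI-O."""
    out = subprocess.check_output(["crictl", "pods", "-q"], universal_newlines=True)
    return {line.strip() for line in out.splitlines() if line.strip()}


def reserved_ips():
    """Map each reserved IP to the sandbox ID recorded in its reservation file."""
    reservations = {}
    for entry in IPAM_DIR.iterdir():
        try:
            ipaddress.ip_address(entry.name)  # skip bookkeeping files such as last_reserved_ip.0
        except ValueError:
            continue
        lines = entry.read_text().splitlines()
        if lines:
            reservations[entry.name] = lines[0].strip()
    return reservations


if __name__ == "__main__":
    live = live_sandbox_ids()
    reservations = reserved_ips()
    stale = {ip: cid for ip, cid in reservations.items() if cid not in live}
    print("%d stale reservations out of %d total" % (len(stale), len(reservations)))
    for ip, cid in sorted(stale.items()):
        print("  %s -> %s (no matching sandbox)" % (ip, cid[:13]))
~~~

A stale-reservation count approaching the size of the node's pod CIDR on the affected nodes would support the stale-lease theory; any cleanup should follow the KCS article rather than ad-hoc file deletion.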

As per the KCS article, the issue was fixed in OCP 4.7.11; however, the customer is hitting it again on OCP 4.7.45.

[1] https://access.redhat.com/solutions/5943591

The workaround mentioned in the KCS article resolves the issue only temporarily. After some time it reappears, and the customer is now observing it more frequently, on random nodes.

The customer is therefore asking for a permanent fix as soon as possible, since the issue affects their entire cluster.


How reproducible:
NA

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info: