Description of problem:
My customer has the following issue: on two nodes of their OpenShift cluster, when the number of running pods exceeds roughly 100, new pods stay in Pending or ContainerCreating status. There is sufficient memory and CPU available to schedule the pods. This does not happen on the other nodes in the cluster.

OpenShift details:
OpenShift Version: 3.11.317
Number of masters: 3
Number of infra nodes: 2
Number of worker nodes: 18

Host details:
Hostname: XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX
Physical server Model: PowerEdge R740xd
Atomic Version: 7.9.1

This affects the customer's users, who are unable to schedule production workloads. As a workaround we set max-pods to 100 in /etc/origin/node/node-config.yaml on those two nodes.

Version-Release number of selected component (if applicable):
OCP 3.11

How reproducible:
100%, but only in the customer's environment

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
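For reference, the max-pods workaround mentioned above is set via kubeletArguments in the node config. A minimal sketch of the relevant fragment of /etc/origin/node/node-config.yaml, assuming the standard OCP 3.11 kubeletArguments syntax (values as described in this report):

```yaml
# /etc/origin/node/node-config.yaml (fragment)
# Workaround: cap the node at 100 pods so scheduling stays below the
# point where new pods hang in Pending/ContainerCreating.
kubeletArguments:
  max-pods:
  - "100"
```

The node service has to be restarted for the change to take effect (on 3.11: systemctl restart atomic-openshift-node).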
You can't directly change the size of the host portion of the IP range assigned to the SDN. However, you can add a new pod IP range with a larger host allocation, then delete and re-add a node to move it to the new range. Take a look at: https://docs.openshift.com/container-platform/3.11/install_config/configuring_sdn.html#configuring-the-pod-network-on-masters
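A sketch of what adding a second pod network with a larger host allocation could look like under networkConfig in /etc/origin/master/master-config.yaml. The CIDR values here are illustrative assumptions (the first entry matches the 3.11 defaults), not taken from this report:

```yaml
# /etc/origin/master/master-config.yaml (fragment)
networkConfig:
  clusterNetworks:
  # Existing range: hostSubnetLength 9 gives each node 2^9 = 512 addresses.
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  # New range with a larger per-node allocation: 2^10 = 1024 addresses.
  - cidr: 10.132.0.0/14
    hostSubnetLength: 10
```

After restarting the master services, a node can be moved to the new range by deleting it (oc delete node <name>) and restarting the node service on that host so it re-registers.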
Verified with version v3.11.487: with more than 100 running pods per node, newly created pods also reach Running status.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 3.11.487 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2928