Bug 1930100 - When number of pods exceed something around 100 running pods new pods stay at status Pending or ContainerCreating.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Ryan Phillips
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-18 11:13 UTC by Andy Bartlett
Modified: 2021-11-12 14:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-12 14:53:23 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 26237 | 0 | None | open | [release-3.11] Bug 1930100: Pod status patches | 2021-06-21 21:30:37 UTC
Red Hat Product Errata | RHBA-2021:2928 | 0 | None | None | None | 2021-08-04 11:18:31 UTC

Description Andy Bartlett 2021-02-18 11:13:23 UTC
Description of problem:
My customer has the following issue:

On two nodes of our OpenShift cluster, when the number of running pods exceeds roughly 100, new pods stay in status Pending or ContainerCreating.

There is sufficient memory and cpu available to schedule pods.
This does not happen on other nodes in this cluster.

OpenShift details:
    OpenShift Version: 3.11.317
    Number of masters: 3
    Number of infra nodes: 2
    Number of worker nodes: 18

Host detail:
   Hostname:
       XXX.XXX.XXX.XXX
       XXX.XXX.XXX.XXX
   Physical server
   Model: PowerEdge R740xd
   Atomic Version : 7.9.1

This affects our customers: they are unable to schedule production loads.
As a workaround, we changed max-pods to 100 in /etc/origin/node/node-config.yaml on those two nodes.


Version-Release number of selected component (if applicable):

OCP 3.11

How reproducible:
100%, but only in the customer's environment


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 16 Ben Bennett 2021-03-08 16:36:11 UTC
You can't directly change the size of the host portion of the IP range assigned to the SDN. However, you can add a new pod IP range with a larger host allocation, then delete and re-add a node to move it to the new range.

Take a look at:
 https://docs.openshift.com/container-platform/3.11/install_config/configuring_sdn.html#configuring-the-pod-network-on-masters
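
To illustrate how the host portion bounds the number of pod IPs on a node, here is a minimal Python sketch. It assumes the 3.11 model in which each node receives a subnet whose host part is hostSubnetLength bits wide, carved out of the cluster network CIDR. The function name per_node_capacity and the CIDR and length values in the usage lines are hypothetical and not taken from this bug. The count also ignores any addresses the SDN reserves for itself (for example the node's gateway), so the real limit may be slightly lower.

```python
import ipaddress


def per_node_capacity(cluster_cidr: str, host_subnet_length: int):
    """Return (number of node subnets, addresses per node subnet).

    Hypothetical helper: cluster_cidr is the pod network range and
    host_subnet_length is the number of host bits given to each node.
    """
    network = ipaddress.ip_network(cluster_cidr)
    # Prefix length of each per-node subnet, e.g. 32 - 9 = 23 for IPv4.
    node_prefix = network.max_prefixlen - host_subnet_length
    if node_prefix < network.prefixlen:
        raise ValueError("host subnet does not fit inside the cluster network")
    node_subnets = 2 ** (node_prefix - network.prefixlen)
    # Exclude the network and broadcast addresses of the node subnet.
    addresses_per_node = 2 ** host_subnet_length - 2
    return node_subnets, addresses_per_node


# Illustrative values only, not the customer's configuration.
print(per_node_capacity("10.128.0.0/14", 9))
print(per_node_capacity("10.128.0.0/14", 7))
```

Under these assumptions, a shorter host portion gives more node subnets but fewer pod addresses per node, which is why a node has to be moved to a range with a larger host allocation to raise its per-node address count.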

Comment 27 MinLi 2021-07-31 09:01:05 UTC
Verified with version v3.11.487.

With more than 100 running pods on a node, newly created pods also reach Running status.

Comment 29 errata-xmlrpc 2021-08-04 11:18:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.487 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2928

