Description of problem:
New ManagementCPUsOverride admission plugin blocks pod creation in clusters with no nodes
Version-Release number of selected component (if applicable):
4.8.0-fc.2 and later
Steps to Reproduce:
1. Create a Hypershift cluster with no nodes
2. Observe that pods cannot be created
Actual results:
FailedCreate replicaset/network-operator-6d876489f9 Error creating: pods "network-operator-6d876489f9-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the cluster does not have any nodes
Expected results:
The pods should be created and wait in the Pending state.
When a control plane is provided external to the nodes in the cluster, it is possible to create a cluster with a control plane but no nodes. While such a cluster is not able to actually run pods, a user should be able to create pods such that they will be scheduled as soon as a Node becomes available.
With the new behavior, the Pod, typically created by a ReplicaSet, is rejected at admission time. The ReplicaSet controller does not expect this rejection and, not understanding its cause, falls back to generic backoff. When Nodes become available, it can be many minutes before the ReplicaSet attempts to create the Pods again, depending on how long the backoff has been running.
With the old behavior, the Pods could be created and placed in a Pending (i.e. unscheduled) state. The kube-scheduler watches for Nodes to be added and immediately schedules the Pending Pods once Nodes are present.
This PR introduced the regression
The current thinking is that any topology that does not explicitly need this admission plugin should be exempt from its Pod-blocking admission policy.
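The proposed exemption could look roughly like the sketch below. It is not the plugin's actual code: `TopologyMode`, `clusterState`, and `admitPod` are hypothetical names, and the sketch assumes the plugin can tell that the control plane is hosted outside the cluster's nodes.

```go
package main

import (
	"errors"
	"fmt"
)

// TopologyMode is a hypothetical stand-in for however the cluster
// reports where its control plane runs.
type TopologyMode string

const (
	HighlyAvailable TopologyMode = "HighlyAvailable"
	External        TopologyMode = "External"
)

// clusterState is a hypothetical snapshot of what the plugin can see.
type clusterState struct {
	controlPlaneTopology TopologyMode
	nodeCount            int
}

var errNoNodes = errors.New("the cluster does not have any nodes")

// admitPod sketches the admission decision: topologies that do not need
// the plugin are exempted before the "no nodes" check can reject the Pod.
func admitPod(c clusterState) error {
	if c.controlPlaneTopology == External {
		return nil // exempt: let the Pod be created and wait in Pending
	}
	if c.nodeCount == 0 {
		return errNoNodes // the rejection described in this report
	}
	// The plugin's normal processing would happen here.
	return nil
}

func main() {
	hosted := clusterState{controlPlaneTopology: External, nodeCount: 0}
	selfHosted := clusterState{controlPlaneTopology: HighlyAvailable, nodeCount: 0}
	fmt.Println("external control plane, no nodes:", admitPod(hosted))
	fmt.Println("self-hosted control plane, no nodes:", admitPod(selfHosted))
}
```

Putting the exemption ahead of the node-count check means a cluster with an external control plane and no nodes never reaches the rejection path, so Pods are created and left Pending for the scheduler.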
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.