All payload components should request a reasonable minimum CPU and their p90 memory usage, and should avoid limits on components whose workload scales with the cluster. See https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits. https://github.com/openshift/machine-api-operator/pull/825 added a number of limits that require exceptions, and in several cases those limits violate the rule that "only components that have completely deterministic workload regardless of cluster scale may set limits on memory and CPU". For now, all the limits should be removed; they can be reintroduced later, with approval, for the specific workload containers that can justify their use. This bug is referenced from the new e2e test, which gates components without resource requests and enforces the resource conventions.
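The convention described above can be illustrated with a small check over a pod object, such as one taken from `oc get pod <name> -o json`. This is only a minimal sketch, not the actual e2e test: the function name `check_container_resources` and the sample pods are hypothetical, and the sketch does not model approved exceptions for containers with deterministic workloads.

```python
from typing import Dict, List

# Resources every container is expected to request under the convention.
REQUIRED_REQUESTS = ("cpu", "memory")


def check_container_resources(pod: Dict) -> List[str]:
    """Return human-readable violations for one pod object (hypothetical helper).

    A container violates the convention if it lacks a CPU or memory
    request, or if it sets any resource limit.
    """
    violations = []
    pod_name = pod.get("metadata", {}).get("name", "<unknown>")
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}
        for key in REQUIRED_REQUESTS:
            if key not in requests:
                violations.append(f"{pod_name}/{name}: missing {key} request")
        for key in limits:
            violations.append(f"{pod_name}/{name}: sets {key} limit")
    return violations


if __name__ == "__main__":
    # Request values mirror those reported for the fixed machine-api-operator pod.
    fixed_pod = {
        "metadata": {"name": "machine-api-operator"},
        "spec": {
            "containers": [
                {"name": "kube-rbac-proxy",
                 "resources": {"requests": {"cpu": "10m", "memory": "20Mi"}}},
                {"name": "machine-api-operator",
                 "resources": {"requests": {"cpu": "10m", "memory": "50Mi"}}},
            ]
        },
    }
    # Illustrative pod that breaks the convention (values are made up).
    bad_pod = {
        "metadata": {"name": "example"},
        "spec": {
            "containers": [
                {"name": "worker",
                 "resources": {"requests": {"cpu": "10m"},
                               "limits": {"memory": "100Mi"}}},
            ]
        },
    }
    for pod in (fixed_pod, bad_pod):
        for violation in check_container_resources(pod):
            print(violation)
```

Run against a pod that follows the convention, the function returns an empty list; a non-empty list names each container and the request it is missing or the limit it sets.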
*** Bug 1938580 has been marked as a duplicate of this bug. ***
Verified installation on Power(ppc64le). No errors seen.

# oc version
Client Version: 4.8.0-0.nightly-ppc64le-2021-03-18-074956
Server Version: 4.8.0-0.nightly-ppc64le-2021-03-18-074956
Kubernetes Version: v1.20.0+e1bc274

# oc get co machine-api
NAME          VERSION                                     AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-api   4.8.0-0.nightly-ppc64le-2021-03-18-074956   True        False         False      92m

# oc get pods -A | grep machine-api
openshift-machine-api   cluster-autoscaler-operator-57748cbb-95pct    2/2   Running   0   99m
openshift-machine-api   cluster-baremetal-operator-6b5466c885-qs92l   2/2   Running   0   99m
openshift-machine-api   machine-api-operator-6889c85fbc-bg8rd         2/2   Running   0   99m

# oc describe pod machine-api-operator-6889c85fbc-bg8rd -n openshift-machine-api
Name:                 machine-api-operator-6889c85fbc-bg8rd
Namespace:            openshift-machine-api
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master-2/192.168.26.251
Start Time:           Thu, 18 Mar 2021 04:51:54 -0400
Labels:               k8s-app=machine-api-operator
                      pod-template-hash=6889c85fbc
Annotations:          k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "",
                            "interface": "eth0",
                            "ips": [
                                "10.130.0.7"
                            ],
                            "default": true,
                            "dns": {}
                        }]
                      k8s.v1.cni.cncf.io/networks-status:
                        [{
                            "name": "",
                            "interface": "eth0",
                            "ips": [
                                "10.130.0.7"
                            ],
                            "default": true,
                            "dns": {}
                        }]
                      openshift.io/scc: restricted
Status:               Running
IP:                   10.130.0.7
IPs:
  IP:           10.130.0.7
Controlled By:  ReplicaSet/machine-api-operator-6889c85fbc
Containers:
  kube-rbac-proxy:
    Container ID:  cri-o://039cb1791a45939d1595be2093495aa44636fa86cb62e9dd7fe2bfe05644cbbf
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:feabd53e4be03a277277f08412bd8ea0a3caf0c63c3276fd301d2409647b4fb7
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:feabd53e4be03a277277f08412bd8ea0a3caf0c63c3276fd301d2409647b4fb7
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://localhost:8080/
      --tls-cert-file=/etc/tls/private/tls.crt
      --tls-private-key-file=/etc/tls/private/tls.key
      --config-file=/etc/kube-rbac-proxy/config-file.yaml
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
      --logtostderr=true
      --v=3
    State:          Running
      Started:      Thu, 18 Mar 2021 04:53:01 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /etc/kube-rbac-proxy from config (rw)
      /etc/tls/private from machine-api-operator-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-operator-token-ps8tq (ro)
  machine-api-operator:
    Container ID:  cri-o://eccfa36b5831afe2dc252b0649f125f2d07614ab7916265797d5f9df3c07c274
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:959e83ba3b9024f9cc06f13f10ef70fee5cebd9c773469878e9820c72c7a2efe
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:959e83ba3b9024f9cc06f13f10ef70fee5cebd9c773469878e9820c72c7a2efe
    Port:          <none>
    Host Port:     <none>
    Command:
      /machine-api-operator
    Args:
      start
      --images-json=/etc/machine-api-operator-config/images/images.json
      --alsologtostderr
      --v=3
    State:          Running
      Started:      Thu, 18 Mar 2021 04:53:16 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      RELEASE_VERSION:      4.8.0-0.nightly-ppc64le-2021-03-18-074956
      COMPONENT_NAMESPACE:  openshift-machine-api (v1:metadata.namespace)
      METRICS_PORT:         8080
    Mounts:
      /etc/machine-api-operator-config/images from images (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-operator-token-ps8tq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-rbac-proxy
    Optional:  false
  images:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      machine-api-operator-images
    Optional:  false
  machine-api-operator-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-api-operator-tls
    Optional:    false
  machine-api-operator-token-ps8tq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-api-operator-token-ps8tq
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  100m                default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  100m                default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  94m                 default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Warning  FailedScheduling  94m                 default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         93m                 default-scheduler  Successfully assigned openshift-machine-api/machine-api-operator-6889c85fbc-bg8rd to master-2
  Warning  FailedMount       92m (x7 over 93m)   kubelet            MountVolume.SetUp failed for volume "machine-api-operator-tls" : secret "machine-api-operator-tls" not found
  Normal   AddedInterface    92m                 multus             Add eth0 [10.130.0.7/23]
  Normal   Pulled            92m                 kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:feabd53e4be03a277277f08412bd8ea0a3caf0c63c3276fd301d2409647b4fb7" already present on machine
  Normal   Created           92m                 kubelet            Created container kube-rbac-proxy
  Normal   Started           92m                 kubelet            Started container kube-rbac-proxy
  Normal   Pulling           92m                 kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:959e83ba3b9024f9cc06f13f10ef70fee5cebd9c773469878e9820c72c7a2efe"
  Normal   Pulled            91m                 kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:959e83ba3b9024f9cc06f13f10ef70fee5cebd9c773469878e9820c72c7a2efe" in 14.520331526s
  Normal   Created           91m                 kubelet            Created container machine-api-operator
  Normal   Started           91m                 kubelet            Started container machine-api-operator
Thanks, pdsilva, for verifying this. Moving to VERIFIED.
*** Bug 1942161 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438