Description of problem:
The machine-api-operator pod is in CrashLoopBackOff state after installation on Power. There is an OOMKilled error in the pod description.

# oc get nodes
NAME       STATUS   ROLES    AGE     VERSION
master-0   Ready    master   4h18m   v1.20.0+e1bc274
master-1   Ready    master   4h18m   v1.20.0+e1bc274
master-2   Ready    master   4h18m   v1.20.0+e1bc274
worker-0   Ready    worker   4h4m    v1.20.0+e1bc274
worker-1   Ready    worker   4h4m    v1.20.0+e1bc274

# oc get co
NAME                                       VERSION                                     AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      42m
baremetal                                  4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      65m
cloud-credential                           4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      73m
cluster-autoscaler                         4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      65m
config-operator                            4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      66m
console                                    4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      49m
csi-snapshot-controller                    4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      66m
dns                                        4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      65m
etcd                                       4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      65m
image-registry                             4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      41m
ingress                                    4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      53m
insights                                   4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      60m
kube-apiserver                             4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      64m
kube-controller-manager                    4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      64m
kube-scheduler                             4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      63m
kube-storage-version-migrator              4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      52m
machine-api                                4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      65m
machine-approver                           4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      66m
machine-config                             4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      64m
marketplace                                4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      65m
monitoring                                 4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      52m
network                                    4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      66m
node-tuning                                4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      66m
openshift-apiserver                        4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      60m
openshift-controller-manager               4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      40m
openshift-samples                          4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      60m
operator-lifecycle-manager                 4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      65m
operator-lifecycle-manager-catalog         4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      65m
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      61m
service-ca                                 4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      66m
storage                                    4.8.0-0.nightly-ppc64le-2021-03-14-051438   True        False         False      66m

# oc get pods -A | grep openshift-machine-api
openshift-machine-api   cluster-autoscaler-operator-689586d58c-jbp6l   2/2   Running            1    4h54m
openshift-machine-api   cluster-baremetal-operator-8b948876-wcprh      2/2   Running            0    4h54m
openshift-machine-api   machine-api-operator-664cfb7d45-fmbjp          1/2   CrashLoopBackOff   21   27m

# oc describe pod machine-api-operator-664cfb7d45-4v299 -n openshift-machine-api
Name:                 machine-api-operator-664cfb7d45-4v299
Namespace:            openshift-machine-api
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master-2/192.168.26.126
Start Time:           Sun, 14 Mar 2021 09:05:12 -0400
Labels:               k8s-app=machine-api-operator
                      pod-template-hash=664cfb7d45
Annotations:          k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "",
                            "interface": "eth0",
                            "ips": [
                                "10.129.0.68"
                            ],
                            "default": true,
                            "dns": {}
                        }]
                      k8s.v1.cni.cncf.io/networks-status:
                        [{
                            "name": "",
                            "interface": "eth0",
                            "ips": [
                                "10.129.0.68"
                            ],
                            "default": true,
                            "dns": {}
                        }]
                      openshift.io/scc: restricted
Status:               Running
IP:                   10.129.0.68
IPs:
  IP:  10.129.0.68
Controlled By:  ReplicaSet/machine-api-operator-664cfb7d45
Containers:
  kube-rbac-proxy:
    Container ID:  cri-o://600cd3f48e6378c622f7e0b5aba926b866754b8e4967369468e51bc2fba2f4ad
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e13f54418ac0779b58b73b3dc392609ac7731d47a1ca7cf493446eaef10024ed
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e13f54418ac0779b58b73b3dc392609ac7731d47a1ca7cf493446eaef10024ed
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://localhost:8080/
      --tls-cert-file=/etc/tls/private/tls.crt
      --tls-private-key-file=/etc/tls/private/tls.key
      --config-file=/etc/kube-rbac-proxy/config-file.yaml
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
      --logtostderr=true
      --v=3
    State:          Running
      Started:      Sun, 14 Mar 2021 10:10:16 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sun, 14 Mar 2021 10:06:21 -0400
      Finished:     Sun, 14 Mar 2021 10:07:28 -0400
    Ready:          True
    Restart Count:  21
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:  <none>
    Mounts:
      /etc/kube-rbac-proxy from config (rw)
      /etc/tls/private from machine-api-operator-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-operator-token-qgrz6 (ro)
  machine-api-operator:
    Container ID:  cri-o://fa3ab32a2dd8f40f4c54575b985a68730daa469fa8697f113b1adb255df95cb2
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ed79a307757581cbaab976b95dde902c4724a7eb4ef7fee7991cf1b63205fe0
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ed79a307757581cbaab976b95dde902c4724a7eb4ef7fee7991cf1b63205fe0
    Port:          <none>
    Host Port:     <none>
    Command:
      /machine-api-operator
    Args:
      start
      --images-json=/etc/machine-api-operator-config/images/images.json
      --alsologtostderr
      --v=3
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Sun, 14 Mar 2021 10:07:29 -0400
      Finished:     Sun, 14 Mar 2021 10:07:30 -0400
    Ready:          False
    Restart Count:  20
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      RELEASE_VERSION:      4.8.0-0.nightly-ppc64le-2021-03-14-051438
      COMPONENT_NAMESPACE:  openshift-machine-api (v1:metadata.namespace)
      METRICS_PORT:         8080
    Mounts:
      /etc/machine-api-operator-config/images from images (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-operator-token-qgrz6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-rbac-proxy
    Optional:  false
  images:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      machine-api-operator-images
    Optional:  false
  machine-api-operator-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-api-operator-tls
    Optional:    false
  machine-api-operator-token-qgrz6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-api-operator-token-qgrz6
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  73m                    default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  67m                    default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Warning  FailedScheduling  67m                    default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         66m                    default-scheduler  Successfully assigned openshift-machine-api/machine-api-operator-664cfb7d45-4v299 to master-2
  Warning  FailedScheduling  73m                    default-scheduler  no nodes available to schedule pods
  Warning  FailedMount       66m                    kubelet            MountVolume.SetUp failed for volume "machine-api-operator-tls" : failed to sync secret cache: timed out waiting for the condition
  Warning  FailedMount       66m (x6 over 66m)      kubelet            MountVolume.SetUp failed for volume "machine-api-operator-tls" : secret "machine-api-operator-tls" not found
  Normal   AddedInterface    65m                    multus             Add eth0 [10.129.0.10/23]
  Normal   Pulling           65m                    kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ed79a307757581cbaab976b95dde902c4724a7eb4ef7fee7991cf1b63205fe0"
  Normal   Pulled            65m                    kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ed79a307757581cbaab976b95dde902c4724a7eb4ef7fee7991cf1b63205fe0" in 13.894041473s
  Normal   AddedInterface    65m                    multus             Add eth0 [10.129.0.14/23]
  Normal   Started           65m (x2 over 65m)      kubelet            Started container kube-rbac-proxy
  Normal   Created           65m (x2 over 65m)      kubelet            Created container kube-rbac-proxy
  Normal   Pulled            65m                    kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ed79a307757581cbaab976b95dde902c4724a7eb4ef7fee7991cf1b63205fe0" already present on machine
  Normal   Created           64m (x2 over 65m)      kubelet            Created container machine-api-operator
  Normal   Started           64m (x2 over 65m)      kubelet            Started container machine-api-operator
  Normal   SandboxChanged    64m (x3 over 65m)      kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Killing           64m (x2 over 65m)      kubelet            Stopping container machine-api-operator
  Normal   AddedInterface    64m                    multus             Add eth0 [10.129.0.15/23]
  Normal   AddedInterface    64m                    multus             Add eth0 [10.129.0.20/23]
  Normal   AddedInterface    64m                    multus             Add eth0 [10.129.0.21/23]
  Normal   AddedInterface    63m                    multus             Add eth0 [10.129.0.25/23]
  Normal   AddedInterface    61m                    multus             Add eth0 [10.129.0.27/23]
  Normal   AddedInterface    58m                    multus             Add eth0 [10.129.0.33/23]
  Normal   AddedInterface    54m                    multus             Add eth0 [10.129.0.40/23]
  Normal   AddedInterface    49m                    multus             Add eth0 [10.129.0.47/23]
  Normal   AddedInterface    43m                    multus             Add eth0 [10.129.0.49/23]
  Normal   AddedInterface    38m                    multus             Add eth0 [10.129.0.53/23]
  Normal   AddedInterface    38m                    multus             Add eth0 [10.129.0.54/23]
  Normal   AddedInterface    37m                    multus             Add eth0 [10.129.0.57/23]
  Normal   AddedInterface    36m                    multus             Add eth0 [10.129.0.58/23]
  Normal   AddedInterface    33m                    multus             Add eth0 [10.129.0.59/23]
  Normal   AddedInterface    30m                    multus             Add eth0 [10.129.0.60/23]
  Normal   AddedInterface    25m                    multus             Add eth0 [10.129.0.61/23]
  Normal   AddedInterface    20m                    multus             Add eth0 [10.129.0.62/23]
  Warning  BackOff           16m (x162 over 64m)    kubelet            Back-off restarting failed container
  Normal   AddedInterface    14m                    multus             Add eth0 [10.129.0.63/23]
  Normal   AddedInterface    9m44s                  multus             Add eth0 [10.129.0.64/23]
  Normal   AddedInterface    9m23s                  multus             Add eth0 [10.129.0.65/23]
  Normal   AddedInterface    8m37s                  multus             Add eth0 [10.129.0.66/23]
  Normal   AddedInterface    7m5s                   multus             Add eth0 [10.129.0.67/23]
  Warning  BackOff           6m37s (x207 over 64m)  kubelet            Back-off restarting failed container
  Normal   AddedInterface    4m22s                  multus             Add eth0 [10.129.0.68/23]
  Normal   Pulled            99s (x25 over 65m)     kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e13f54418ac0779b58b73b3dc392609ac7731d47a1ca7cf493446eaef10024ed" already present on machine

# oc logs machine-api-operator-664cfb7d45-fmbjp -n openshift-machine-api machine-api-operator
I0314 17:27:51.040810       1 start.go:62] Version: 4.8.0-202103140432.p0-dirty
I0314 17:27:51.137782       1 leaderelection.go:243] attempting to acquire leader lease openshift-machine-api/machine-api-operator...

How reproducible:
Always

Steps to Reproduce:
1. Install the nightly build of 4.8 on Power

Actual results:
machine-api-operator pod is in CrashLoopBackOff state.

Additional info:
# oc version
Client Version: 4.8.0-0.nightly-ppc64le-2021-03-14-051438
Server Version: 4.8.0-0.nightly-ppc64le-2021-03-14-051438
Kubernetes Version: v1.20.0+e1bc274
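From the describe output above, it is the kube-rbac-proxy container that is being OOMKilled (exit code 137) against its 50Mi memory limit; the machine-api-operator container then exits with code 2 and the pod goes into CrashLoopBackOff. On other clusters the same check can be made in one line with a jsonpath query (a generic sketch, not output captured from this cluster):

# oc get pods -n openshift-machine-api -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'

This prints each pod name together with the last termination reason of its containers, so an OOMKilled there points to the same failure mode.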
We just added resource limits to pods in this build and have been told this is not right for OpenShift workloads. We are reverting that change, which will resolve this issue.

*** This bug has been marked as a duplicate of bug 1938493 ***
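For anyone hitting this on an affected nightly before the revert merges: the limits involved are the ones visible in the describe output above, i.e. a stanza of roughly this shape on the pod's containers (a sketch reconstructed from that output, not the actual manifest from the release payload):

  resources:
    requests:
      cpu: 10m
      memory: 20Mi    # 50Mi for the machine-api-operator container
    limits:
      cpu: 100m
      memory: 50Mi    # the 50Mi memory limit is what kube-rbac-proxy is killed against

Removing the limits block by hand only confirms the diagnosis temporarily, since the deployment is managed and will be reconciled back; the actual fix is the revert tracked in bug 1938493.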