+++ This bug was initially created as a clone of Bug #1809593 +++

Potentially a revision of BZ#1809593.

Description of problem:
Many pods are OOM-killed, Pending, or in Error state during deployment.

Version-Release number of selected component (if applicable):
OCP 4.3
OCP 4.4.9
cri-o-1.17.4-17.dev.rhaos4.4.gitf0cfdfc.el8.ppc64le

How reproducible:
Every deployment attempt

Steps to Reproduce:
1. Deploy by following https://docs.openshift.com/container-platform/4.4/installing/installing_ibm_power/installing-ibm-power.html#machine-requirements_installing-ibm-power
2. Run: oc get pods -A | grep -v Running | grep -v Completed

Actual results:
API installer pods are OOM-killed; many other pods are stuck in Pending or Error state.

Expected results:
Successful deployment

Additional info:
oc get po -A | grep -v Running | grep -v Completed
===================================================
NAMESPACE                                 NAME                                                READY   STATUS      RESTARTS   AGE
openshift-apiserver                       apiserver-577c8bffdb-8vj8n                          0/1     Pending     0          10m
openshift-apiserver                       apiserver-577c8bffdb-b7b5n                          0/1     Pending     0          10m
openshift-apiserver                       apiserver-577c8bffdb-k29l8                          0/1     Pending     0          10m
openshift-cluster-node-tuning-operator    tuned-8hfdx                                         0/1     Pending     0          2m17s
openshift-cluster-node-tuning-operator    tuned-qfs94                                         0/1     Pending     0          2m14s
openshift-cluster-storage-operator        csi-snapshot-controller-operator-7d76c9d7cf-dgzvs   0/1     Pending     0          20m
openshift-etcd                            installer-2-master3                                 0/1     Error       0          9m24s
openshift-kube-apiserver                  installer-2-master3                                 0/1     Error       0          9m53s
openshift-kube-apiserver                  installer-3-master3                                 0/1     OOMKilled   0          9m17s
openshift-kube-apiserver                  revision-pruner-2-master2                           0/1     OOMKilled   0          9m22s
openshift-kube-apiserver                  revision-pruner-2-master3                           0/1     Error       0          9m18s
openshift-kube-controller-manager         installer-4-master3                                 0/1     Error       0          9m54s
openshift-kube-controller-manager         installer-5-master2                                 0/1     OOMKilled   0          7m11s
openshift-kube-controller-manager         revision-pruner-4-master1                           0/1     OOMKilled   0          9m34s
openshift-kube-controller-manager         revision-pruner-4-master2                           0/1     OOMKilled   0          9m32s
openshift-kube-controller-manager         revision-pruner-5-master2                           0/1     Error       0          5m43s
openshift-kube-controller-manager         revision-pruner-5-master3                           0/1     OOMKilled   0          8m48s
openshift-kube-scheduler                  installer-5-master3                                 0/1     Error       0          11m
openshift-kube-scheduler                  installer-6-master3                                 0/1     Error       0          9m31s
openshift-kube-scheduler                  revision-pruner-4-master2                           0/1     Error       0          11m
openshift-kube-scheduler                  revision-pruner-5-master3                           0/1     Error       0          10m
openshift-kube-scheduler                  revision-pruner-6-master3                           0/1     Error       0          9m21s
openshift-kube-storage-version-migrator   migrator-f8f8bd9d8-tqrmg                            0/1     Pending     0          12m
openshift-multus                          multus-brwdx                                        0/1     Pending     0          2m17s
openshift-multus                          multus-kk95s                                        0/1     Pending     0          2m14s
openshift-sdn                             ovs-4pdr9                                           0/1     Pending     0          2m14s
openshift-sdn                             ovs-cncw7                                           0/1     Pending     0          2m17s
openshift-sdn                             sdn-vzd9b                                           0/1     Pending     0          2m14s
openshift-sdn                             sdn-zljhn                                           0/1     Pending     0          2m17s
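As a rough sketch (Python, not part of the original report), the `grep -v Running | grep -v Completed` filter from step 2 can be extended to tally the remaining pods by STATUS. It assumes the default `oc get pods -A` column order (NAMESPACE NAME READY STATUS RESTARTS AGE); `tally_unhealthy` is a hypothetical helper name. Unlike `grep -v`, which drops any line containing "Running" anywhere, this version checks only the STATUS column.

```python
# Sketch: tally non-Running/Completed pods by STATUS, mirroring
#   oc get pods -A | grep -v Running | grep -v Completed
# Assumes the default column order NAMESPACE NAME READY STATUS RESTARTS AGE.
from collections import Counter


def tally_unhealthy(listing: str) -> Counter:
    """Return a Counter of STATUS values for pods that are neither
    Running nor Completed in `oc get pods -A` text output."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header row, separator lines and anything too short to be a pod row.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        status = fields[3]  # fourth column is STATUS
        if status not in ("Running", "Completed"):
            counts[status] += 1
    return counts


if __name__ == "__main__":
    sample = """NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-apiserver apiserver-577c8bffdb-8vj8n 0/1 Pending 0 10m
openshift-kube-apiserver installer-3-master3 0/1 OOMKilled 0 9m17s
"""
    print(dict(tally_unhealthy(sample)))  # {'Pending': 1, 'OOMKilled': 1}
```

Applied to the full listing above, it reports 13 Pending, 10 Error and 6 OOMKilled pods. Against a live cluster, the input text could come from `subprocess.run(["oc", "get", "pods", "-A"], capture_output=True, text=True).stdout`.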
I don't have access to any internal posts on this Bugzilla, but I wanted to mention that an external customer is unable to deploy OCP on his IBM POWER9 system. He opened his original support case on 7 July. If there is any additional data to collect, or any workaround that might help, I'm sure he would appreciate an update sooner rather than later.
Hi Rob,

We may need additional clarification here. Are you unable to install any 4.3 or 4.4 release on ppc64le at all? Which specific OCP release version are you using, and which method are you using to deploy it?
Hey, we tried to install 4.3.18 and 4.4.9 (both the latest versions at the time of installation). We used this specific installer version (for 4.4.9 and I ):

openshift-install 4.4.9
built from commit 1541bf917973186bbab6a5f895f08db4334a5d9a
release image quay.io/openshift-release-dev/ocp-release@sha256:ae48474c6fcd0666a672ce2a449736a2549693c04186ea588e37477635c976a6

We followed the standard installation guide (https://docs.openshift.com/container-platform/4.4/installing/installing_ibm_power/installing-ibm-power.html) with IP addresses assigned via DHCP.
Duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1812126
This is resolved in 4.5. For the installation to succeed on 4.3/4.4, assign 2 cores to each master node, which with SMT=8 gives 16 threads. The master nodes can be resized after the initial install.
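The sizing in this workaround is plain arithmetic: threads = cores × SMT level, so 2 cores at SMT=8 gives 16 threads. A minimal Python sketch follows; `logical_cpus` and `within_workaround` are hypothetical helpers, and treating 16 threads as an upper bound (rather than an exact target) is an assumption drawn from the later comment that the kernel/cgroup caches scale with the number of threads. The sketch only does the arithmetic; it does not reproduce the OOM behaviour.

```python
# Minimal sketch: logical CPU (thread) count of a ppc64le master node.
# Helper names and the 16-thread upper bound are illustrative assumptions.

SUPPORTED_SMT = (1, 2, 4, 8)  # SMT modes a POWER9 core can run in


def logical_cpus(cores: int, smt: int) -> int:
    """Threads the kernel sees: cores multiplied by the SMT level."""
    if cores < 1 or smt not in SUPPORTED_SMT:
        raise ValueError(f"invalid cores={cores} or smt={smt}")
    return cores * smt


# 2 cores at SMT=8, the master size recommended for 4.3/4.4 installs
WORKAROUND_THREADS = logical_cpus(2, 8)  # 16


def within_workaround(cores: int, smt: int) -> bool:
    """Assumption: a master with at most 16 threads stays within the
    workaround, since the kernel/cgroup caches scale with thread count."""
    return logical_cpus(cores, smt) <= WORKAROUND_THREADS


if __name__ == "__main__":
    for cores, smt in [(2, 8), (4, 8), (4, 4)]:
        print(cores, smt, logical_cpus(cores, smt), within_workaround(cores, smt))
```

Comparing thread counts rather than core counts keeps the check meaningful if a node runs at a lower SMT level; if only the exact documented configuration is trusted, replace the `<=` test with `cores == 2 and smt == 8`.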
Hi team! This is a duplicate of an issue that was discovered in the 4.3 timeframe and ultimately fixed in 4.5. Some additional context from Manoj Kumar: [This issue is related to] the large page size and the kernel/cgroup caches being scaled to the number of virtual CPUs/threads. The workaround is to install with the default/minimum master node size given in the documentation: 2 cores, on SMT=8. This information should already be present in the documentation.
*** This bug has been marked as a duplicate of bug 1812126 ***