Description of problem:
Clean install of a 4.2 nightly build using IPI on AWS with all defaults. After installing ServiceMesh 1.0 (GA) using the basic vanilla approach with no specific configuration, the Pilot Pods fail to start because the required CPU (500m) is larger than what the standard AWS machine size used for IPI can provide.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-04-142146
ServiceMesh 1.0 GA

How reproducible:
Always. After installation of ServiceMesh, the Pilot Pods stick in Pending and 'Unschedulable'.

Steps to Reproduce:
1. Install the 4.2 nightly build via the openshift_installer
2. Follow the instructions for installing the GA ServiceMesh
3. Watch the Pod workload and see the Pilot Pods failing to start

Actual results:
Pilot Pods fail to start as unschedulable

Expected results:
All Pods start correctly

Additional info:
Manually changing the Pod spec YAML for the Pilot Pods and reducing the required CPU from 500 to 100 lets the Pods start correctly.
Could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1720174
PR pending for 3.11 https://github.com/openshift/origin/pull/23779 which will be ported to the other releases.
@Ian Lawson, can you provide the ServiceMeshControlPlane template for this bug? Also, which instance type does "the standard AWS machine size used for IPI" refer to, for example m5.large or something else?
Installation followed the instructions on the Customer Portal at https://docs.openshift.com/container-platform/4.1/service_mesh/service_mesh_install/installing-ossm.html#installing-ossm The cluster configuration came from a vanilla run of the openshift_installer for AWS with no configuration changes, which generates a cluster with 3 x master (m4.xlarge) and 3 x worker (m4.large). Everything used the defaults provided by the ServiceMesh install and the IPI installer for AWS. I had to manually change the Pod spec for the Pilot component to get it to deploy, as the 500 requirement is too great for the default settings of the IPI workers.
Hi Ian,

After checking this bz and going through the Service Mesh installation docs, it appears this is not actually a bug. The default cluster you installed created m4.large VM instances, which have 2 CPUs (2000m CPU capacity), of which around 1500m is available for pod allocation.

The installation docs show that the Pilot pod has a default CPU value of 500m; however, the docs also say to configure it according to the resources available on the OpenShift cluster, and they mention 100m as an appropriate value for the Pilot pod. Service Mesh has many components, which I guess are using up the limited CPU available on the 3 worker nodes, after which the nodes cannot fulfil the Pilot pod's 500m CPU request. So I guess you would need to either set a lower CPU value for the Pilot pod or provision larger worker node instances.

I am adding references to the specific docs sections below.

From the ServiceMesh docs: "The resources you configure for Red Hat OpenShift Service Mesh with these parameters, including CPUs, memory, and the number of pods, are based on the configuration of your OpenShift cluster. Configure these parameters based on the available resources in your current cluster configuration."

ServiceMeshControlPlane parameters
https://docs.openshift.com/container-platform/4.1/service_mesh/service_mesh_install/customizing-installation-ossm.html#ossm-cr-parameters_customizing-installation-ossm

Istio Pilot configuration
https://docs.openshift.com/container-platform/4.1/service_mesh/service_mesh_install/customizing-installation-ossm.html#ossm-cr-parameters_customizing-installation-ossm

Thanks
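For illustration, the scheduling arithmetic above can be sketched in a few lines of Python. This is only a rough model of how a CPU request is compared against what is left on a node, not the actual scheduler logic. The 1500m allocatable figure comes from the comment above; the list of existing requests is made up for the example, and the helper names `parse_millicores` and `fits` are hypothetical.

```python
from typing import Sequence


def parse_millicores(value: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    # A plain number means whole CPUs, e.g. '2' is 2000m.
    return int(float(value) * 1000)


def fits(allocatable: str, existing_requests: Sequence[str], new_request: str) -> bool:
    """Return True if new_request fits in the CPU left after existing requests."""
    used = sum(parse_millicores(r) for r in existing_requests)
    free = parse_millicores(allocatable) - used
    return parse_millicores(new_request) <= free


# Assumed values: 1500m allocatable per m4.large worker (from the comment
# above) and an illustrative, made-up set of requests already on the node.
existing = ["500m", "400m", "300m"]
print(fits("1500m", existing, "500m"))  # default Pilot request
print(fits("1500m", existing, "100m"))  # lowered Pilot request
```

With these assumed numbers only 300m is left on the node, so the 500m request does not fit while the 100m request does, which matches the behaviour reported in this bz.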
Cool - that's what I did eventually, lowered the Pod spec requirement to 100...
Thanks Ian, I will close this bz then.