Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1751959

Summary: When installing the GA of Service Mesh 1.0 on a vanilla IPI AWS 4.2 (nightly) the CPU settings for pilot prohibit the Pods starting
Product: OpenShift Container Platform
Reporter: Ian Lawson <ian.lawson>
Component: Node
Assignee: Ryan Phillips <rphillips>
Status: CLOSED NOTABUG
QA Contact: MinLi <minmli>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.2.0
CC: aos-bugs, eparis, jokerman, nagrawal, rphillips, schoudha
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1752912 (view as bug list)
Environment:
Last Closed: 2019-09-19 12:50:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1752912

Description Ian Lawson 2019-09-13 09:15:49 UTC
Description of problem:

Clean install of a 4.2 nightly build using IPI on AWS with all defaults, followed by installation of ServiceMesh 1.0 (GA) using the basic vanilla approach with no specific configuration. The Pilot Pods fail to start because the required CPU (500) is larger than the standard AWS machine size used for IPI can accommodate.


Version-Release number of selected component (if applicable):

4.2.0-0.nightly-2019-09-04-142146
ServiceMesh 1.0 GA

How reproducible:

Always - after installing ServiceMesh, the Pilot Pods are stuck in Pending and 'Unschedulable'

Steps to Reproduce:
1. Install the 4.2 nightly build via the openshift_installer
2. Follow the instructions for installing the GA ServiceMesh
3. Watch the Pod workload and see the Pilot Pods failing to start

Actual results:

Pilot Pods fail to start as unschedulable

Expected results:

All Pods start correctly

Additional info:

By manually changing the Pod spec YAML for the Pilot Pods and reducing the required CPU from 500 to 100, the Pods start correctly.

Comment 2 Ryan Phillips 2019-09-17 13:30:02 UTC
Could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1720174

Comment 3 Ryan Phillips 2019-09-17 13:30:40 UTC
PR pending for 3.11 https://github.com/openshift/origin/pull/23779 which will be ported to the other releases.

Comment 5 MinLi 2019-09-19 09:03:15 UTC
@Ian Lawson , 
Can you provide the ServiceMeshControlPlane template for this bug?
Also, which instance type does "the standard AWS machine size used for IPI" refer to? For example, m5.large or something else?

Comment 6 Ian Lawson 2019-09-19 09:21:01 UTC
Installation was via the instructions on the Customer Portal at https://docs.openshift.com/container-platform/4.1/service_mesh/service_mesh_install/installing-ossm.html#installing-ossm

The configuration for the Cluster was driven by doing a vanilla run of the openshift_installer for AWS with no configuration changes, which generates a cluster with 3xMaster (m4.xlarge) and 3xWorker (m4.large)

Everything was done with the defaults provided as part of the ServiceMesh install and the IPI installer for AWS - I had to manually change the Pod spec for the Pilot component to get it to deploy, as the 500 requirement is too great for the default settings of the IPI workers.

Comment 7 Sunil Choudhary 2019-09-19 12:43:15 UTC
Hi Ian,

Checking this bz and going through the Service Mesh installation docs, it appears this is not actually a bug.

The default cluster you installed created m4.large VM instances for the workers, which have 2 CPUs (2000m CPU capacity), of which around 1500m is available for pod allocation.
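
The arithmetic behind those figures can be sketched as follows. This is only an illustration: the helper `to_millicores` is a hypothetical name, not part of any OpenShift or Kubernetes tooling, and the 1500m allocatable figure is the approximate value quoted in this comment, not a measured one.

```python
def to_millicores(quantity: str) -> int:
    """Convert a Kubernetes-style CPU quantity ("2", "0.5", "500m") to millicores.

    Hypothetical helper for illustration; handles only plain and "m" suffixed values.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


capacity = to_millicores("2")          # m4.large worker: 2 CPUs
allocatable = 1500                     # approximate figure quoted in this comment
reserved = capacity - allocatable      # CPU not available to pods on that node
pilot_default = to_millicores("500m")  # default Pilot CPU value from the docs

print(f"capacity={capacity}m allocatable~{allocatable}m reserved~{reserved}m")
print(f"default Pilot request is {pilot_default / allocatable:.0%} of one worker's allocatable CPU")
```

Under these assumptions, a single Pilot pod with the default value asks for roughly a third of everything one worker can offer to pods.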

Going through the installation docs, I see the Pilot pod has a default CPU value of 500m; however, the docs say to configure it according to the resources available on the OpenShift cluster. They also mention 100m as an appropriate value for the Pilot pod.

Service Mesh has many components, which I guess are using up the limited CPU available on the 3 worker nodes, so the Pilot pod's 500m CPU request cannot be fulfilled by any node. So I guess you would need to either set a lower CPU value for the Pilot pod or provision larger worker node instances. I am adding references to the specific docs sections below.
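
A minimal sketch of that reasoning, assuming the scheduler only places a pod on a node whose remaining allocatable CPU covers the pod's request. The per-node figures in `already_requested` are HYPOTHETICAL and were not taken from the reporter's cluster; the node names and the function `nodes_that_fit` are likewise made up for illustration, and real scheduling also considers memory and other constraints.

```python
from typing import Dict, List

# Approximate allocatable CPU per m4.large worker, in millicores (figure from this thread).
ALLOCATABLE_M = 1500

# HYPOTHETICAL: CPU already requested on each worker by other pods, in millicores.
already_requested: Dict[str, int] = {
    "worker-a": 1250,
    "worker-b": 1100,
    "worker-c": 1400,
}


def nodes_that_fit(request_m: int, requested: Dict[str, int],
                   allocatable_m: int = ALLOCATABLE_M) -> List[str]:
    """Return the nodes whose remaining allocatable CPU covers the request."""
    return [name for name, used in requested.items()
            if allocatable_m - used >= request_m]


for request in (500, 100):
    fits = nodes_that_fit(request, already_requested)
    status = "schedulable on " + ", ".join(fits) if fits else "Unschedulable (stays Pending)"
    print(f"Pilot request {request}m: {status}")
```

With these made-up numbers no worker has 500m left, so the default request stays Pending, while a 100m request fits on every worker. The same check would also pass with the 500m request if the workers were larger, which matches the two options described above.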


From ServiceMesh docs:
The resources you configure for Red Hat OpenShift Service Mesh with these parameters, including CPUs, memory, and the number of pods, are based on the configuration of your OpenShift cluster. Configure these parameters based on the available resources in your current cluster configuration.

ServiceMeshControlPlane parameters
https://docs.openshift.com/container-platform/4.1/service_mesh/service_mesh_install/customizing-installation-ossm.html#ossm-cr-parameters_customizing-installation-ossm

Istio Pilot configuration
https://docs.openshift.com/container-platform/4.1/service_mesh/service_mesh_install/customizing-installation-ossm.html#ossm-cr-parameters_customizing-installation-ossm

Thanks

Comment 8 Ian Lawson 2019-09-19 12:46:51 UTC
Cool - that's what I did eventually, lowered the Pod spec requirement to 100...

Comment 9 Sunil Choudhary 2019-09-19 12:50:25 UTC
Thanks Ian, I will close this bz then.