2263148 – ODF installation fails if "Performance Mode" is picked up during install ( "Performance Mode" requests on CPU exceed available cpu on OCP node )

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	management-console
Sub Component:
Version:	4.15
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	ODF 4.16.0
Assignee:	Alfonso Martínez
QA Contact:	Joy John Pinto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2024-02-07 08:18 UTC by Elvir Kuric
Modified:	2024-07-17 13:13 UTC
CC List:	7 users
Fixed In Version:	4.16.0-94
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-07-17 13:13:47 UTC
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	red-hat-storage odf-console pull 1326	None	open	Performance profiles: calculate min. resources taking into account OSDs	2024-04-25 06:43:44 UTC
Github	red-hat-storage odf-console pull 1351	None	open	Bug 2263148: [release-4.16] Performance profiles: calculate min. resources taking into account OSDs	2024-05-02 14:46:56 UTC
Github	red-hat-storage odf-console pull 1352	None	open	Bug 2263148: [release-4.16-compatibility] Performance profiles: calculate min. resources taking into account OSDs	2024-05-02 14:44:35 UTC
Red Hat Product Errata	RHSA-2024:4591	None	None	None	2024-07-17 13:13:51 UTC

Description Elvir Kuric 2024-02-07 08:18:24 UTC

Description of problem:
When installing ODF v4.15, we now have the option to select "Lean Mode", "Balanced Mode", or "Performance Mode". Depending on what the user selects, different CPU/memory values are allocated (particularly for OSDs) when ODF is installed.

This approach is OK, but it has the following noticeable issue:

- At this stage we do not know how many CPUs are present on the OCP node. If the user selects "Performance Mode", which allocates 4 CPUs per OSD, and the system has many OSDs such that numOSD x 4 > TotalCPU_Available, then many OSDs will fail to start and the ODF installation will fail.
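
The condition above (numOSD x 4 > TotalCPU_Available) can be sketched as a small check. This is only an illustration of the arithmetic in this report, not code from ODF or odf-console: the function name and parameters are hypothetical, and the only figure taken from the report is the 4 CPUs per OSD for "Performance Mode".

```python
# Hypothetical helper illustrating the arithmetic in this report.
# Only the 4-CPU-per-OSD figure comes from the bug description.
PERFORMANCE_CPU_PER_OSD = 4.0


def osd_cpu_shortfall(num_osds: int,
                      total_cpu_available: float,
                      cpu_per_osd: float = PERFORMANCE_CPU_PER_OSD) -> float:
    """Return how many CPUs are missing for the OSD requests (0.0 if they fit)."""
    requested = float(num_osds) * cpu_per_osd
    return max(0.0, requested - float(total_cpu_available))


if __name__ == "__main__":
    # 16 OSDs at 4 CPUs each request 64 CPUs; with 48 available, 16 are missing.
    print(osd_cpu_shortfall(16, 48))
```

A result of 0.0 only means the OSD requests alone do not exceed the total; other ODF components also request CPU, so this should be read as a lower bound on what is needed.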

Version-Release number of selected component (if applicable):
OCP v4.15 

 oc get csv
NAME                                         DISPLAY                       VERSION             REPLACES   PHASE
mcg-operator.v4.15.0-130.stable              NooBaa Operator               4.15.0-130.stable              Succeeded
ocs-operator.v4.15.0-130.stable              OpenShift Container Storage   4.15.0-130.stable              Succeeded
odf-csi-addons-operator.v4.15.0-130.stable   CSI Addons                    4.15.0-130.stable              Succeeded
odf-operator.v4.15.0-130.stable              OpenShift Data Foundation     4.15.0-130.stable              Succeeded


How reproducible:

always 

Steps to Reproduce:
Easy to reproduce if the "Performance Profile" CPU requests exceed the available CPU resources on the node:

1. Try to install ODF v4.15.
2. Select the Performance profile.
3. The ODF installation fails.

Actual results:

ODF installation fails 


Expected results:
ODF Installation to work 


Additional info:
NA

Comment 2 Malay Kumar parida 2024-02-15 13:30:42 UTC

We had a lot of discussion about this situation during the design phase of the feature itself. We did think of adding some validation on the UI before choosing a profile. The validation was intended to calculate & see if a selected profile can be applied successfully on a cluster by its available resources. But after a long discussion & digging, we found it's impossible to predict whether a pod can schedule before actually doing it. K8s scheduling is a hard computational problem that we can't solve.

We decided we would do 2 things.
1. Basic validation on UI by just checking the basic total resources available on the cluster and if that meets a certain basic number we have decided.
2. Clear messaging in the doc mentioning that if you don't have sufficient resources choosing a higher resource profile will result in failure, so please choose accordingly.
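
To illustrate why a check on total cluster resources (item 1) is a necessary condition but not a guarantee, here is a minimal sketch. All names and numbers are hypothetical; the actual threshold used by the UI is not stated in this bug. It relies on one fact about Kubernetes: a pod is scheduled onto a single node, so its request must fit within one node's resources.

```python
from typing import List


def aggregate_cpu_ok(node_cpus: List[float], min_total_cpu: float) -> bool:
    """Basic validation: compare the summed cluster CPU to a minimum threshold."""
    return sum(node_cpus) >= min_total_cpu


def largest_request_fits_a_node(node_cpus: List[float],
                                pod_requests: List[float]) -> bool:
    """Another necessary (not sufficient) condition: the largest single pod
    request must fit on at least one node."""
    largest_request = max(pod_requests, default=0.0)
    largest_node = max(node_cpus, default=0.0)
    return largest_request <= largest_node


if __name__ == "__main__":
    nodes = [2.0, 2.0, 2.0]   # hypothetical: three nodes with 2 CPUs each
    pods = [4.0]              # hypothetical: one pod requesting 4 CPUs
    print(aggregate_cpu_ok(nodes, sum(pods)))        # total of 6 CPUs covers 4
    print(largest_request_fits_a_node(nodes, pods))  # but no node has 4 CPUs
```

Passing both checks still does not prove that every pod can be placed, which is consistent with the decision above to pair a basic check with clear documentation.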

Comment 4 Malay Kumar parida 2024-04-15 04:45:24 UTC

This warning, if implemented, would be on the console/UI. Moving the bug there and assigning it to Alfonso, as he has the context.

Comment 9 Joy John Pinto 2024-05-31 09:45:45 UTC

Verified with OCP 4.16.0-0.nightly-2024-05-23-173505 and ODF 4.16.0-113.stable provided by Red Hat

Steps performed upon verification:
1. Installed ODF on an IBM Cloud machine with instance type bx2-16x64, giving total resources of 48 CPUs and 188.7 GiB of memory across 3 zones.
2. Increased the number of OSDs from the existing 3 to 12 using the add capacity option.
3. With 12 OSDs, the total resources requested by the performance profile were greater than the available resources, so a proper error message was displayed and the mode selection option was greyed out. Refer to 12osd_resource_req_not met.png.
4. Also, the CPU and memory values required for each performance profile kept increasing as the number of OSDs was increased beyond 6. Refer to 6_osd.png and 9_osd.png.
5. Tried applying the "Balanced Mode" performance profile, which was successful.

Comment 14 errata-xmlrpc 2024-07-17 13:13:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591
