Created attachment 2009631 [details]
resource constraint to perform resource profile change

Description of problem (please be detailed as possible and provide log snippets):

Deploy ODF; the StorageCluster reaches the Ready state. Use the UI option to change the resource profile to performance mode. This fails with the error "Aggregate resource requirements for the selected performance profile not met" (attached Screenshot1.png).

Added additional memory to the worker nodes (attached Screenshot2.png); the node details page shows the newly added memory on the worker nodes (Screenshot2.png).

Now try again to change the resource profile to performance, but the popup still says the resource requirements are not met; the new memory additions are not reflected (Screenshot3.png).

Version of all relevant components (if applicable):

[root@nara7-aacc-bastion-0 ~]# oc get csv -A
NAMESPACE                              NAME                                         DISPLAY                       VERSION               REPLACES                                PHASE
openshift-local-storage                local-storage-operator.v4.14.0-202311031050  Local Storage                 4.14.0-202311031050                                           Succeeded
openshift-operator-lifecycle-manager   packageserver                                Package Server                0.0.1-snapshot                                                Succeeded
openshift-storage                      mcg-operator.v4.15.0-120.stable              NooBaa Operator               4.15.0-120.stable     mcg-operator.v4.14.3-rhodf              Succeeded
openshift-storage                      ocs-operator.v4.15.0-120.stable              OpenShift Container Storage   4.15.0-120.stable     ocs-operator.v4.14.3-rhodf              Succeeded
openshift-storage                      odf-csi-addons-operator.v4.15.0-120.stable   CSI Addons                    4.15.0-120.stable     odf-csi-addons-operator.v4.14.3-rhodf   Succeeded
openshift-storage                      odf-operator.v4.15.0-120.stable              OpenShift Data Foundation     4.15.0-120.stable     odf-operator.v4.14.3-rhodf              Succeeded
[root@nara7-aacc-bastion-0 ~]#

[root@nara7-aacc-bastion-0 ~]# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-rc.1   True        False         3d6h    Cluster version is 4.15.0-rc.1
[root@nara7-aacc-bastion-0 ~]#

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy ODF and try to change the resource profile on a cluster with less than 96 GB of aggregate memory.
2. Increase the worker node memory; the new memory is reflected in the worker node details page.
3. Try again to change the resource profile to performance mode; the modal still shows the old memory values, not the new ones.

Actual results:
Memory changes on the worker nodes are not reflected during resource profile changes.

Expected results:
Memory changes on the worker nodes should be reflected during resource profile changes, allowing the resource profile to be changed to performance mode.

Additional info:
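For context on the "Aggregate resource requirements ... not met" popup: below is a minimal sketch of an aggregate memory check like the one the modal performs, assuming it sums allocatable memory across the worker nodes and compares it against the ~96 GB figure mentioned in step 1. The threshold constant, label selector, and helper function are illustrative assumptions, not the actual console code.

# Minimal sketch (not the ODF console code) of an aggregate memory check,
# assuming the modal sums allocatable memory of the worker nodes and compares
# it against a ~96 GiB requirement for the "performance" profile.
from kubernetes import client, config

REQUIRED_AGGREGATE_GIB = 96  # assumed threshold for the "performance" profile

def parse_quantity_to_gib(quantity: str) -> float:
    """Convert a Kubernetes memory quantity such as '32068908Ki' to GiB."""
    units = {"Ki": 1 / (1024 ** 2), "Mi": 1 / 1024, "Gi": 1.0, "Ti": 1024.0}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[:-len(suffix)]) * factor
    return float(quantity) / (1024 ** 3)  # plain bytes

config.load_kube_config()
v1 = client.CoreV1Api()
workers = v1.list_node(label_selector="node-role.kubernetes.io/worker").items

total_gib = sum(parse_quantity_to_gib(n.status.allocatable["memory"]) for n in workers)
print(f"Aggregate allocatable memory: {total_gib:.1f} GiB")
print("profile change allowed" if total_gib >= REQUIRED_AGGREGATE_GIB else "requirements not met")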
thanks for sharing the YAMLs... can you please also run the following under "Observe > Metrics" and share the output:

1. sum by (instance) (node_memory_MemTotal_bytes)
2. sum by (instance) (node_memory_MemAvailable_bytes)
3. sum by (instance) (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
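For reference, the same queries can be run outside the console against the cluster monitoring Prometheus HTTP API. The sketch below is illustrative only: the route host and token are placeholders (on OpenShift the thanos-querier route in openshift-monitoring is commonly used with a bearer token, e.g. from `oc whoami -t`), while /api/v1/query is the standard Prometheus query endpoint.

# Minimal sketch: run the three PromQL queries against the cluster's Prometheus API.
# PROM_URL and TOKEN are placeholders/assumptions.
import requests

PROM_URL = "https://thanos-querier-openshift-monitoring.apps.example.com"  # placeholder
TOKEN = "sha256~REPLACE_ME"  # placeholder bearer token

QUERIES = [
    "sum by (instance) (node_memory_MemTotal_bytes)",
    "sum by (instance) (node_memory_MemAvailable_bytes)",
    "sum by (instance) (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)",
]

for query in QUERIES:
    resp = requests.get(
        f"{PROM_URL}/api/v1/query",
        params={"query": query},
        headers={"Authorization": f"Bearer {TOKEN}"},
        verify=False,  # lab clusters often use self-signed certs; adjust as needed
    )
    resp.raise_for_status()
    print(query)
    for result in resp.json()["data"]["result"]:
        instance = result["metric"].get("instance", "<none>")
        value_bytes = float(result["value"][1])
        print(f"  {instance}: {value_bytes / 1024 ** 3:.1f} GiB")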
also, what's the cluster infrastructure (BareMetal, vSphere, etc.)?
There are 3 ways to determine a node's capacity:

1. The "status.allocatable.memory" field in the Node's CR, which represents the resources of a node that are available for scheduling.
2. The "status.capacity.memory" field in the Node's CR, which represents the total resources of a node.
3. The "node_memory_MemTotal_bytes" metric from Prometheus.

Typically options "2" and "3" should report similar values, and option "1" should be slightly lower than the other two. The OCP "Compute > Nodes" list page uses option "3", whereas in ODF (the performance profile modal and even during StorageSystem deployment) we use option "1".

Checking the YAML shared above, the Node's CR is reporting around 30.7 GiB of allocatable capacity and around 31.8 GiB of total capacity, whereas the "node_" metric is reporting around 36.8 GiB. Hence the mismatch between what's seen/calculated in OCP and ODF.

May I know the exact steps used for increasing resources on the nodes? Also, https://bugzilla.redhat.com/show_bug.cgi?id=2259616#c6 got missed, can you please answer this as well?
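To see this gap on a given cluster, here is a quick sketch (using the Python kubernetes client; an illustration, not the console code path) that prints options 1 and 2 side by side for each node, so they can be compared against the node_memory_MemTotal_bytes values from Prometheus (option 3):

# Minimal sketch: print allocatable vs. capacity memory from each Node CR
# (options 1 and 2 above) for comparison with the Prometheus metric (option 3).
from kubernetes import client, config

def ki_to_gib(quantity: str) -> float:
    """Node memory is normally reported in Ki; convert to GiB."""
    assert quantity.endswith("Ki"), f"unexpected unit in {quantity!r}"
    return float(quantity[:-2]) / (1024 ** 2)

config.load_kube_config()
for node in client.CoreV1Api().list_node().items:
    alloc = node.status.allocatable["memory"]   # option 1: schedulable memory
    cap = node.status.capacity["memory"]        # option 2: total memory
    print(f"{node.metadata.name}: allocatable={ki_to_gib(alloc):.1f} GiB, "
          f"capacity={ki_to_gib(cap):.1f} GiB")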
(In reply to Sanjal Katiyar from comment #8)
> Typically options "2" and "3" should report similar values, and option "1"
> should be slightly lower than the other two. The OCP "Compute > Nodes" list
> page uses option "3", whereas in ODF (the performance profile modal and even
> during StorageSystem deployment) we use option "1".
> Checking the YAML shared above, the Node's CR is reporting around 30.7 GiB of
> allocatable capacity and around 31.8 GiB of total capacity, whereas the
> "node_" metric is reporting around 36.8 GiB. Hence the mismatch between
> what's seen/calculated in OCP and ODF.

Checked AWS/BareMetal clusters for testing, and both were reporting correct node capacities (the CR and the metric reported nearly identical values). But, for some reason, in the above "PowerVM" cluster the CR is reporting a different value than the "node_" metric.

Moving it to 4.16 for now; please raise it as a blocker if we are sure that this is a bug and something that needs to be fixed in the 4.15.0 version itself.
As a fix/enhancement we will rely on the metric for this nodes table as well, instead of the CR (just like the OCP list page and our Topology page)...
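Roughly, this means the modal's aggregate check would be fed by the per-instance result of `sum by (instance) (node_memory_MemTotal_bytes)` rather than by `status.allocatable.memory`. A hedged sketch of that aggregation follows; the 96 GiB threshold, node names, and values are illustrative assumptions, not the exact console logic.

# Minimal sketch of the metric-based approach: given a per-instance result of
# `sum by (instance) (node_memory_MemTotal_bytes)`, aggregate the selected nodes
# and compare against the profile requirement.
REQUIRED_AGGREGATE_GIB = 96  # assumed requirement for the "performance" profile

def aggregate_from_metric(per_instance_bytes: dict, selected_nodes: list) -> float:
    """Sum node_memory_MemTotal_bytes (in bytes) for the selected nodes, in GiB."""
    return sum(per_instance_bytes[n] for n in selected_nodes) / 1024 ** 3

# Example values shaped like a Prometheus vector result keyed by the instance label
# (~36.8 GiB per node, as reported by the metric in this cluster).
per_instance = {"worker-0": 39.5e9, "worker-1": 39.5e9, "worker-2": 39.5e9}
total = aggregate_from_metric(per_instance, ["worker-0", "worker-1", "worker-2"])
print(f"Aggregate: {total:.1f} GiB ->",
      "profile change allowed" if total >= REQUIRED_AGGREGATE_GIB else "requirements not met")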
Moving the BZ to verified state based on comment 16.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591