Bug 1953846 - SystemMemoryExceedsReservation alert should consider hugepage reservation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Harshal Patil
QA Contact: Weinan Liu
URL:
Whiteboard:
Duplicates: 1955044
Depends On:
Blocks:
Reported: 2021-04-27 05:35 UTC by Xingbin Li
Modified: 2021-07-28 03:09 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:04:05 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2555 0 None open Bug 1953846: Subtract hugepages from memory capacity and allocatables 2021-04-30 10:38:15 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:04:22 UTC

Internal Links: 1979297

Description Xingbin Li 2021-04-27 05:35:39 UTC
The SystemMemoryExceedsReservation alert, which was added in OCP 4.6, should take hugepage reservations into account.

The SystemMemoryExceedsReservation alert uses the following Prometheus query:

~~~
sum by (node) (container_memory_rss{id="/system.slice"}) > ((sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})) * 0.9)
~~~

With this query, if hugepages are configured on a worker node, the right-hand side of the comparison includes the hugepages, which are meant to be allocated by applications. The left-hand side is the memory used by the system processes that support the containers running on the node.
In this case, the right-hand side is inflated by a large amount of application memory that has nothing to do with system-reserved memory, so the alert becomes meaningless.




For example, consider a node with 30GiB of hugepages:

~~~
$ oc describe node <node-name>

...
Capacity:
cpu:                      80
ephemeral-storage:        2096613Mi
hugepages-1Gi:            30Gi
hugepages-2Mi:            0
memory:                   527977304Ki
openshift.io/dpdk_ext0:   0
openshift.io/f1u:         10
openshift.io/sriov_ext0:  10
pods:                     250

Allocatable:
cpu:                      79500m
ephemeral-storage:        1977538520680
hugepages-1Gi:            30Gi
hugepages-2Mi:            0
memory:                   495369048Ki
openshift.io/dpdk_ext0:   0
openshift.io/f1u:         10
openshift.io/sriov_ext0:  10
pods:                     250
...
~~~

The computed system-reserved value includes the 30GiB of huge pages, which will be allocated by applications:

SystemReserved = kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"}
               = 527977304Ki - 495369048Ki = 32608256Ki ≈ 31.1GiB
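
The arithmetic above can be checked with a short Python sketch. The numbers come from the node output in this report; the helper name `reserved_kib` and the idea of subtracting huge pages before applying the 0.9 factor are illustrative assumptions, not the alert's actual implementation.

```python
KIB_PER_GIB = 1024 * 1024

def reserved_kib(capacity_kib, allocatable_kib, hugepages_kib=0):
    """Hypothetical helper: capacity minus allocatable, optionally minus huge pages."""
    return capacity_kib - allocatable_kib - hugepages_kib

# Values copied from the `oc describe node` output in this report.
capacity_kib = 527977304
allocatable_kib = 495369048
hugepages_kib = 30 * KIB_PER_GIB  # hugepages-1Gi: 30Gi

# "current" mirrors the right-hand side of the existing query (before * 0.9).
current = reserved_kib(capacity_kib, allocatable_kib)
# "adjusted" excludes huge pages (assumed adjustment, for comparison only).
adjusted = reserved_kib(capacity_kib, allocatable_kib, hugepages_kib)

for label, value in (("current", current), ("adjusted", adjusted)):
    threshold_gib = 0.9 * value / KIB_PER_GIB
    print(f"{label}: reserved={value / KIB_PER_GIB:.2f}GiB, "
          f"alert threshold={threshold_gib:.2f}GiB")
```

With these inputs, the existing expression gives a threshold of roughly 28GiB, while excluding huge pages gives roughly 1GiB, which is much closer to what system processes could plausibly reach.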

In addition, container_memory_rss{id="/system.slice"} is unlikely to exceed the right-hand side because, as far as I know, the underlying system processes rarely use huge pages.

I am not sure whether my understanding is correct; if I am wrong, please let me know.

Comment 1 Simon Pasquier 2021-04-27 06:40:36 UTC
This alert is managed by the machine-config-operator [1], reassigning to the team.

[1] https://github.com/openshift/machine-config-operator/blob/f86955971533aacbb4bb66f5c7041057d3f33566/install/0000_90_machine-config-operator_01_prometheus-rules.yaml#L53-L60

Comment 2 Yu Qi Zhang 2021-04-27 21:49:33 UTC
Passing this over to the node team to take a look as well, since it's a kubelet warning.

Comment 3 Sanket N 2021-04-29 12:46:49 UTC
*** Bug 1955044 has been marked as a duplicate of this bug. ***

Comment 7 Weinan Liu 2021-06-11 09:40:43 UTC
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           125293548Ki
  hugepages-1Gi:               5Gi
  hugepages-2Mi:               0
  memory:                      7935292Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1500m
  ephemeral-storage:           115470533646
  hugepages-1Gi:               5Gi
  hugepages-2Mi:               0
  memory:                      1541436Ki
  pods:                        250

7935292Ki - 1541436Ki - 5Gi = 1150976Ki ≈ 1.098Gi
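
The same check can be reproduced from the quantity strings as they appear in the node description. This is only a sketch: `to_kib` and `system_reserved_kib` are hypothetical helpers written for this comment, and the parser handles just the `Ki`, `Mi`, and `Gi` suffixes plus bare byte counts, not the full Kubernetes quantity syntax.

```python
import re

# Binary suffixes expressed in KiB; only the ones needed for this example.
_UNITS_KIB = {"Ki": 1, "Mi": 1024, "Gi": 1024 ** 2}

def to_kib(quantity: str) -> int:
    """Convert a quantity such as '5Gi', '7935292Ki', or '0' to KiB."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    value, unit = match.groups()
    if unit is None:
        # A bare number is a byte count; truncate to whole KiB.
        return int(value) // 1024
    return int(value) * _UNITS_KIB[unit]

def system_reserved_kib(capacity: dict, allocatable: dict) -> int:
    """Capacity minus allocatable memory, minus every hugepages-* capacity entry."""
    reserved = to_kib(capacity["memory"]) - to_kib(allocatable["memory"])
    hugepages = sum(
        to_kib(value)
        for name, value in capacity.items()
        if name.startswith("hugepages-")
    )
    return reserved - hugepages

# Values copied from the Capacity and Allocatable sections in this comment.
capacity = {"memory": "7935292Ki", "hugepages-1Gi": "5Gi", "hugepages-2Mi": "0"}
allocatable = {"memory": "1541436Ki", "hugepages-1Gi": "5Gi", "hugepages-2Mi": "0"}

reserved = system_reserved_kib(capacity, allocatable)
print(f"{reserved} KiB = {reserved / 1024 ** 2:.3f} GiB")
```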

Verified as fixed on:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-071057   True        False         6h28m   Cluster version is 4.8.0-0.nightly-2021-06-10-071057

Comment 10 errata-xmlrpc 2021-07-27 23:04:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 11 Xingbin Li 2021-07-28 03:09:33 UTC
Are there any plans to backport this to OCP 4.7?

