The SystemMemoryExceedsReservation alert, which was added in OCP 4.6, should take hugepage reservations into account.

The SystemMemoryExceedsReservation alert uses the following Prometheus query:

~~~
sum by (node) (container_memory_rss{id="/system.slice"}) > ((sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})) * 0.9)
~~~

With this query, if hugepages are configured on a worker node, the right side of the comparison includes the hugepages that are meant to be allocated by applications. The left side is the working memory used by the system processes (`/system.slice`) on the node. The right side is therefore inflated by application memory that is unrelated to the system-reserved memory, so the alert becomes meaningless.

For example, if a node has 30GiB of hugepages like below:

~~~
$ oc describe node <node-name>
...
Capacity:
  cpu:                      80
  ephemeral-storage:        2096613Mi
  hugepages-1Gi:            30Gi
  hugepages-2Mi:            0
  memory:                   527977304Ki
  openshift.io/dpdk_ext0:   0
  openshift.io/f1u:         10
  openshift.io/sriov_ext0:  10
  pods:                     250
Allocatable:
  cpu:                      79500m
  ephemeral-storage:        1977538520680
  hugepages-1Gi:            30Gi
  hugepages-2Mi:            0
  memory:                   495369048Ki
  openshift.io/dpdk_ext0:   0
  openshift.io/f1u:         10
  openshift.io/sriov_ext0:  10
  pods:                     250
...
~~~

The computed system-reserved value then contains the 30GiB of hugepages that will be allocated by applications:

SystemReserved = (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})
               = 527977304Ki - 495369048Ki
               = 32608256Ki (about 31GiB)

And `container_memory_rss{id="/system.slice"}` is unlikely to ever exceed the right side, because as far as I know the underlying system processes rarely use hugepages.

I am not sure whether my understanding is correct. If I am wrong, please let me know.
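To make the effect on the threshold concrete, here is a minimal Python sketch of the arithmetic for the node above. It compares the right side of the current query with a variant that also subtracts the hugepage capacity. The function name `alert_threshold_ki` and the idea of subtracting hugepages are illustrative assumptions for this comment, not the actual rule in machine-config-operator.

```python
KI_PER_GI = 1024 * 1024  # number of KiB in one GiB


def alert_threshold_ki(capacity_ki, allocatable_ki, hugepages_ki=0, factor=0.9):
    """Right side of the alert: factor * (capacity - allocatable - hugepages).

    hugepages_ki defaults to 0, which reproduces the query as it is today.
    Subtracting hugepages is a hypothetical adjustment, not the shipped rule.
    """
    return (capacity_ki - allocatable_ki - hugepages_ki) * factor


# Values taken from the `oc describe node` output in this report.
capacity_ki = 527977304
allocatable_ki = 495369048
hugepages_ki = 30 * KI_PER_GI  # hugepages-1Gi: 30Gi, hugepages-2Mi: 0

current = alert_threshold_ki(capacity_ki, allocatable_ki)
adjusted = alert_threshold_ki(capacity_ki, allocatable_ki, hugepages_ki)

print(f"threshold with hugepages included: {current / KI_PER_GI:.2f} GiB")
print(f"threshold with hugepages excluded: {adjusted / KI_PER_GI:.2f} GiB")
```

With the hugepages included, the threshold sits far above anything `/system.slice` would plausibly use, which is why the alert would effectively never fire on such a node.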
This alert is managed by the machine-config-operator [1]; reassigning to that team.

[1] https://github.com/openshift/machine-config-operator/blob/f86955971533aacbb4bb66f5c7041057d3f33566/install/0000_90_machine-config-operator_01_prometheus-rules.yaml#L53-L60
Passing over to the node team to take a look as well, since it's a kubelet warning.
*** Bug 1955044 has been marked as a duplicate of this bug. ***
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           125293548Ki
  hugepages-1Gi:               5Gi
  hugepages-2Mi:               0
  memory:                      7935292Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1500m
  ephemeral-storage:           115470533646
  hugepages-1Gi:               5Gi
  hugepages-2Mi:               0
  memory:                      1541436Ki
  pods:                        250

7935292Ki - 1541436Ki - 5Gi = 1.097Gi

Verified as fixed on:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-071057   True        False         6h28m   Cluster version is 4.8.0-0.nightly-2021-06-10-071057
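For anyone who wants to repeat the verification arithmetic on another node, here is a rough Python sketch that converts the quantity strings printed by `oc describe node` and subtracts the hugepages. The helper `to_bytes` and the `UNITS` table are hypothetical names and only handle the binary suffixes that appear in this report.

```python
# Binary suffixes seen in this report (a subset of Kubernetes quantity suffixes).
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '7935292Ki' or '5Gi' to bytes."""
    for suffix, multiplier in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)  # no suffix: the value is already in bytes


# Values copied from the node output above.
capacity = to_bytes("7935292Ki")
allocatable = to_bytes("1541436Ki")
hugepages = to_bytes("5Gi") + to_bytes("0")  # hugepages-1Gi + hugepages-2Mi

reserved = capacity - allocatable - hugepages
reserved_gi = reserved / UNITS["Gi"]
print(f"system reserved excluding hugepages: {reserved_gi:.4f} GiB")
```

The result is roughly 1.1 GiB, in line with the figure quoted above.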
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
Are there any plans to backport this fix to OCP 4.7?