Bug 1979297 - SystemExceedsMemoryReservation prometheusRule manages wrongly hugepage reservation
Summary: SystemExceedsMemoryReservation prometheusRule manages wrongly hugepage reserv...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.9.0
Assignee: Harshal Patil
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On:
Blocks: 2028854 2056502
Reported: 2021-07-05 14:08 UTC by Alberto Losada
Modified: 2022-02-21 16:44 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:38:02 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2661 | 0 | None | open | Bug 1979297: Revert "Subtract hugepages from memory capacity and allocatables" | 2021-07-06 06:41:18 UTC
Red Hat Bugzilla | 1953846 | 1 | unspecified | CLOSED | SystemMemoryExceedsReservation alert should consider hugepage reservation | 2021-07-28 03:09:29 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:38:16 UTC

Description Alberto Losada 2021-07-05 14:08:45 UTC
Description of problem:

The Prometheus rule for the well-known SystemExceedsMemoryReservation alert is incorrect: its threshold evaluates to a negative value when hugepages are configured.

The query, which was changed in 4.8, is defined here: https://github.com/openshift/machine-config-operator/blob/release-4.8/install/0000_90_machine-config-operator_01_prometheus-rules.yaml#L54

sum by (node) (container_memory_rss{id="/system.slice"}) > ((sum by (node) (kube_node_status_capacity{resource="memory"}) - sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"}) - sum by (node) (kube_node_status_capacity{resource="hugepages_2Mi"}) - sum by (node) (kube_node_status_allocatable{resource="memory"}) - sum by (node)  (kube_node_status_allocatable{resource="hugepages_1Gi"}) - sum by (node)  (kube_node_status_allocatable{resource="hugepages_2Mi"})) * 0.9)


Taking into account my node resources:

Capacity:
  cpu:                                     80
  ephemeral-storage:                       3750202692Ki
  hugepages-1Gi:                           16Gi
  hugepages-2Mi:                           0
  management.workload.openshift.io/cores:  80k
  memory:                                  197707916Ki
Allocatable:
  cpu:                                     76
  ephemeral-storage:                       3456186795225
  hugepages-1Gi:                           16Gi
  hugepages-2Mi:                           0
  management.workload.openshift.io/cores:  80k
  memory:                                  179804300Ki


Comparing the query with the values from my node shows that the right side of the query subtracts the hugepages-1Gi value (16Gi) twice: once as capacity and once as allocatable.

Executing the individual terms of the query in the Thanos querier (or Prometheus, if you prefer) gives these values, in bytes, for that node:

* sum by (node) (container_memory_rss{id="/system.slice"}): 685961216
* sum by (node) (kube_node_status_capacity{resource="memory"}): 202452905984
* sum by (node) (kube_node_status_allocatable{resource="memory"}): 184119603200
* sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"}): 17179869184
* sum by (node) (kube_node_status_allocatable{resource="hugepages_1Gi"}): 17179869184
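
The metric values listed here are the Ki and Gi quantities from the node description converted to bytes. The following is a minimal Python sketch of that conversion, included only as a sanity check; the variable names are illustrative and are not part of the rule or of any OpenShift API.

```python
# Binary unit multipliers used by Kubernetes resource quantities.
KI = 1024
GI = 1024 ** 3

# Quantities copied from the node description in this report.
capacity_memory_bytes = 197707916 * KI      # Capacity memory: 197707916Ki
allocatable_memory_bytes = 179804300 * KI   # Allocatable memory: 179804300Ki
hugepages_1gi_bytes = 16 * GI               # hugepages-1Gi: 16Gi

print(capacity_memory_bytes)     # 202452905984
print(allocatable_memory_bytes)  # 184119603200
print(hugepages_1gi_bytes)       # 17179869184
```

The printed values match the capacity, allocatable, and hugepages_1Gi metric results reported for the node.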

The right side of the query, before the 0.9 factor is applied, therefore evaluates to a negative value:

(sum by (node) (kube_node_status_capacity{resource="memory"}) - sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"}) - sum by (node) (kube_node_status_capacity{resource="hugepages_2Mi"}) - sum by (node) (kube_node_status_allocatable{resource="memory"}) - sum by (node)  (kube_node_status_allocatable{resource="hugepages_1Gi"}) - sum by (node)  (kube_node_status_allocatable{resource="hugepages_2Mi"})) = -16026435584

As a result, the alert always fires, because the RSS of /system.slice can never be lower than a negative threshold.
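
The arithmetic can be reproduced outside Prometheus. The sketch below plugs the values reported for the node into the right side of the 4.8 query; the variable names are illustrative, and the hugepages_2Mi terms are taken as 0 because the node description shows no 2Mi hugepages.

```python
# Values in bytes, as returned by the individual query terms for the node.
system_slice_rss = 685_961_216
capacity_memory = 202_452_905_984
allocatable_memory = 184_119_603_200
capacity_hugepages_1gi = 17_179_869_184
allocatable_hugepages_1gi = 17_179_869_184
capacity_hugepages_2mi = 0      # node reports hugepages-2Mi: 0
allocatable_hugepages_2mi = 0   # node reports hugepages-2Mi: 0

# Right side of the 4.8 rule: every term after the first is subtracted,
# so the 16Gi of hugepages is removed once as capacity and once as allocatable.
reserved = (
    capacity_memory
    - capacity_hugepages_1gi
    - capacity_hugepages_2mi
    - allocatable_memory
    - allocatable_hugepages_1gi
    - allocatable_hugepages_2mi
)
threshold = reserved * 0.9

# The alert condition compares the system slice RSS against the threshold.
alert_fires = system_slice_rss > threshold

print(reserved)     # -16026435584
print(alert_fires)  # True
```

Because the threshold is negative, any non-negative RSS value satisfies the comparison.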

Version-Release number of selected component (if applicable):


How reproducible:

Always, if hugepages are configured.

Steps to Reproduce:
1. Configure hugepages on your OpenShift cluster. In my case, it is a single-node OpenShift (SNO) cluster configured with the Performance Addon Operator to reserve 16Gi of 1Gi hugepages.
2. Execute the Prometheus query in the Thanos querier or the Prometheus console.

Actual results:

The alert always fires, because the right side of the comparison is negative when hugepages are configured.

Expected results:
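
This field was left empty in the report. For contrast with the actual results, the sketch below evaluates a threshold without the hugepage terms, using the same node values. It assumes the hugepage terms are simply dropped, which is what the title of the linked pull request (a revert of "Subtract hugepages from memory capacity and allocatables") suggests; the exact expression that shipped is not quoted in this report, and the function name is hypothetical.

```python
def threshold_without_hugepages(capacity_memory, allocatable_memory, factor=0.9):
    """Hypothetical helper: plain capacity-minus-allocatable threshold.

    This is an assumed form of the rule with the hugepage terms removed,
    not an expression quoted from the report.
    """
    return (capacity_memory - allocatable_memory) * factor


# Values in bytes, as reported for the node in this bug.
system_slice_rss = 685_961_216
threshold = threshold_without_hugepages(202_452_905_984, 184_119_603_200)

print(threshold > 0)                 # True
print(system_slice_rss > threshold)  # False
```

Under that assumption the threshold is positive, and the alert would not fire for the reported RSS value.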


Additional info:

My tests were done on an SNO cluster; however, the results should be the same on a regular OCP cluster.

Comment 8 errata-xmlrpc 2021-10-18 17:38:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

