1979297 – SystemExceedsMemoryReservation prometheusRule manages wrongly hugepage reservation

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Harshal Patil
QA Contact:	Weinan Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2028854 2056502

Reported:	2021-07-05 14:08 UTC by Alberto Losada
Modified:	2022-02-21 16:44 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-10-18 17:38:02 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-config-operator pull 2661	0	None	open	Bug 1979297: Revert "Subtract hugepages from memory capacity and allocatables"	2021-07-06 06:41:18 UTC
Red Hat Bugzilla	1953846	1	unspecified	CLOSED	SystemMemoryExceedsReservation alert should consider hugepage reservation	2021-07-28 03:09:29 UTC
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:38:16 UTC

Description Alberto Losada 2021-07-05 14:08:45 UTC

Description of problem:

The Prometheus rule for the well-known SystemExceedsMemoryReservation alert is not correct. When hugepages are configured, the right side of its query evaluates to a negative value.

The query, which was changed in 4.8, is here: https://github.com/openshift/machine-config-operator/blob/release-4.8/install/0000_90_machine-config-operator_01_prometheus-rules.yaml#L54

sum by (node) (container_memory_rss{id="/system.slice"}) > ((sum by (node) (kube_node_status_capacity{resource="memory"}) - sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"}) - sum by (node) (kube_node_status_capacity{resource="hugepages_2Mi"}) - sum by (node) (kube_node_status_allocatable{resource="memory"}) - sum by (node)  (kube_node_status_allocatable{resource="hugepages_1Gi"}) - sum by (node)  (kube_node_status_allocatable{resource="hugepages_2Mi"})) * 0.9)


Taking into account my node resources:

Capacity:
  cpu:                                     80
  ephemeral-storage:                       3750202692Ki
  hugepages-1Gi:                           16Gi
  hugepages-2Mi:                           0
  management.workload.openshift.io/cores:  80k
  memory:                                  197707916Ki
Allocatable:
  cpu:                                     76
  ephemeral-storage:                       3456186795225
  hugepages-1Gi:                           16Gi
  hugepages-2Mi:                           0
  management.workload.openshift.io/cores:  80k
  memory:                                  179804300Ki


Comparing the query with the values from my node shows that the right side of the query subtracts the hugepages-1Gi value (16Gi) twice: once as capacity and once as allocatable.

Executing the query in Thanos querier (or Prometheus if you prefer) returns these values for the node in question:

* sum by (node) (container_memory_rss{id="/system.slice"}) 685.961.216
* sum by (node) (kube_node_status_capacity{resource="memory"}) 202.452.905.984
* sum by (node) (kube_node_status_allocatable{resource="memory"}) 184.119.603.200
* sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"}) 17.179.869.184
* sum by (node)  (kube_node_status_allocatable{resource="hugepages_1Gi"}) 17.179.869.184
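
As a sanity check, the metric values above match the node's Capacity and Allocatable figures once the binary suffixes are converted to bytes. A minimal Python sketch follows; the helper `to_bytes` is hypothetical and only handles the Ki, Mi, and Gi suffixes that appear in this report.

```python
# Hypothetical helper: convert a Kubernetes quantity with a binary suffix to bytes.
SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number without a suffix, e.g. "0"


# Quantities copied from the node description in this report.
capacity_memory = to_bytes("197707916Ki")
allocatable_memory = to_bytes("179804300Ki")
hugepages_1gi = to_bytes("16Gi")

print(capacity_memory)     # 202452905984
print(allocatable_memory)  # 184119603200
print(hugepages_1gi)       # 17179869184
```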

With these values, the right side of the query (before the 0.9 factor) evaluates to a negative number:

(sum by (node) (kube_node_status_capacity{resource="memory"}) - sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"}) - sum by (node) (kube_node_status_capacity{resource="hugepages_2Mi"}) - sum by (node) (kube_node_status_allocatable{resource="memory"}) - sum by (node)  (kube_node_status_allocatable{resource="hugepages_1Gi"}) - sum by (node)  (kube_node_status_allocatable{resource="hugepages_2Mi"})) = -16026435584

Because the system.slice RSS is always greater than a negative threshold, the alert always fires.
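
The arithmetic can be reproduced from the values listed above. This is a minimal Python sketch, not the rule itself: the variable names are chosen here for illustration, and the 2Mi hugepage values are assumed to be 0, as shown in the node description.

```python
# Values reported for the node, in bytes.
system_slice_rss = 685_961_216
capacity_memory = 202_452_905_984
allocatable_memory = 184_119_603_200
capacity_hugepages_1gi = 17_179_869_184
allocatable_hugepages_1gi = 17_179_869_184
capacity_hugepages_2mi = 0      # assumption: node reports hugepages-2Mi: 0
allocatable_hugepages_2mi = 0   # assumption: node reports hugepages-2Mi: 0

# Right side of the 4.8 query before the 0.9 factor is applied.
reservation = (
    capacity_memory
    - capacity_hugepages_1gi
    - capacity_hugepages_2mi
    - allocatable_memory
    - allocatable_hugepages_1gi
    - allocatable_hugepages_2mi
)
threshold = reservation * 0.9

# The alert condition: system.slice RSS greater than the threshold.
alert_fires = system_slice_rss > threshold

print(reservation)  # -16026435584
print(alert_fires)  # True
```

For comparison, `capacity_memory - allocatable_memory` alone is 18333302784 bytes, so the negative result comes entirely from subtracting the 16Gi hugepage figure twice.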

Version-Release number of selected component (if applicable):


How reproducible:

Always, when hugepages are configured

Steps to Reproduce:
1. Configure hugepages on your OpenShift cluster. In my case, it is a single-node OpenShift (SNO) cluster that has been configured using the performance addon operator to reserve 16Gi of 1Gi hugepages
2. Execute the Prometheus query in Thanos querier or Prometheus console
3.

Actual results:

The alert always fires, because the right side of the query has a negative value when hugepages are configured

Expected results:


Additional info:

My tests were done on a SNO cluster; however, the results should be the same on a regular OCP cluster.

Comment 8 errata-xmlrpc 2021-10-18 17:38:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
