Description of problem:

The Prometheus rule for the SystemExceedsMemoryReservation alert is not correct. The right side of its expression evaluates to a negative value when hugepages are configured.

This is the query, which was changed in 4.8:
https://github.com/openshift/machine-config-operator/blob/release-4.8/install/0000_90_machine-config-operator_01_prometheus-rules.yaml#L54

    sum by (node) (container_memory_rss{id="/system.slice"})
      >
    ((sum by (node) (kube_node_status_capacity{resource="memory"})
      - sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"})
      - sum by (node) (kube_node_status_capacity{resource="hugepages_2Mi"})
      - sum by (node) (kube_node_status_allocatable{resource="memory"})
      - sum by (node) (kube_node_status_allocatable{resource="hugepages_1Gi"})
      - sum by (node) (kube_node_status_allocatable{resource="hugepages_2Mi"})) * 0.9)

These are the resources of my node:

    Capacity:
      cpu:                                     80
      ephemeral-storage:                       3750202692Ki
      hugepages-1Gi:                           16Gi
      hugepages-2Mi:                           0
      management.workload.openshift.io/cores:  80k
      memory:                                  197707916Ki
    Allocatable:
      cpu:                                     76
      ephemeral-storage:                       3456186795225
      hugepages-1Gi:                           16Gi
      hugepages-2Mi:                           0
      management.workload.openshift.io/cores:  80k
      memory:                                  179804300Ki

Comparing the query with these values shows that the right side subtracts the hugepages-1Gi value (16Gi) twice: once as capacity and once as allocatable.
Executing each term of the query in Thanos Querier (or Prometheus, if you prefer) returns these values for the node:

* sum by (node) (container_memory_rss{id="/system.slice"}) 685.961.216
* sum by (node) (kube_node_status_capacity{resource="memory"}) 202.452.905.984
* sum by (node) (kube_node_status_allocatable{resource="memory"}) 184.119.603.200
* sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"}) 17.179.869.184
* sum by (node) (kube_node_status_allocatable{resource="hugepages_1Gi"}) 17.179.869.184

The right side of the query (before the 0.9 factor) therefore evaluates to a negative value:

    (sum by (node) (kube_node_status_capacity{resource="memory"})
      - sum by (node) (kube_node_status_capacity{resource="hugepages_1Gi"})
      - sum by (node) (kube_node_status_capacity{resource="hugepages_2Mi"})
      - sum by (node) (kube_node_status_allocatable{resource="memory"})
      - sum by (node) (kube_node_status_allocatable{resource="hugepages_1Gi"})
      - sum by (node) (kube_node_status_allocatable{resource="hugepages_2Mi"}))
    = -16026435584

Because the system.slice RSS is always greater than a negative threshold, the alert always fires.

Version-Release number of selected component (if applicable):

How reproducible:
Always, if hugepages are configured.

Steps to Reproduce:
1. Configure hugepages on your OpenShift cluster. In my case it is a SNO cluster configured with the Performance Addon Operator to reserve 16Gi of 1Gi hugepages.
2. Execute the Prometheus query in Thanos Querier or the Prometheus console.

Actual results:
The alert is always triggered, because the right side of the comparison is negative when hugepages are configured.

Expected results:

Additional info:
My tests were done on a SNO cluster; however, the results should be the same on a regular OCP cluster.
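The arithmetic above can be checked offline with a short Python sketch. It only replays the numbers quoted in this report (the Ki values from the node description and the RSS sample from Thanos); the function and variable names are made up for illustration. The second function, which subtracts hugepages only once, is an assumption that shows the effect of the double subtraction and is not necessarily the expression used in the fixed rule.

```python
KI = 1024
GI = 1024 ** 3

# Node values quoted in this report, converted to bytes.
capacity = {"memory": 197707916 * KI, "hugepages_1Gi": 16 * GI, "hugepages_2Mi": 0}
allocatable = {"memory": 179804300 * KI, "hugepages_1Gi": 16 * GI, "hugepages_2Mi": 0}
system_slice_rss = 685_961_216  # container_memory_rss{id="/system.slice"}


def rhs_as_in_4_8(cap, alloc):
    """Right side of the 4.8 rule, before the 0.9 factor."""
    return (cap["memory"] - cap["hugepages_1Gi"] - cap["hugepages_2Mi"]
            - alloc["memory"] - alloc["hugepages_1Gi"] - alloc["hugepages_2Mi"])


def rhs_hugepages_once(cap, alloc):
    """Illustrative only: same terms, but hugepages subtracted a single time."""
    return (cap["memory"] - alloc["memory"]
            - cap["hugepages_1Gi"] - cap["hugepages_2Mi"])


rhs = rhs_as_in_4_8(capacity, allocatable)
alert_fires = system_slice_rss > rhs * 0.9
once = rhs_hugepages_once(capacity, allocatable)

print(rhs)          # -16026435584
print(alert_fires)  # True
print(once)         # 1153433600
```

With hugepages subtracted once, the same node yields a positive reservation of about 1.15 GB, and the 685 MB of system.slice RSS no longer exceeds 90% of it.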
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759