Bug 1955044

Summary: SystemMemoryExceedsReservation alert calculating incorrect system-reserved when hugepages reserved memory is configured.
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.6
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Sanket N <snalawad>
Assignee: Sergiusz Urbaniak <surbania>
QA Contact: Junqi Zhao <juzhao>
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, pkrupa, spasquie, surbania
Flags: snalawad: needinfo-
Last Closed: 2021-04-29 12:46:49 UTC

Description Sanket N 2021-04-29 10:45:56 UTC
Description of problem:

The SystemMemoryExceedsReservation alert calculates an incorrect system-reserved value because hugepages memory is subtracted from allocatable memory.


~~~
SystemMemoryExceedsReservation alert:

sum by (node) (container_memory_rss{id="/system.slice"})
  > ((sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})) * 0.9)
~~~


The SystemMemoryExceedsReservation alert monitors container_memory_rss{id="/system.slice"} and fires when system memory usage exceeds 90% of the system-reserved memory.


The system-reserved memory is calculated by the expression:
~~~
sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})
~~~


System Reserved =  Capacity.memory - Allocatable.memory 


When hugepages are configured, Allocatable.memory is reduced by the size of the hugepages reservation, so the computed system-reserved value also includes this reduction in Allocatable.memory.


----------------------------------------------------------------------------------------
Without Hugepages                            |  After Hugepages configured
---------------------------------------------|-------------------------------------------
Capacity:                                    |  Capacity:
    cpu:                4                    |    cpu:                4
    ephemeral-storage:  41407468Ki           |    ephemeral-storage:  41407468Ki
    hugepages-1Gi:      0                    |    hugepages-1Gi:      0
    hugepages-2Mi:      0                    |    hugepages-2Mi:      100Mi
    memory:             8153272Ki            |    memory:             8153256Ki
    pods:               250                  |    pods:               250
  Allocatable:                               |  Allocatable:
    cpu:                3500m                |    cpu:                3500m
    ephemeral-storage:  37087380622          |    ephemeral-storage:  37087380622
    hugepages-1Gi:      0                    |    hugepages-1Gi:      0
    hugepages-2Mi:      0                    |    hugepages-2Mi:      100Mi
    memory:             7002296Ki            |    memory:             6899880Ki      <===
    pods:               250                  |    pods:               250
----------------------------------------------------------------------------------------


As a result, the system-reserved value used by the alert is larger than the value configured for the cluster. The alert threshold is therefore higher than 90% of the real system-reserved memory, and the alert can fire late or not at all.
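
To make the size of the error concrete, here is a small sketch that redoes the alert's subtraction with the memory values from the node output above. The function and variable names are hypothetical and only illustrate the arithmetic; they are not part of the alert rule.

```python
# Values are copied from the Capacity/Allocatable output above, in KiB.
KIB_PER_MIB = 1024


def system_reserved_kib(capacity_kib, allocatable_kib):
    """System reserved as the alert computes it: capacity minus allocatable."""
    return capacity_kib - allocatable_kib


# Node without hugepages configured.
without_hp = system_reserved_kib(8153272, 7002296)

# Same node after 100Mi of 2Mi hugepages is configured.
with_hp = system_reserved_kib(8153256, 6899880)

# How much the computed system reserved grew, in MiB.
inflation_mib = (with_hp - without_hp) / KIB_PER_MIB

print(without_hp)     # 1150976
print(with_hp)        # 1253376
print(inflation_mib)  # 100.0
```

The computed system reserved grows by exactly the 100Mi hugepages reservation, which is the amount by which the 90% threshold base is overstated in this example.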






Expected results:


The alert expression should compensate for the hugepages subtraction so that the system-reserved value is unaffected, for example:



sum by (node) (container_memory_rss{id="/system.slice"}) > (Capacity.memory - (Allocatable.memory + hugepages-1Gi + hugepages-2Mi)) * 0.9
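
As a rough check of the proposed expression, the sketch below applies it to the "After Hugepages configured" values from the node output above. The helper name and its parameters are hypothetical, the 102400 KiB figure is simply the 100Mi hugepages-2Mi reservation converted to KiB, and this is only an illustration of the arithmetic, not necessarily the fix adopted in the duplicate bug.

```python
def corrected_reserved_kib(capacity_kib, allocatable_kib, hugepages_kib):
    """System reserved with the hugepages reservation added back to allocatable."""
    return capacity_kib - (allocatable_kib + hugepages_kib)


# "After Hugepages configured" values from the node output, in KiB.
capacity_kib = 8153256
allocatable_kib = 6899880
hugepages_kib = 100 * 1024  # hugepages-2Mi: 100Mi, hugepages-1Gi: 0

reserved = corrected_reserved_kib(capacity_kib, allocatable_kib, hugepages_kib)

# The alert fires when system.slice RSS exceeds 90% of system reserved.
threshold_kib = reserved * 0.9

print(reserved)  # 1150976
```

With the hugepages reservation added back, the system-reserved value is 1150976 KiB, the same as on the node without hugepages, so the 90% threshold again tracks the configured reservation.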

Comment 3 Sanket N 2021-04-29 12:46:49 UTC

*** This bug has been marked as a duplicate of bug 1953846 ***