Bug 1955044 - SystemMemoryExceedsReservation alert calculating incorrect system-reserved when hugepages reserved memory is configured.
Keywords:
Status: CLOSED DUPLICATE of bug 1953846
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-29 10:45 UTC by Sanket N
Modified: 2021-05-04 04:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-29 12:46:49 UTC
Target Upstream Version:
Embargoed:
snalawad: needinfo-



Description Sanket N 2021-04-29 10:45:56 UTC
Description of problem:

The SystemMemoryExceedsReservation alert calculates an incorrect system-reserved value because hugepages memory is subtracted from allocatable memory.


~~~
SystemMemoryExceedsReservation alert:

sum by (node) (container_memory_rss{id="/system.slice"})
  > ((sum by (node) (kube_node_status_capacity{resource="memory"}
      - kube_node_status_allocatable{resource="memory"})) * 0.9)
~~~


The SystemMemoryExceedsReservation alert monitors container_memory_rss{id="/system.slice"} and fires when system memory usage exceeds 90% of the system-reserved memory.


The system-reserved memory is calculated by the expression:
~~~
sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})
~~~


System Reserved =  Capacity.memory - Allocatable.memory 


When hugepages are configured, Allocatable.memory is reduced by the size of the hugepages allocation, so the system-reserved value computed above absorbs that reduction and comes out larger than what was configured.


----------------------------------------------------------------------------------------
Without Hugepages                            |  After Hugepages configured
---------------------------------------------|------------------------------------------
Capacity:                                    |  Capacity:
    cpu:                4                    |    cpu:                4
    ephemeral-storage:  41407468Ki           |    ephemeral-storage:  41407468Ki
    hugepages-1Gi:      0                    |    hugepages-1Gi:      0
    hugepages-2Mi:      0                    |    hugepages-2Mi:      100Mi
    memory:             8153272Ki            |    memory:             8153256Ki
    pods:               250                  |    pods:               250
Allocatable:                                 |  Allocatable:
    cpu:                3500m                |    cpu:                3500m
    ephemeral-storage:  37087380622          |    ephemeral-storage:  37087380622
    hugepages-1Gi:      0                    |    hugepages-1Gi:      0
    hugepages-2Mi:      0                    |    hugepages-2Mi:      100Mi
    memory:             7002296Ki            |    memory:             6899880Ki      <===
    pods:               250                  |    pods:               250
----------------------------------------------------------------------------------------


As a result, the system-reserved value used by the alert is greater than the value configured for the cluster, so the alert does not fire when usage crosses 90% of the actual system-reserved memory.
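
The size of the error can be checked with the numbers from the node status output above. The following is a minimal Python sketch of that arithmetic, not part of the alert itself; the function name reserved_ki is made up for illustration, and the values are the Ki figures from the table.

```python
# Memory values in Ki, copied from the node status table above.
KI_PER_MI = 1024


def reserved_ki(capacity_ki: int, allocatable_ki: int) -> int:
    """System-reserved as the alert derives it: capacity minus allocatable."""
    return capacity_ki - allocatable_ki


# Hypothetical helper applied to the two node states from the table.
without_hugepages = reserved_ki(8153272, 7002296)
with_hugepages = reserved_ki(8153256, 6899880)
inflation_ki = with_hugepages - without_hugepages

print(without_hugepages)          # 1150976
print(with_hugepages)             # 1253376
print(inflation_ki // KI_PER_MI)  # 100
```

The derived system-reserved value grows by exactly 100Mi, which matches the hugepages-2Mi allocation on the node.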






Expected results:


The alert expression should add the hugepages sizes back to allocatable memory, so that the derived system-reserved value is unaffected by hugepages configuration. Something like the following should be considered:



sum by (node) (container_memory_rss{id="/system.slice"}) > (Capacity.memory - (Allocatable.memory + hugepages-1Gi + hugepages-2Mi)) * 0.9
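
To illustrate the effect of this compensation, here is a small Python sketch that evaluates the threshold with and without the hugepages term, using the Ki values from the node status output in the description. The function alert_threshold_ki and its parameters are hypothetical names for illustration only; this is not the alert rule shipped with the product.

```python
def alert_threshold_ki(capacity_ki, allocatable_ki, hugepages_ki=0, ratio=0.9):
    """Threshold in Ki: ratio * (capacity - (allocatable + hugepages)).

    With hugepages_ki=0 this reduces to the current alert's threshold.
    """
    return (capacity_ki - (allocatable_ki + hugepages_ki)) * ratio


# Node with 100Mi of 2Mi hugepages (values in Ki from the description).
capacity_ki = 8153256
allocatable_ki = 6899880
hugepages_ki = 100 * 1024  # hugepages-1Gi is 0 on this node

current = alert_threshold_ki(capacity_ki, allocatable_ki)
proposed = alert_threshold_ki(capacity_ki, allocatable_ki, hugepages_ki)

print(f"current threshold:  {current:.1f} Ki")
print(f"proposed threshold: {proposed:.1f} Ki")
```

Under these assumptions the proposed threshold is 90% of 1150976Ki, the same reserved value the node showed before hugepages were configured, while the current threshold is 90% of 1253376Ki.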

Comment 3 Sanket N 2021-04-29 12:46:49 UTC

*** This bug has been marked as a duplicate of bug 1953846 ***

