Bug 2028854 - SystemMemoryExceedsReservation Alert
Summary: SystemMemoryExceedsReservation Alert
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.z
Assignee: Swarup Ghosh
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 2056502 2067292
Depends On: 1979297
Blocks:
Reported: 2021-12-03 15:06 UTC by Shubham Jadhav
Modified: 2022-03-30 15:09 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The PromQL expression behind the SystemMemoryExceedsReservation alert counted Linux huge pages memory toward system memory consumption, which it should not have done.
Consequence: The alert fired unnecessarily on OCP 4.8 clusters.
Fix: The fix, already present in later OCP versions, was backported to 4.8. It removes Linux huge pages from the system memory calculation.
Result: The alert should no longer fire unnecessarily.
Clone Of:
Environment:
Last Closed: 2022-03-16 11:30:09 UTC
Target Upstream Version:
Embargoed:
swghosh: needinfo-
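The Doc Text above can be illustrated with a small model. This is a hypothetical Python sketch, not the actual alert rule, which is a PromQL expression shipped by the machine-config-operator. Following the Doc Text's wording, it assumes huge pages inflated the measured system memory. The `NodeMemory` fields, the `ALERT_RATIO` of 0.95, and the example sizes are illustrative assumptions, not values taken from this bug.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

# Hypothetical fraction of the reservation at which the modeled alert fires (assumption).
ALERT_RATIO = 0.95


@dataclass
class NodeMemory:
    """Hypothetical per-node memory figures, in bytes."""
    system_used: int  # memory used by system services (kubelet, container runtime, ...)
    hugepages: int    # memory pre-allocated as Linux huge pages
    reserved: int     # system-reserved memory configured for the node


def alert_fires(node: NodeMemory, include_hugepages: bool) -> bool:
    """Return True if the modeled alert condition holds for this node.

    include_hugepages=True models the pre-fix behaviour, where huge pages
    were counted as system memory; False models the fixed calculation.
    """
    used = node.system_used + (node.hugepages if include_hugepages else 0)
    return used > node.reserved * ALERT_RATIO


# Made-up example node: 6 GiB real system usage, 8 GiB of huge pages, 9 GiB reserved.
node = NodeMemory(system_used=6 * GIB, hugepages=8 * GIB, reserved=9 * GIB)
print("before fix:", alert_fires(node, include_hugepages=True))   # 14 GiB vs 8.55 GiB threshold
print("after fix: ", alert_fires(node, include_hugepages=False))  # 6 GiB vs 8.55 GiB threshold
```

With these made-up numbers, the huge pages alone push the modeled usage past the threshold even though the services themselves stay well under the reservation, which is the kind of false positive the Doc Text describes.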



Links
GitHub openshift/machine-config-operator pull 2956 (status: open): "Bug 2028854: Backport of SystemMemoryExceedsReservation alert rule". Last updated 2022-02-21 17:25:01 UTC.
Red Hat Product Errata RHBA-2022:0795. Last updated 2022-03-16 11:30:33 UTC.

Description Shubham Jadhav 2021-12-03 15:06:22 UTC
Description of problem:

Customer is facing an issue with the SystemMemoryExceedsReservation alert on worker nodes.



How reproducible: Every time


Actual results:

The kubelet consumes a large amount of memory, around 40 GB.

Expected results:

The SystemMemoryExceedsReservation alert should stop firing after the system-reserved memory is increased to 9 GB.


Additional info:

We increased the system-reserved memory to 9 GB as described in the KCS article [0] and the documentation [1].

Even after this increase, the kubelet on every worker node still consumes a large amount of memory:

~~~
[core@ocpnonprod-xxxx-worker-xxxx ~]$ top
top - 09:48:50 up 21:45,  1 user,  load average: 7.69, 7.79, 8.49
Tasks: 379 total,   1 running, 377 sleeping,   0 stopped,   1 zombie
%Cpu(s): 74.7 us, 16.5 sy,  0.2 ni,  7.8 id,  0.0 wa,  0.7 hi,  0.2 si,  0.0 st
MiB Mem : 128919.9 total,  68571.9 free,  45363.9 used,  14984.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  82294.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1976 root      20   0   41.5g  37.2g  71340 S 614.6  29.5   2000:12 kubelet

------------------------------------------------------------------------------

[root@ocpnonprod-xxxx-worker-xxxx /]# top
top - 09:49:47 up 22:19,  1 user,  load average: 15.75, 13.10, 14.09
Tasks: 1058 total,   7 running, 1048 sleeping,   0 stopped,   3 zombie
%Cpu(s): 64.6 us, 28.2 sy,  0.2 ni,  5.2 id,  0.0 wa,  1.0 hi,  0.8 si,  0.0 st
MiB Mem : 128919.9 total,  58637.4 free,  50217.3 used,  20065.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  78033.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1980 root      20   0   46.9g  41.8g  70724 S 522.0  33.2   1987:18 kubelet

------------------------------------------------------------------------------

[root@ocpnonprod-xxxx-worker-xxxx ~]# top
top - 09:50:27 up 21:51,  1 user,  load average: 8.48, 9.43, 11.14
Tasks: 507 total,   1 running, 505 sleeping,   0 stopped,   1 zombie
%Cpu(s):  7.7 us, 64.4 sy,  0.4 ni, 25.9 id,  0.0 wa,  1.1 hi,  0.5 si,  0.0 st
MiB Mem : 128919.9 total,  58985.9 free,  53438.4 used,  16495.5 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  74253.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1973 root      20   0   50.1g  43.9g  71068 S 198.0  34.9   2236:23 kubelet
~~~


[0] https://access.redhat.com/solutions/5843241
[1] https://docs.openshift.com/container-platform/4.8/nodes/nodes/nodes-nodes-resources-configuring.html#nodes-nodes-resources-configuring-auto_nodes-nodes-resources-configuring
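As a rough cross-check of the `top` output above, the sketch below converts each kubelet RES value to bytes and compares it with the total RAM and the 9 GB reservation. It assumes top's suffixes are binary units (g = GiB, unsuffixed values in KiB) and treats "9GB" as 9 GiB; `parse_top_res` is a hypothetical helper written only for this illustration.

```python
# top prints memory with binary suffixes; unsuffixed values are KiB.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}

MIB = 1024 ** 2
GIB = 1024 ** 3


def parse_top_res(value: str) -> float:
    """Convert a top RES field such as '37.2g' or '71340' to bytes."""
    suffix = value[-1].lower()
    if suffix in UNITS:
        return float(value[:-1]) * UNITS[suffix]
    return float(value) * 1024  # no suffix: KiB


TOTAL_MEM = 128919.9 * MIB  # "MiB Mem : 128919.9 total" on each node shown
RESERVED = 9 * GIB          # system-reserved memory after the change (assumed GiB)

# kubelet RES values from the three top snapshots above
for res in ["37.2g", "41.8g", "43.9g"]:
    used = parse_top_res(res)
    print(f"kubelet RES {res}: {used / TOTAL_MEM:.1%} of RAM, "
          f"{used / RESERVED:.1f}x the reservation")
```

The computed shares land within 0.1 percentage points of top's %MEM column, and on each node shown the kubelet's resident memory alone is more than four times the 9 GB reservation.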

Comment 3 Mridul Markandey 2021-12-10 08:43:28 UTC
Hello Team,

I have a customer who is facing a similar issue in their RHOCP v4.8.14 cluster. The customer is getting a "SystemMemoryExceedsReservation" warning on all the master and worker nodes of the cluster, even after configuring the reservation to 12G. The customer has shared a must-gather, which I will attach to this Bugzilla. Let me know if more information is needed from the customer's environment for further analysis.

Regards,
Mridul Markandey

Comment 52 Harshal Patil 2022-02-21 11:51:17 UTC
*** Bug 2056502 has been marked as a duplicate of this bug. ***

Comment 61 Sunil Choudhary 2022-03-10 07:31:39 UTC
Verified on 4.8.34

Comment 63 errata-xmlrpc 2022-03-16 11:30:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.34 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0795

Comment 64 Harshal Patil 2022-03-30 09:55:35 UTC
*** Bug 2067292 has been marked as a duplicate of this bug. ***

