2028854 – SystemMemoryExceedsReservation Alert

Summary: SystemMemoryExceedsReservation Alert

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.8.z
Assignee:	Swarup Ghosh
QA Contact:	Sunil Choudhary
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	2056502 2067292
Depends On:	1979297
Blocks:

Reported:	2021-12-03 15:06 UTC by Shubham Jadhav
Modified:	2022-03-30 15:09 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The PromQL expression behind the SystemMemoryExceedsReservation alert counted hugepages memory consumption, which does not belong in the calculation. Consequence: The alert fired unnecessarily on OCP 4.8 clusters. Fix: The fix, which already exists in later versions of OCP, was backported to 4.8. It removes Linux huge pages from the system memory calculation. Result: The alert should no longer fire unnecessarily.
Clone Of:
Environment:
Last Closed:	2022-03-16 11:30:09 UTC
Target Upstream Version:
Embargoed:
Flags:	swghosh: needinfo-

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-config-operator pull 2956	0	None	open	Bug 2028854: Backport of SystemMemoryExceedsReservation alert rule	2022-02-21 17:25:01 UTC
Red Hat Product Errata	RHBA-2022:0795	0	None	None	None	2022-03-16 11:30:33 UTC

Description Shubham Jadhav 2021-12-03 15:06:22 UTC

Description of problem:

Customer is facing an issue with the SystemMemoryExceedsReservation alert on worker nodes.



How reproducible: Every time


Actual results:

The kubelet consumes a large amount of memory, roughly 40GB, on the worker nodes.

Expected results:

The SystemMemoryExceedsReservation alert should clear after the System Reserved Memory is increased to 9GB.


Additional info:

We increased the System Reserved Memory to 9GB as per the KCS[0] and documentation[1]. 

Even after increasing the System Reserved Memory, we found that the kubelet on all the worker nodes is still consuming a large amount of memory, as the `top` output below shows.

~~~
[core@ocpnonprod-xxxx-worker-xxxx ~]$ top
top - 09:48:50 up 21:45,  1 user,  load average: 7.69, 7.79, 8.49
Tasks: 379 total,   1 running, 377 sleeping,   0 stopped,   1 zombie
%Cpu(s): 74.7 us, 16.5 sy,  0.2 ni,  7.8 id,  0.0 wa,  0.7 hi,  0.2 si,  0.0 st
MiB Mem : 128919.9 total,  68571.9 free,  45363.9 used,  14984.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  82294.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1976 root      20   0   41.5g  37.2g  71340 S 614.6  29.5   2000:12 kubelet

------------------------------------------------------------------------------

[root@ocpnonprod-xxxx-worker-xxxx /]# top
top - 09:49:47 up 22:19,  1 user,  load average: 15.75, 13.10, 14.09
Tasks: 1058 total,   7 running, 1048 sleeping,   0 stopped,   3 zombie
%Cpu(s): 64.6 us, 28.2 sy,  0.2 ni,  5.2 id,  0.0 wa,  1.0 hi,  0.8 si,  0.0 st
MiB Mem : 128919.9 total,  58637.4 free,  50217.3 used,  20065.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  78033.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1980 root      20   0   46.9g  41.8g  70724 S 522.0  33.2   1987:18 kubelet

------------------------------------------------------------------------------

[root@ocpnonprod-xxxx-worker-xxxx ~]# top
top - 09:50:27 up 21:51,  1 user,  load average: 8.48, 9.43, 11.14
Tasks: 507 total,   1 running, 505 sleeping,   0 stopped,   1 zombie
%Cpu(s):  7.7 us, 64.4 sy,  0.4 ni, 25.9 id,  0.0 wa,  1.1 hi,  0.5 si,  0.0 st
MiB Mem : 128919.9 total,  58985.9 free,  53438.4 used,  16495.5 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  74253.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1973 root      20   0   50.1g  43.9g  71068 S 198.0  34.9   2236:23 kubelet
~~~
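
To put the three `top` samples above in perspective, here is a minimal Python sketch that recomputes the kubelet's share of node memory from the RES column and shows how far each value sits above the 9GB reservation. The figures are copied from the output above. The helper names are hypothetical, and treating both the `g` suffix in `top` and the 9GB reservation as GiB is an assumption.

~~~python
# Figures copied from the `top` output above: (PID, kubelet RES in GiB, %MEM shown by top).
SAMPLES = [(1976, 37.2, 29.5), (1980, 41.8, 33.2), (1973, 43.9, 34.9)]

TOTAL_MIB = 128919.9   # "MiB Mem : 128919.9 total" on all three nodes
RESERVED_GIB = 9.0     # System Reserved Memory from the report (assumed to mean GiB)


def res_percent(res_gib, total_mib=TOTAL_MIB):
    """Kubelet resident memory as a percentage of total node memory."""
    return round(res_gib * 1024 / total_mib * 100, 1)


def over_reservation_gib(res_gib, reserved_gib=RESERVED_GIB):
    """How far the kubelet alone exceeds the configured reservation."""
    return round(res_gib - reserved_gib, 1)


for pid, res_gib, top_pct in SAMPLES:
    print(
        f"kubelet pid {pid}: RES {res_gib} GiB = {res_percent(res_gib)}% of node memory "
        f"(top shows {top_pct}%), {over_reservation_gib(res_gib)} GiB above the reservation"
    )
~~~

The recomputed percentages match the %MEM column, and on every node the kubelet alone is several times larger than the 9GB reservation.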


[0] https://access.redhat.com/solutions/5843241
[1] https://docs.openshift.com/container-platform/4.8/nodes/nodes/nodes-nodes-resources-configuring.html#nodes-nodes-resources-configuring-auto_nodes-nodes-resources-configuring

Comment 3 Mridul Markandey 2021-12-10 08:43:28 UTC

Hello Team,

I have a customer who is facing a similar issue in their RHOCP v4.8.14 cluster. The customer is getting a "SystemMemoryExceedsReservation" warning on all the master and worker nodes of the cluster, even after configuring the reservation to 12G. The customer has provided a must-gather, which I will share on this Bugzilla. Let me know if more information is needed from the customer's environment for further analysis.

Regards,
Mridul Markandey
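
The Doc Text field of this bug attributes the unnecessary alerts to hugepages memory being counted in the system memory calculation. The following Python sketch only illustrates that arithmetic. It is not the actual PromQL rule, the function name and the simple "usage greater than reservation" comparison are assumptions, and the node figures are hypothetical rather than taken from any cluster in this report.

~~~python
def alert_fires(system_rss_gib, hugepages_gib, reserved_gib, include_hugepages):
    """Hypothetical model: alert when counted system usage exceeds the reservation."""
    usage = system_rss_gib
    if include_hugepages:
        # Behaviour the Doc Text describes as the cause: hugepages counted as system usage.
        usage += hugepages_gib
    return usage > reserved_gib


# Hypothetical node: 6 GiB of system usage, 8 GiB of hugepages, 12 GiB reserved.
before_fix = alert_fires(6.0, 8.0, 12.0, include_hugepages=True)
after_fix = alert_fires(6.0, 8.0, 12.0, include_hugepages=False)

print(f"hugepages counted: alert fires = {before_fix}")
print(f"hugepages excluded: alert fires = {after_fix}")
~~~

Under these assumed numbers the alert fires only when hugepages are counted, which is the kind of false positive the Doc Text says the backport removes. Raising the reservation would not help in such a case if the hugepages allocation is larger than the headroom gained.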

Comment 52 Harshal Patil 2022-02-21 11:51:17 UTC

*** Bug 2056502 has been marked as a duplicate of this bug. ***

Comment 61 Sunil Choudhary 2022-03-10 07:31:39 UTC

Verified on 4.8.34

Comment 63 errata-xmlrpc 2022-03-16 11:30:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.34 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0795

Comment 64 Harshal Patil 2022-03-30 09:55:35 UTC

*** Bug 2067292 has been marked as a duplicate of this bug. ***
