Bug 1980844 - The SystemMemoryExceedsReserved alert released in 4.6 seems to trigger on many clusters under load (default increase if possible?)
Summary: The SystemMemoryExceedsReserved alert released in 4.6 seems to trigger on ma...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Harshal Patil
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks: 2000500
Reported: 2021-07-09 16:00 UTC by pmoses
Modified: 2021-10-18 17:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Alerts would fire at 90% utilization.
Consequence:
Fix: Alerts will now fire at 95% utilization.
Result:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:39:11 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2716 0 None None None 2021-09-02 02:02:26 UTC
Github openshift machine-config-operator pull 2722 0 None None None 2021-09-02 02:02:10 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:39:14 UTC

Description pmoses 2021-07-09 16:00:39 UTC
Description of problem:
The systemReserved memory in kubelet.conf defaults to 1G. With that default, the SystemMemoryExceedsReservation alert triggers out of the box on many observed clusters under significant load.

Version-Release number of selected component (if applicable):
4.6+

How reproducible:
Reproducible with clusters under load. 

Steps to Reproduce:
1. Upgrade from 4.5 to 4.6+
2. Enable monitoring
3. The alert fires

Actual results:
The alert fires because system memory usage exceeds 90% of systemReserved.

Expected results:
A higher default systemReserved, so that clusters under load have more memory reserved for system processes.

Additional info:
Observed on 6/9 clusters upgraded from 4.5->4.7. A custom machine config can be created, but the default out-of-the-box systemReserved seems low at 1G.
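
For reference, the custom configuration mentioned above could look roughly like the sketch below. It only writes a KubeletConfig manifest to a local file. The object name, the file name, and the 2Gi value are illustrative assumptions, not values from this report, and the pool selector label assumes the default worker MachineConfigPool label.

```shell
#!/bin/sh
# Write a KubeletConfig manifest that raises the system memory reservation
# for worker nodes. Name, file name, and 2Gi are illustrative assumptions.
manifest=kubelet-system-reserved.yaml

cat > "$manifest" <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-system-reserved
spec:
  machineConfigPoolSelector:
    matchLabels:
      # Assumed default label on the worker MachineConfigPool.
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    systemReserved:
      memory: 2Gi
EOF

# Nothing is applied to the cluster here; the file is only written locally.
echo "wrote $manifest; review it, then apply with: oc apply -f $manifest"
```

Applying a KubeletConfig is expected to roll the change out node by node across the selected pool, so it is worth reviewing the manifest and choosing a maintenance window first.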

Comment 1 sramanat 2021-07-09 16:15:48 UTC
The same error was seen in my customer's environment as well.

System memory usage of 1.073G on Node <redacted> exceeds 90% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The reservation may be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods.
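
To make the threshold in that message concrete, here is a minimal sketch of the comparison, assuming the alert simply checks system memory usage against a fixed percentage of the reservation, as the alert text suggests. The helper name `alert_state` and the 92% usage figure are hypothetical and are not taken from the actual alert rule.

```shell
#!/bin/sh
# Hypothetical helper: prints "firing" when usage exceeds the given
# percentage of the reservation, otherwise "ok".
# Arguments: usage_bytes reserved_bytes threshold_percent
alert_state() {
    awk -v u="$1" -v r="$2" -v t="$3" \
        'BEGIN { if (u > r * t / 100) print "firing"; else print "ok" }'
}

reserved=$((1024 * 1024 * 1024))   # 1 GiB, the default reservation cited in this bug
usage=$((reserved / 100 * 92))     # illustrative usage: about 92% of the reservation

alert_state "$usage" "$reserved" 90   # threshold quoted in the alert text
alert_state "$usage" "$reserved" 95   # a higher threshold, for comparison
```

With these illustrative numbers the first call reports "firing" and the second reports "ok", so a node sitting slightly above 90% of its reservation would stop alerting at a 95% threshold, while a node already above the full reservation would alert at either one.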

Comment 5 pmoses 2021-08-12 04:34:09 UTC
Please contact me directly if a list of recent cases opened for this matter would help. (A query on 'System memory usage exceeds 90% of the reservation.' will return all cases.) pmoses

Comment 6 Ryan Phillips 2021-08-12 19:23:36 UTC
*** Bug 1993218 has been marked as a duplicate of this bug. ***

Comment 11 Sunil Choudhary 2021-09-02 08:09:37 UTC
Verified on 4.9.0-0.nightly-2021-09-01-193941.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-01-193941   True        False         125m    Cluster version is 4.9.0-0.nightly-2021-09-01-193941

Comment 14 errata-xmlrpc 2021-10-18 17:39:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

