Bug 1993218 - alerts: SystemMemoryExceedsReservation triggers too quickly
Summary: alerts: SystemMemoryExceedsReservation triggers too quickly
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.z
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
Depends On: 1992687
Reported: 2021-08-12 14:51 UTC by OpenShift BugZilla Robot
Modified: 2021-11-26 02:42 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2021-09-09 01:52:52 UTC
Target Upstream Version:

System                             ID              Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  5788171         0        None      None    None     2021-09-03 16:45:03 UTC
Red Hat Product Errata             RHBA-2021:3395  0        None      None    None     2021-09-09 01:53:14 UTC

Comment 1 Ryan Phillips 2021-08-12 19:23:36 UTC

*** This bug has been marked as a duplicate of bug 1980844 ***

Comment 3 Scott Dodson 2021-08-18 13:24:44 UTC
These are independent fixes. The PR on this bug moves from immediate alerts to a 15m threshold. While that may not fix the problem overall, it does address the issue as described in the bug, so I'm re-opening and moving this back to ON_QA. We'll see new bugs coming down from the change from 90% to 95% in the future, but that's likely to be weeks out.
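
The effect of moving from an immediate alert to a 15m threshold can be illustrated with a small simulation. This is only a sketch of the general "condition must hold continuously for a set duration" behaviour. The function name, the sample format, and the idea of expressing usage as a fraction of the reservation are assumptions made for illustration, and the actual alert rule is not shown in this bug.

```python
from typing import Iterable, Tuple


def alert_fires(samples: Iterable[Tuple[int, float]],
                threshold: float,
                hold_seconds: int) -> bool:
    """Return True if usage stays above threshold for at least hold_seconds.

    samples: (timestamp_seconds, usage_as_fraction_of_reservation) pairs,
             in increasing timestamp order (hypothetical input format).
    """
    breach_start = None  # timestamp at which the current breach began
    for ts, ratio in samples:
        if ratio > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= hold_seconds:
                return True
        else:
            # Any sample at or below the threshold resets the pending breach.
            breach_start = None
    return False


# One sample per minute for 30 minutes, with a 5-minute spike to 97%.
spike = [(t * 60, 0.97 if 5 <= t < 10 else 0.50) for t in range(30)]
# One sample per minute for 30 minutes, constantly at 92%.
steady = [(t * 60, 0.92) for t in range(30)]

print("spike, immediate:", alert_fires(spike, 0.95, 0))
print("spike, 15m hold: ", alert_fires(spike, 0.95, 15 * 60))
print("steady, 90% line:", alert_fires(steady, 0.90, 15 * 60))
print("steady, 95% line:", alert_fires(steady, 0.95, 15 * 60))
```

In this model a short spike trips an immediate alert but not one with a 15-minute hold, and raising the line from 90% to 95% silences a steady 92% load.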

Comment 6 Sunil Choudhary 2021-09-02 16:28:11 UTC
Checked on 4.6.0-0.nightly-2021-08-31-113011, the alert is updated.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-08-31-113011   True        False         4h21m   Cluster version is 4.6.0-0.nightly-2021-08-31-113011

Comment 8 errata-xmlrpc 2021-09-09 01:52:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.44 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 10 Jace Liang 2021-11-19 03:11:56 UTC
I have a customer already on OpenShift 4.6.44 who is still having this kind of issue.

It's a newly installed cluster with barely any customer applications yet.
The node has 128G of memory in total, and memory usage was about 10Gi during the last 24 hours.
But the SystemMemoryExceedsReservation alert triggered 7 times.

Comment 11 Lucian Maly (Red Hat) 2021-11-26 02:42:37 UTC
Customer is currently on 4.6.45 but is still observing this issue on one node:

# free -g                                                                 
              total        used        free      shared  buff/cache   available 
Mem:             62          15          17           0          29          46 
Swap:             0           0           0                                     

# oc describe node
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests           Limits
  --------  --------           ------
  cpu       5592m (74%)        11350m (151%)
  memory    27442798464 (41%)  38308997376 (57%)

# oc get --raw /api/v1/nodes/<NODE>/proxy/stats/summary | jq '.node.systemContainers[].memory.usageBytes'
kubelet =    461,053,952 B
runtime = 19,042,807,808 B
misc    = 24,289,464,320 B
pods    = 20,525,260,800 B
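
The byte counts above are easier to set beside the `free -g` output once they are converted to GiB. The following is a minimal sketch of that arithmetic only. The dictionary keys mirror the labels in the comment, `to_gib` is a hypothetical helper, and how these figures relate to the alert's own expression is not stated in this bug.

```python
# Values copied from the stats/summary output quoted above (bytes).
usage_bytes = {
    "kubelet": 461_053_952,
    "runtime": 19_042_807_808,
    "misc": 24_289_464_320,
    "pods": 20_525_260_800,
}

GIB = 1024 ** 3  # bytes per GiB


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (hypothetical helper for illustration)."""
    return n_bytes / GIB


for name, n_bytes in usage_bytes.items():
    print(f"{name:8s} {to_gib(n_bytes):7.2f} GiB")
```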
