*** This bug has been marked as a duplicate of bug 1980844 ***
These are independent fixes: the PR for this bug moves from immediate alerts to a 15m threshold. While that may not fix the problem overall, it does address the issues as described in the bug, so I'm re-opening and moving back to ON_QA. We'll see new bugs coming from the change from 90% to 95% in the future, but that's likely weeks out.
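For context, a sketch of what the updated alerting rule might look like after these changes. This is an assumption for illustration only: the real SystemMemoryExceedsReservation rule ships with the machine-config operator, and its exact PromQL expression and labels may differ.

```yaml
# Hypothetical sketch; the real rule is defined by the machine-config operator.
- alert: SystemMemoryExceedsReservation
  expr: |
    sum by (node) (container_memory_rss{id="/system.slice"})
      >
    0.95 * sum by (node) (
      kube_node_status_capacity{resource="memory"}
        - kube_node_status_allocatable{resource="memory"}
    )
  for: 15m          # was firing immediately before this PR
  labels:
    severity: warning
```

The `for: 15m` clause is what moves the alert from firing immediately to firing only after a sustained 15-minute breach, and the `0.95` factor reflects the 90% to 95% threshold change mentioned above.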
Checked on 4.6.0-0.nightly-2021-08-31-113011, the alert is updated.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-08-31-113011   True        False         4h21m   Cluster version is 4.6.0-0.nightly-2021-08-31-113011
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.6.44 bug fix update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
I have a customer already on OpenShift 4.6.44 who is still seeing this kind of issue.
It's a newly installed cluster with barely any customer applications on it yet.
The node has 128G of memory in total, and memory usage was only about 10Gi over the last 24 hours.
Yet the SystemMemoryExceedsReservation alert fired 7 times.
Customer is currently on 4.6.45, but still observing this issue on one node:
# free -g
              total        used        free      shared  buff/cache   available
Mem:             62          15          17           0          29          46
Swap:             0           0           0
# oc describe node
(Total limits may be over 100 percent, i.e., overcommitted.)
  Resource   Requests             Limits
  --------   --------             ------
  cpu        5592m (74%)          11350m (151%)
  memory     27442798464 (41%)    38308997376 (57%)
# oc get --raw /api/v1/nodes/<NODE>/proxy/stats/summary | jq -r '.node.systemContainers[] | "\(.name) = \(.memory.usageBytes) B"'
kubelet = 461053952 B
runtime = 19042807808 B
misc = 24289464320 B
pods = 20525260800 B
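A rough sanity check of those numbers, summing the non-pod system-container usage and comparing it against an assumed reservation. The 3 GiB reservation here is a made-up placeholder; the real value comes from the node's kubelet configuration (`systemReserved`/`kubeReserved`) and is not given in this comment.

```python
# Usage figures from the stats/summary output above.
GIB = 1024 ** 3

usage_bytes = {
    "kubelet": 461_053_952,
    "runtime": 19_042_807_808,
    "misc": 24_289_464_320,
}

# Sum the system-slice containers (pods are counted against pod requests,
# not the system reservation).
system_usage = sum(usage_bytes.values())

# Placeholder reservation -- NOT from the bug report; see the note above.
assumed_reservation = 3 * GIB

print(f"system usage: {system_usage / GIB:.1f} GiB")
print(f"exceeds 95% of assumed reservation: {system_usage > 0.95 * assumed_reservation}")
```

With these figures the system containers alone account for roughly 40.8 GiB, which would dwarf any typical reservation, so the alert firing on this node is consistent with the reported numbers.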