Bug 1993218

Summary: alerts: SystemMemoryExceedsReservation triggers too quickly
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node
Node sub component: Kubelet
Assignee: Ryan Phillips <rphillips>
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-bugs, cblecker, jaliang, lucian.maly, malonso, maupadhy, micmurph, oarribas, rphillips
Version: 4.6
Keywords: Reopened, ServiceDeliveryImpact
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-09-09 01:52:52 UTC
Bug Depends On: 1992687

Comment 1 Ryan Phillips 2021-08-12 19:23:36 UTC

*** This bug has been marked as a duplicate of bug 1980844 ***

Comment 3 Scott Dodson 2021-08-18 13:24:44 UTC
These are independent fixes. The PR on this bug moves from immediate alerts to a 15m threshold. While that may not fix the problem overall, it does address the issue as described in the bug, so I am re-opening and moving back to ON_QA. We'll see new bugs coming down from the change from 90% to 95% in the future, but that's likely to be weeks out.
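
To illustrate the difference between an immediate alert and one with a 15m threshold, here is a minimal Python sketch. It is a simplified model and not the actual alerting rule or Prometheus implementation: the function name `firing_times`, the one-minute sampling interval, and the usage ratios are all hypothetical, and the 0.95 threshold is only borrowed from the 95% figure mentioned above.

```python
from datetime import datetime, timedelta

def firing_times(samples, threshold, hold=timedelta(minutes=15)):
    """Return the timestamps at which a held alert would be firing.

    samples: (timestamp, value) pairs in chronological order.
    The condition must stay true for at least `hold` before the alert fires;
    any sample at or below the threshold resets the pending timer.
    """
    pending_since = None
    firing = []
    for ts, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true
            if ts - pending_since >= hold:
                firing.append(ts)
        else:
            pending_since = None  # condition cleared, start over
    return firing

start = datetime(2021, 8, 18, 13, 0)
# Hypothetical data: a 5-minute spike, and a 21-minute sustained excursion.
spike = [(start + timedelta(minutes=i), 0.97 if i < 5 else 0.50) for i in range(10)]
sustained = [(start + timedelta(minutes=i), 0.97) for i in range(21)]

print("spike, immediate:", len(firing_times(spike, 0.95, hold=timedelta(0))))
print("spike, 15m hold: ", len(firing_times(spike, 0.95)))
print("sustained, 15m:  ", len(firing_times(sustained, 0.95)))
```

In this model a short spike fires on every sample when there is no hold duration, but never fires with the 15m hold; only the sustained excursion does.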

Comment 6 Sunil Choudhary 2021-09-02 16:28:11 UTC
Checked on 4.6.0-0.nightly-2021-08-31-113011; the alert is updated.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-08-31-113011   True        False         4h21m   Cluster version is 4.6.0-0.nightly-2021-08-31-113011

Comment 8 errata-xmlrpc 2021-09-09 01:52:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.44 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3395

Comment 10 Jace Liang 2021-11-19 03:11:56 UTC
I have a customer already on OpenShift 4.6.44 who is still hitting this issue.

It's a newly installed cluster with barely any customer applications yet.
The node has 128G of memory in total, and memory usage was about 10Gi over the last 24 hours.
Still, the SystemMemoryExceedsReservation alert triggered 7 times.

Comment 11 Lucian Maly (Red Hat) 2021-11-26 02:42:37 UTC
Customer is currently on 4.6.45, but still observing this issue on one node:                                                                       

# free -g                                                                 
              total        used        free      shared  buff/cache   available 
Mem:             62          15          17           0          29          46 
Swap:             0           0           0                                     

# oc describe node
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests           Limits
  --------  --------           ------
  cpu       5592m (74%)        11350m (151%)
  memory    27442798464 (41%)  38308997376 (57%)

# oc get --raw /api/v1/nodes/<NODE>/proxy/stats/summary | jq '.node.systemContainers[].memory.usageBytes'
kubelet =    461,053,952 B
runtime = 19,042,807,808 B
misc    = 24,289,464,320 B
pods    = 20,525,260,800 B
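
For easier comparison with the `free -g` output, the byte counts above can be converted to GiB. The following is a small Python sketch; the dictionary simply copies the four values reported in this comment, and the helper name `to_gib` is hypothetical. It deliberately does not sum the entries, since whether these categories overlap is not stated here.

```python
GIB = 1024 ** 3  # bytes per GiB

# Values copied from the stats/summary output in this comment (bytes).
usage_bytes = {
    "kubelet": 461_053_952,
    "runtime": 19_042_807_808,
    "misc": 24_289_464_320,
    "pods": 20_525_260_800,
}

def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB

usage_gib = {name: to_gib(b) for name, b in usage_bytes.items()}

for name, gib in usage_gib.items():
    print(f"{name:8s} {gib:6.2f} GiB")
```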