Bug 1993218
Summary: | alerts: SystemMemoryExceedsReservation triggers too quickly | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
Component: | Node | Assignee: | Ryan Phillips <rphillips> |
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | aos-bugs, cblecker, jaliang, lucian.maly, malonso, maupadhy, micmurph, oarribas, rphillips |
Version: | 4.6 | Keywords: | Reopened, ServiceDeliveryImpact |
Target Milestone: | --- | ||
Target Release: | 4.6.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-09-09 01:52:52 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1992687 | ||
Bug Blocks: |
Comment 1
Ryan Phillips
2021-08-12 19:23:36 UTC
These are independent fixes. The PR on this bug moves from immediate alerts to a 15m threshold; while that may not fix the problem overall, it does address the issues as described in the bug, so re-opening and moving back to ON_QA. We'll see new bugs coming down from the change from 90% to 95% in the future, but that's likely to be weeks out.

Checked on 4.6.0-0.nightly-2021-08-31-113011, the alert is updated.

    $ oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.6.0-0.nightly-2021-08-31-113011   True        False         4h21m   Cluster version is 4.6.0-0.nightly-2021-08-31-113011

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.44 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3395

I have a customer already on OpenShift 4.6.44 who is still having this kind of issue. It's a newly installed cluster with barely any customer applications yet. The node has 128G of memory in total, and memory usage has been about 10Gi during the last 24hrs, but the SystemMemoryExceedsReservation alert triggered 7 times.

Customer is currently on 4.6.45, but still observing this issue on one node:

    # free -g
                  total        used        free      shared  buff/cache   available
    Mem:             62          15          17           0          29          46
    Swap:             0           0           0

    # oc describe node
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource   Requests            Limits
      --------   --------            ------
      cpu        5592m (74%)         11350m (151%)
      memory     27442798464 (41%)   38308997376 (57%)

    # oc get --raw /api/v1/nodes/<NODE>/proxy/stats/summary | jq '.node.systemContainers[].memory.usageBytes'
    kubelet = 461,053,952 B
    runtime = 19,042,807,808 B
    misc    = 24,289,464,320 B
    pods    = 20,525,260,800 B
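
The per-container byte counts reported above are hard to compare with the `free -g` output by eye. As a rough aid, here is a minimal Python sketch that converts them to GiB. The helper names `parse_bytes` and `to_gib` are made up for illustration and are not part of any OpenShift tooling; the input values are copied from the stats/summary output in this comment.

```python
GIB = 1024 ** 3  # bytes per GiB

# Values copied verbatim from the stats/summary output quoted in this bug.
reported = {
    "kubelet": "461,053,952 B",
    "runtime": "19,042,807,808 B",
    "misc": "24,289,464,320 B",
    "pods": "20,525,260,800 B",
}


def parse_bytes(text):
    """Turn a string such as '461,053,952 B' into an integer byte count."""
    return int(text.replace(",", "").rstrip("B").strip())


def to_gib(num_bytes):
    """Convert a byte count to GiB as a float."""
    return num_bytes / GIB


# Illustrative helpers only; the names are assumptions, not an existing API.
usage_gib = {name: to_gib(parse_bytes(value)) for name, value in reported.items()}

for name, gib in usage_gib.items():
    print(f"{name:8s} {gib:6.1f} GiB")
```

Read this way, the runtime and misc entries are each in the tens of GiB, which is the part of the report that stands out against the 15 GiB "used" figure from `free -g`.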