Description of problem:
The systemReserved memory setting in kubelet.conf defaults to 1G. With this default, the SystemMemoryExceedsReservation alert fires out of the box on many observed clusters under significant load.

Version-Release number of selected component (if applicable):
4.6+

How reproducible:
Reproducible on clusters under load.

Steps to Reproduce:
1. Upgrade 4.5 --> 4.6+
2. Enable monitoring
3. Alert fires

Actual results:
The alert fires; system memory usage exceeds 90% of systemReserved.

Expected results:
A higher default systemReserved, so that clusters under load have more memory reserved for system processes.

Additional info:
Observed on 6/9 clusters upgraded from 4.5->4.7. A custom machine config can be created, but the out-of-the-box systemReserved default of 1G seems low.
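For anyone needing the workaround mentioned under "Additional info", here is a minimal sketch that generates a KubeletConfig manifest raising systemReserved memory. The object name `set-system-reserved`, the `3Gi` value, and the worker pool label are illustrative assumptions, not values taken from this report; check the labels on your MachineConfigPool and size the reservation for your own nodes.

```python
import json


def kubelet_config(name, pool_label, memory):
    """Build a KubeletConfig manifest that overrides systemReserved memory.

    All arguments are caller-supplied; nothing here is a recommended value.
    """
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "KubeletConfig",
        "metadata": {"name": name},
        "spec": {
            # Selects the MachineConfigPool whose nodes get the new kubelet settings.
            "machineConfigPoolSelector": {"matchLabels": {pool_label: ""}},
            "kubeletConfig": {"systemReserved": {"memory": memory}},
        },
    }


# Hypothetical values for illustration only.
manifest = kubelet_config(
    name="set-system-reserved",
    pool_label="pools.operator.machineconfiguration.openshift.io/worker",
    memory="3Gi",
)

# JSON is accepted wherever a YAML manifest is, so this output can be saved to a file.
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f <file>`. Applying a KubeletConfig typically triggers a rolling reboot of the nodes in the selected pool, so plan for that.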
The same error was seen in my customer's environment as well:

System memory usage of 1.073G on Node <redacted> exceeds 90% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The reservation may be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods.
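To make the arithmetic behind that message concrete, here is a rough sketch of the 90% check. It assumes the default reservation is 1 GiB and reads the reported "1.073G" as decimal gigabytes; both are assumptions, and the real alert is evaluated by the monitoring stack rather than by code like this.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def exceeds_reservation(usage_bytes, reserved_bytes, threshold=0.90):
    """Return True when usage is above the given fraction of the reservation."""
    return usage_bytes > threshold * reserved_bytes


reserved = 1 * GIB       # assumed default systemReserved memory
limit = 0.90 * reserved  # usage above this would trip the alert
usage = 1.073e9          # "1.073G" from the alert text, read as decimal GB (assumption)

print(f"alert threshold: {limit / 1e9:.3f} GB")
print(f"usage exceeds threshold: {exceeds_reservation(usage, reserved)}")
```

Under these assumptions the reported usage is already above the whole reservation, not just 90% of it, which fits the report that the alert fires out of the box.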
Please contact me directly if a list of recent cases opened for this matter would help. (A query on 'System memory usage exceeds 90% of the reservation.' will return all cases.) pmoses
*** Bug 1993218 has been marked as a duplicate of this bug. ***
Verified on 4.9.0-0.nightly-2021-09-01-193941.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-01-193941   True        False         125m    Cluster version is 4.9.0-0.nightly-2021-09-01-193941
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759