Bug 1916501

Summary: systemReserved complains 90% of memory is used.
Product: OpenShift Container Platform
Reporter: cshepher
Component: Node
Sub Component: Kubelet
Assignee: Harshal Patil <harpatil>
QA Contact: MinLi <minmli>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: unspecified
CC: aos-bugs, dkulkarn, harpatil, hsbawa, mleonard, rkshirsa, rphillips, rsandu, transient.sepia, tsweeney, yaoli
Version: 4.6
Keywords: Reopened
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-09 20:38:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description cshepher 2021-01-14 22:35:49 UTC
Description of problem:
The systemReserved alert fires daily, warning that more and more memory is being used:
"message = System memory usage of 9.361G on devocp4ec-nmthf-master-0 exceeds 90% of the reservation."
 
Running top showed kubelet and etcd accounting for most of the memory usage (9-12%), but two pprof snapshots show them using only about 40-60 MB. The kernel team found no evidence of a memory leak. (See comment #46: https://gss--c.visualforce.com/apex/Case_View?id=5002K00000r2vBG&sfdc.override=1#comment_a0a2K00000YGdefQAD) Is this a spurious error? Multiple nodes in an 8-node cluster are throwing it and the customer is concerned.
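
For context, the alert is a straightforward threshold check: it fires when the measured system (non-pod) memory usage exceeds 90% of the memory set aside by systemReserved. A minimal Go sketch of that comparison, using the 9.361G figure from the message above and a purely hypothetical 10 GB reservation (the node's actual systemReserved value is not shown in this report):

~~~
package main

import "fmt"

// Minimal sketch of the comparison behind the "exceeds 90% of the reservation"
// alert: system (non-pod) memory usage vs. 90% of systemReserved memory.
// The reservation below is a made-up 10 GB; the real value comes from the
// node's kubelet systemReserved setting, which is not given in this report.
func main() {
	usage := 9.361e9   // "9.361G" from the alert message, read as decimal GB
	reserved := 10.0e9 // hypothetical systemReserved memory
	threshold := 0.90 * reserved

	fmt.Printf("usage %.3f GB, threshold %.3f GB, alert firing: %v\n",
		usage/1e9, threshold/1e9, usage > threshold)
}
~~~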

Version-Release number of selected component (if applicable):
OCP 4.6.3 on Azure, RHCOS 8.3

Steps to Reproduce:
1.  Spin up a cluster
2.  Get a systemReserved alert saying the node is using 90% of its allocated CPU
3.  Increase the allocation beyond the previous value; everything is OK for a day, then another alert fires saying usage is above 90% again

Comment 2 Martin Sivák 2021-01-15 09:33:24 UTC
The memory manager deals with hugepages; I believe this belongs to kubelet and condition reporting.

Comment 4 cshepher 2021-01-15 22:57:29 UTC
The only things I see in the etcd logs are some "took too long" warnings, mostly on master-2, with a couple on master-0 and master-1.

~~~
2020-11-25T19:24:12.239620218Z 2020-11-25 19:24:12.239586 W | etcdserver: request "header:<ID:8010344485033188228 username:\"etcd\" auth_revision:1 > txn:<compare:<target:MOD key:\"/kubernetes.io/monitoring.coreos.com/servicemonitors/openshift-logging/monitor-elasticsearch-cluster\" mod_revision:0 > success:<request_put:<key:\"/kubernetes.io/monitoring.coreos.com/servicemonitors/openshift-logging/monitor-elasticsearch-cluster\" value_size:1509 >> failure:<>>" with result "size:7" took too long (303.371753ms) to execute
2020-11-25T19:24:12.240115924Z 2020-11-25 19:24:12.240017 W | etcdserver: read-only range request "key:\"/kubernetes.io/cronjobs/openshift-logging/curator\" " with result "range_response_count:1 size:3881" took too long (315.956509ms) to execute
2020-11-25T19:24:12.240115924Z 2020-11-25 19:24:12.240036 W | etcdserver: read-only range request "key:\"/kubernetes.io/roles/openshift-kube-scheduler/system:openshift:sa-listing-configmaps\" " with result "range_response_count:1 size:434" took too long (347.237496ms) to execute
2020-11-25T19:24:12.240431828Z 2020-11-25 19:24:12.240395 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2020-11-25T19:24:12.241419640Z 2020-11-25 19:24:12.241396 W | etcdserver: read-only range request "key:\"/kubernetes.io/operator.openshift.io/openshiftcontrollermanagers/cluster\" " with result "range_response_count:1 size:2635" took too long (288.345967ms) to execute
2020-11-25T19:24:12.241640943Z 2020-11-25 19:24:12.241615 W | etcdserver: read-only range request "key:\"/kubernetes.io/monitoring.coreos.com/servicemonitors/\" range_end:\"/kubernetes.io/monitoring.coreos.com/servicemonitors0\" count_only:true " with result "range_response_count:0 size:9" took too long (222.174448ms) to execute
2020-11-25T19:24:12.241900346Z 2020-11-25 19:24:12.241826 W | etcdserver: read-only range request "key:\"/kubernetes.io/ingress/\" range_end:\"/kubernetes.io/ingress0\" count_only:true " with result "range_response_count:0 size:7" took too long (239.421262ms) to execute
2020-11-25T19:24:17.021161848Z 2020-11-25 19:24:17.021107 I | etcdserver/api/etcdhttp: /health OK (status code 200)
~~~

Comment 5 Ryan Phillips 2021-01-18 16:18:46 UTC
Because of how the Go allocator works, it will not release memory back to the OS until the system is under memory pressure [1]. We put this alert in so that we can see when this happens in production clusters. The Go memory allocator is changing in Go 1.16 (and thus in future versions of OpenShift).

We highly recommend upgrading to 4.6.9+, since it includes a kernel patch for high-memory scenarios on cloud machines.

1. https://github.com/golang/go/issues/42330
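
For anyone who wants to see the behavior Ryan describes, here is a small, self-contained Go sketch (not taken from kubelet or etcd) that allocates and frees a large buffer and prints the runtime's own memory statistics: freed heap tends to sit in HeapIdle rather than being returned to the OS, and only debug.FreeOSMemory() (or memory pressure) moves it into HeapReleased. The exact behavior depends on the Go version, which is what the Go 1.16 change linked above is about.

~~~
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// printStats prints how much heap the Go runtime is holding versus what it
// has actually returned to the OS.
func printStats(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%-22s HeapInuse=%4d MiB  HeapIdle=%4d MiB  HeapReleased=%4d MiB\n",
		label, m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20)
}

func main() {
	buf := make([]byte, 512<<20) // allocate ~512 MiB
	for i := range buf {
		buf[i] = 1 // touch the pages so they are really backed
	}
	printStats("after allocation:")

	buf = nil
	runtime.GC() // the heap is now free, but mostly still held by the runtime
	printStats("after GC:")

	debug.FreeOSMemory() // explicitly hand idle pages back to the OS
	printStats("after FreeOSMemory:")
}
~~~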

Comment 9 Ryan Phillips 2021-02-09 20:38:32 UTC
In 4.7 we are enabling an option to make crio and kubelet reclaim memory faster. I created a backport for 4.6 here: https://bugzilla.redhat.com/show_bug.cgi?id=1907929 

https://github.com/openshift/machine-config-operator/pull/2397

*** This bug has been marked as a duplicate of bug 1907929 ***

Comment 12 hsbawa 2021-08-17 00:43:47 UTC
I am using OCP 4.7.22 and getting a similar error. Not sure what I might be missing.

Aug 16, 2021, 8:22 PM
System memory usage of 1.347G on infra3.hsb.local exceeds 90% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The reservation may be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods.
Aug 16, 2021, 8:22 PM
System memory usage of 1.099G on infra1.hsb.local exceeds 90% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The reservation may be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods.
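
Since both alerts suggest increasing the reservation, the same 90% threshold can be solved in the other direction to get a lower bound: the reservation has to exceed usage / 0.9. An illustrative Go sketch using the two figures pasted above (this is only a lower bound, not a sizing recommendation; how much headroom to add on top is a separate decision):

~~~
package main

import "fmt"

// Rough sizing sketch: given the system memory usage reported by the alert,
// what is the smallest reservation that keeps usage under the 90% threshold?
// The figures come from the alerts pasted above.
func main() {
	usagesGB := map[string]float64{
		"infra3.hsb.local": 1.347,
		"infra1.hsb.local": 1.099,
	}
	for node, usage := range usagesGB {
		minReservation := usage / 0.90
		fmt.Printf("%s: usage %.3f GB -> reservation must exceed %.3f GB\n",
			node, usage, minReservation)
	}
}
~~~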

Comment 13 transient.sepia 2021-08-19 06:51:03 UTC
Hitting the same error on both 4.6.z (4.6.40) and 4.8.z (4.8.3). Should this be looked at again?

Comment 15 Red Hat Bugzilla 2023-10-21 04:25:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days