Bug 1576547
Summary: | kube-state-metrics pods getting OoM killed @ 750 nodes | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Jiří Mencák <jmencak>
Component: | Monitoring | Assignee: | Frederic Branczyk <fbranczy>
Status: | CLOSED ERRATA | QA Contact: | Mike Fiedler <mifiedle>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 3.10.0 | CC: | aos-bugs, dmace, lcosic, mifiedle, spasquie, wsun
Target Milestone: | --- | |
Target Release: | 4.2.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | aos-scalability-310 | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-10-16 06:27:40 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Jiří Mencák
2018-05-09 17:26:52 UTC
We made some substantial improvements that landed in 4.0; hopefully they fix all of these issues. At the end of the day we will still use a lot of memory, as that is the purpose of kube-state-metrics (it is a cache that can be read from very quickly). In future versions we will also look into sharding kube-state-metrics, but for now the changes we have made have shown very significant gains that should at least raise this bar a lot. Please re-test with OpenShift 4.0.

@Mike Could you help test this aos-scalability bug?

Marking verified on 4.2. There won't be another 750+ node cluster run until post-4.2, and a new bz can be opened then if there is an issue. In a 250 node cluster on GCP, kube-state-metrics is using 278MB VSZ and 172MB RSS.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
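For readers unfamiliar with the sharding idea raised in the comments: this report does not say how kube-state-metrics would be sharded. As a rough illustration only, one common approach is to hash a stable identifier of each object (for example its UID) and take the result modulo the number of instances, so that each instance caches only its own slice of objects. The sketch below assumes that scheme; the names `shard_for` and `partition` are hypothetical and are not taken from kube-state-metrics.

```python
import hashlib


def shard_for(uid: str, total_shards: int) -> int:
    """Map an object UID to a shard index in [0, total_shards).

    Hypothetical helper: hashes the UID so that the same object always
    lands on the same shard, regardless of which process computes it.
    """
    if total_shards < 1:
        raise ValueError("total_shards must be at least 1")
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % total_shards


def partition(uids, total_shards):
    """Split an iterable of UIDs into one list per shard (hypothetical helper)."""
    shards = [[] for _ in range(total_shards)]
    for uid in uids:
        shards[shard_for(uid, total_shards)].append(uid)
    return shards


if __name__ == "__main__":
    # Made-up UIDs purely for demonstration.
    example_uids = [f"pod-uid-{i}" for i in range(10)]
    for index, members in enumerate(partition(example_uids, 3)):
        print(index, members)
```

Under this assumption, each instance would hold roughly 1/N of the objects, so per-pod memory should drop as instances are added, at the cost of having to scrape every instance to get the full picture.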