Bug 1929278
Summary: | Monitoring workloads using too high a priorityclass
---|---
Product: | OpenShift Container Platform
Component: | Monitoring
Version: | 4.7
Target Release: | 4.7.0
Status: | CLOSED ERRATA
Severity: | urgent
Priority: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Ben Parees <bparees>
Assignee: | Lili Cosic <lcosic>
QA Contact: | Junqi Zhao <juzhao>
CC: | alegrand, anpicker, aos-bugs, erooth, fdeutsch, hongyli, jokerman, juzhao, kakkoyun, lcosic, pelauter, pkrupa, surbania, wking
Clone Of: | 1929277
: | 1929354
Bug Depends On: | 1929277
Bug Blocks: | 1929354
Last Closed: | 2021-02-24 15:58:19 UTC
Description
Ben Parees
2021-02-16 15:57:53 UTC
Reassigning to the CVO team, as we are blocked by the lack of any way to change the value of an existing priorityclass.

Moving this back to the monitoring team. I will create a new bug against the CVO for the fact that we have no mechanism to update (delete and recreate) priorityclasses today, but the monitoring team needs to deliver the fix that lowers the Prometheus priority. Note that I don't think anyone knows the implications of deleting and recreating priorityclasses on existing workloads in those classes, so that path may not even be viable for the CVO to introduce.

The monitoring team can fix this 4.7.0 blocker bug in one of these ways:

1) Use the existing user class for the cluster Prometheus (it would then have the same priority as the user workload monitoring (UWM) Prometheus).
2) Introduce a new user class with a lower priority, move UWM to that class, and use the existing user class (or a new one) for the cluster Prometheus.
3) Get the CVO to deliver a fix that allows priorityclass updates (per the bug I am about to create).
4) Get Kubernetes to carry a patch that allows higher values for user-defined priorityclasses.

(I may have missed some options, but those are the main ones I'm aware of.)

Upgraded from 4.6.18 to 4.7.0-0.nightly-2021-02-17-224627. As shown in the attached picture, the prometheus pods consumed at most 3.27GiB of memory during the upgrade, and no nodes changed to an unhealthy status.

# oc -n openshift-monitoring get sts prometheus-k8s -oyaml | grep priorityClassName
    priorityClassName: openshift-user-critical

Created attachment 1757790
prometheus-k8s pods memory usage
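The verification above boils down to reading one field from the StatefulSet's pod template. The sketch below wraps that check so it can be rerun after each upgrade; it assumes the manifest arrives on stdin (for example from the same `oc get sts prometheus-k8s -oyaml` command), and the function name `check_priority_class` is hypothetical, not part of `oc` or OpenShift.

```shell
#!/bin/sh
# Hypothetical helper (not an oc/OpenShift command): read a StatefulSet
# manifest on stdin and verify that its pod template uses the expected
# priorityClassName.
check_priority_class() {
  expected="$1"
  # Take the first priorityClassName line, drop the key, trim trailing space.
  actual=$(grep 'priorityClassName:' | head -n 1 |
    sed 's/.*priorityClassName:[[:space:]]*//; s/[[:space:]]*$//')
  if [ "$actual" = "$expected" ]; then
    echo "ok: $actual"
  else
    # Empty $actual means the field was missing from the manifest.
    echo "unexpected priorityClassName: '$actual' (want '$expected')" >&2
    return 1
  fi
}
```

Usage, assuming a logged-in `oc` session: `oc -n openshift-monitoring get sts prometheus-k8s -oyaml | check_priority_class openshift-user-critical`. A plain `grep` is used instead of a YAML parser to stay as close as possible to the original one-liner; it assumes only one `priorityClassName` key appears in the manifest.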
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633