Bug 1624253
Summary: | [Upgrade] Infrastructure pods should be system-cluster/node-critical priorityclass after upgrade | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | weiwei jiang <wjiang> |
Component: | Monitoring | Assignee: | Frederic Branczyk <fbranczy> |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.11.0 | CC: | anpicker, aos-bugs, erooth, fbranczy, jokerman, mloibl, mmccomas, pkrupa, schoudha, sjenning, surbania, wsun |
Target Milestone: | --- | ||
Target Release: | 4.1.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-06-04 10:40:34 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
weiwei jiang
2018-08-31 06:34:14 UTC
And also this PR: "Control plane static pods (apiserver, etcd, controller-manager) must be assigned highest priority class system-node-critical." [https://github.com/openshift/openshift-ansible/pull/9801]

(In reply to weiwei jiang from comment #1)
> And also this PR: "Control plane static pods (apiserver, etcd,
> controller-manager) must be assigned highest priority class
> system-node-critical."[https://github.com/openshift/openshift-ansible/pull/9801]

Weiwei, did you test with the above PR? My guess is that the above PR should be enough. I am also going to try the upgrade myself now, as I am not very clear on how exactly the upgrade from 3.10 to 3.11 works.

I think I misread the issue. So which pods are not getting the right priority? Could you list them?

I can reproduce this: several pods in various namespaces, such as openshift-monitoring, openshift-console and openshift-web-console, are not assigned the right priority class. I am working on a PR that I will send after some testing. Some operators (monitoring and prometheus) are also involved, so I need to check which pods they start and how they pass configuration to their pods.

Here are the PRs:
https://github.com/openshift/openshift-ansible/pull/9981
https://github.com/openshift/cluster-monitoring-operator/pull/97
https://github.com/coreos/prometheus-operator/pull/1875
When the above PRs are merged, there will be more follow-up PRs too.

All pods except Prometheus and Alertmanager have the priority class set now. Jessica and I decided this is not a release blocker, so we are moving it to 3.11.z.

To fully resolve this, we need a newer version of the Prometheus Operator, which introduces a lot of changes compared to what we're shipping in 3.11 right now, so we're only fixing this in 4.1.
The PR to fix the final pieces in 4.1: https://github.com/openshift/cluster-monitoring-operator/pull/311

https://github.com/openshift/cluster-monitoring-operator/pull/311 just landed, so all monitoring components should now have the appropriate priority class.

There is no payload available yet that packages the fix, so testing is postponed until one is available.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
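
For anyone re-verifying after an upgrade, the check discussed in the comments above (which pods in the affected namespaces lack a critical priority class) could be scripted roughly as follows. This is only a sketch, not the procedure QA used: it assumes the pod list was exported as JSON (for example with `oc get pods --all-namespaces -o json`) and reads the standard `spec.priorityClassName` field. The function name `pods_missing_critical_priority` and the two constants are hypothetical names chosen for illustration, and the namespace set is just the three namespaces named in this report.

```python
import json

# Priority classes named in the bug summary.
CRITICAL_CLASSES = {"system-cluster-critical", "system-node-critical"}

# Namespaces mentioned in the comments above (assumption: adjust for your cluster).
INFRA_NAMESPACES = {
    "openshift-monitoring",
    "openshift-console",
    "openshift-web-console",
}


def pods_missing_critical_priority(pod_list_json, namespaces=INFRA_NAMESPACES):
    """Return sorted (namespace, name, priorityClassName) tuples for pods in
    the given namespaces whose priority class is not one of CRITICAL_CLASSES.

    pod_list_json is the text of a Kubernetes PodList in JSON form.
    """
    pod_list = json.loads(pod_list_json)
    missing = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        if meta.get("namespace") not in namespaces:
            continue  # only infrastructure namespaces are of interest here
        priority_class = pod.get("spec", {}).get("priorityClassName")
        if priority_class not in CRITICAL_CLASSES:
            missing.append((meta.get("namespace"), meta.get("name"), priority_class))
    # Sort on namespace and name only; the priority class may be None.
    return sorted(missing, key=lambda entry: (entry[0] or "", entry[1] or ""))
```

An empty result for the exported pod list would suggest that every pod in those namespaces carries one of the two critical classes; any tuple returned names a pod that still needs attention.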