1624253 – [Upgrade] Infrastructure pods should be system-cluster/node-critical priorityclass after upgrade


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Frederic Branczyk
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-08-31 06:34 UTC by weiwei jiang
Modified:	2019-06-04 10:40 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-06-04 10:40:34 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:40:41 UTC

Description weiwei jiang 2018-08-31 06:34:14 UTC

Description of problem:
After upgrading from 3.10 to 3.11, infrastructure pods do not have the expected priorityclass.

The pods should have the priority classes set by this PR or a later one: https://github.com/openshift/openshift-ansible/pull/9166

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch
rpm -q ansible
ansible-2.6.3-1.el7ae.noarch
ansible --version

How reproducible:
Always

Steps to Reproduce:
1. Set up a 3.10 cluster
2. Upgrade to 3.11
3. Check the priorityclass of infrastructure pods
oc get pods --all-namespaces -o yaml |grep -i priority
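
The grep in step 3 prints priority fields without showing which pod they belong to. A minimal Python sketch of a more targeted check follows. It reads the JSON form of the same `oc get pods` listing and reports pods whose `spec.priorityClassName` is neither `system-cluster-critical` nor `system-node-critical`. The function names are hypothetical, and treating `kube-system` and every `openshift-*` namespace as "infrastructure" is an assumption made for illustration, not something this report defines.

```python
import json
import subprocess

# The two critical priority classes this bug expects on infrastructure pods.
CRITICAL_CLASSES = {"system-cluster-critical", "system-node-critical"}

# Assumption: infrastructure pods live in kube-system or an openshift-* namespace.
INFRA_NAMESPACE_PREFIXES = ("kube-system", "openshift-")


def pods_missing_critical_priority(pod_list, prefixes=INFRA_NAMESPACE_PREFIXES):
    """Return (namespace, name, priorityClassName) for each infrastructure
    pod that does not carry one of the critical priority classes."""
    missing = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        namespace = meta.get("namespace", "")
        if not namespace.startswith(tuple(prefixes)):
            continue  # not an infrastructure namespace under our assumption
        priority_class = pod.get("spec", {}).get("priorityClassName")
        if priority_class not in CRITICAL_CLASSES:
            missing.append((namespace, meta.get("name", ""), priority_class))
    return missing


def fetch_all_pods():
    """Run `oc get pods --all-namespaces -o json` and parse its output."""
    result = subprocess.run(
        ["oc", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def report():
    """Print one line per infrastructure pod lacking a critical class."""
    for namespace, name, priority_class in pods_missing_critical_priority(fetch_all_pods()):
        print(f"{namespace}/{name}: priorityClassName={priority_class}")
```

Calling `report()` on a host that is logged in with `oc` lists the offending pods. An empty result would correspond to the expected behaviour described below, under the namespace assumption stated above.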

Actual results:
3. Some infrastructure pods do not have the system-cluster-critical or system-node-critical priorityclass after upgrade


Expected results:
3. All infrastructure pods have the system-cluster-critical or system-node-critical priorityclass after upgrade

Additional info:

Comment 1 weiwei jiang 2018-08-31 06:44:51 UTC

And also this PR:  "Control plane static pods (apiserver, etcd, controller-manager) must be assigned highest priority class system-node-critical."[https://github.com/openshift/openshift-ansible/pull/9801]

Comment 3 Avesh Agarwal 2018-08-31 15:56:30 UTC

(In reply to weiwei jiang from comment #1)
> And also this PR:  "Control plane static pods (apiserver, etcd,
> controller-manager) must be assigned highest priority class
> system-node-critical."[https://github.com/openshift/openshift-ansible/pull/9801]

Weiwei, did you test with the above PR? My guess is that the above PR should be enough. I am also going to try the upgrade myself now, as I am not clear on exactly how the upgrade from 3.10 to 3.11 works.

Comment 4 Avesh Agarwal 2018-08-31 16:01:15 UTC

I think I misread the issue. Which pods are not getting the right priority? Could you list them?

Comment 5 Avesh Agarwal 2018-08-31 19:20:21 UTC

I can reproduce this: several pods in various namespaces, such as openshift-monitoring, openshift-console and openshift-web-console, are not assigned the right priority class. I am working on a PR that I will send after some testing. Some operators (monitoring and prometheus) are also involved, so I need to check which pods they start and how they pass configuration to their pods.

Comment 6 Avesh Agarwal 2018-09-10 16:59:27 UTC

Here are PRs:

https://github.com/openshift/openshift-ansible/pull/9981
https://github.com/openshift/cluster-monitoring-operator/pull/97
https://github.com/coreos/prometheus-operator/pull/1875

Once the above PRs are merged, more follow-up PRs will come.

Comment 7 Frederic Branczyk 2018-09-11 13:45:21 UTC

All pods except Prometheus and Alertmanager have the priority class set now. Jessica and I decided this is not a release blocker, so we are moving it to 3.11.z.

Comment 10 Frederic Branczyk 2019-04-08 09:39:43 UTC

To fully resolve this, we need a newer version of the Prometheus Operator that introduces a lot of changes compared to what we're shipping in 3.11 right now, so we're only fixing this in 4.1.

The PR to fix the final pieces in 4.1: https://github.com/openshift/cluster-monitoring-operator/pull/311

Comment 16 Frederic Branczyk 2019-04-11 12:03:48 UTC

https://github.com/openshift/cluster-monitoring-operator/pull/311 just landed so all monitoring components should now have the appropriate priority class.

Comment 18 Junqi Zhao 2019-04-12 01:53:39 UTC

No payload that packages the fix is available to test yet, so testing is postponed until one is available.

Comment 21 errata-xmlrpc 2019-06-04 10:40:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
