Bug 1929741 - CVO does not allow priorityclass updates
Summary: CVO does not allow priorityclass updates
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ---
: ---
Assignee: Lalatendu Mohanty
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-02-17 14:42 UTC by Ben Parees
Modified: 2021-06-03 15:47 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-03 15:47:53 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github kubernetes kubernetes issues 99205 0 None open Allow priority class values to be updated. 2021-02-24 18:58:54 UTC

Description Ben Parees 2021-02-17 14:42:00 UTC
PriorityClasses can't be mutated, so it's currently impossible to change the priority value of an existing class that is maintained by CVO (the update is rejected and ultimately causes the upgrade to fail).

Probably needs discussion w/ kube team, but one option would be to delete+recreate the resource instead of updating it.  But we need to understand what implication that has for existing workloads that were using the priorityclass when it gets deleted+recreated.

Honestly the CVO answer here may be that it's not solveable, depending on what the kube team says, but investigation is warranted.

Comment 1 Jack Ottofaro 2021-02-17 14:50:55 UTC
Failure here https://github.com/openshift/cluster-monitoring-operator/pull/1060.

"Could not update priorityclass "openshift-user-critical" (325 of 669): the object is invalid, possibly due to local cluster configuration"

Comment 2 W. Trevor King 2021-02-17 18:25:51 UTC
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/1060/pull-ci-openshift-cluster-monitoring-operator-release-4.7-e2e-agnostic-upgrade/1361797567247028224/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-58d6f9db87-rmc6k_cluster-version-operator.log | grep -o 'error running apply for priorityclass.*' | sort | uniq -c
    242 error running apply for priorityclass "openshift-user-critical" (325 of 669): PriorityClass.scheduling.k8s.io "openshift-user-critical" is invalid: Value: Forbidden: may not be changed in an update.
      1 error running apply for priorityclass "openshift-user-critical" (325 of 669): context deadline exceeded

So the issue is that bumping the 'value' [1] of the PriorityClass manifest didn't work.  Maybe we can delete the PriorityClass object and create a new one instead of attempting to update the existing resource?  But I don't know how the Kube-core components would handle us deleting a PriorityClass being used by running pods.  We need to round with the Kube-core folks on this.

More broadly, it might be possible to get the CVO out of the PriorityClass management loop and have a Kube-core team in charge of creating and managing the classes needed by OpenShift.  We can hash that out as well when rounding with the Kube-core teams.

This is complicated enough that it's unlikely to happen this sprint.

[1]: https://github.com/openshift/cluster-monitoring-operator/pull/1060/files#diff-e30e0ecd0267865f4d29cdc205349e93e870c4d5c334f4a1c71d2bf5ef270298L8

Comment 3 W. Trevor King 2021-02-24 18:58:55 UTC
[1] is the upstream request to allow these to be updated.

[1]: https://github.com/kubernetes/kubernetes/issues/99205

Comment 4 W. Trevor King 2021-06-03 15:47:53 UTC
As pointed out in comment 0, the CVO could delete+recreate here to work around the current API-server behavior.  But as comment 3 points out, adjusting the API-server to support the CVO's current update attempts is being worked on upstream.  With bug 1934516 being resolved via a new priority class, so I'm not aware of anything that makes this important enough to be worth delete+recreate in the CVO code.  Marking DEFERRED, and we'll work for this use-case automatically after the API-server is adjusted.


Note You need to log in before you can comment on or make changes to this bug.