+++ This bug was initially created as a clone of Bug #1891107 +++
+++ This bug was initially created as a clone of Bug #1891106 +++

priority & fairness: Increase the concurrency share of the workload-low priority level

Carry upstream PR: https://github.com/kubernetes/kubernetes/pull/95259

All workloads running under a service account (except for the ones matched by p&f with a logically higher matching precedence) match the `service-accounts` flow schema and are assigned to the `workload-low` priority level, and thus get only `20` concurrency shares (~10% of the total). On the other hand, the `global-default` flow schema is assigned to the `global-default` priority level configuration and thus gets `100` concurrency shares (~50% of the total). If I am not mistaken, `global-default` goes pretty much unused, since only workloads running as a user (not a service account) fall into this category, which is not very common. Workloads with service accounts do not get enough concurrency shares and may starve.

Increase the concurrency shares of `workload-low` from `20` to `100` and reduce those of `global-default` from `100` to `20`.

We have been asking customers to apply the patch manually: https://bugzilla.redhat.com/show_bug.cgi?id=1883589#c56

> oc patch prioritylevelconfiguration workload-low --type=merge -p '{"spec":{"limited":{"assuredConcurrencyShares": 100}}}'
> oc patch prioritylevelconfiguration global-default --type=merge -p '{"spec":{"limited":{"assuredConcurrencyShares": 20}}}'

This change will get rid of the need for the manual patch.
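For context (not part of the original report), a quick way to see how the shares are split across all priority levels is a custom-columns query; the percentages above come from dividing a level's shares by the sum over all levels. This is just a convenience query and the exact output depends on the cluster:

$ oc get prioritylevelconfigurations -o custom-columns=NAME:.metadata.name,SHARES:.spec.limited.assuredConcurrencyShares

(The exempt priority level has no .spec.limited, so its SHARES column shows <none>.)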
Upstream PR: https://github.com/kubernetes/kubernetes/pull/95259. When we start the rebase work it will be pulled in. If we want it earlier we can carry it.
Not applicable to 4.7 as a carry; upstream already has this fix and it will come in with the rebase. We need to validate this for 4.6 and 4.5.
reopening it. once 4.7 rebase is done it can be verified by qe.
Changing priority and severity to high. The customer can apply a YAML patch as a workaround in the meantime.
1.20 rebase has merged, moving to ON_QA
Checked with the fresh OCP4.7 install:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-05-220959   True        False         82m     Cluster version is 4.7.0-0.nightly-2021-01-05-220959

$ oc get FlowSchema
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE    MISSINGPL
exempt                              exempt                              1                    <none>                117m   False
openshift-apiserver-sar             exempt                              2                    ByUser                98m    False
openshift-oauth-apiserver-sar       exempt                              2                    ByUser                84m    False
system-leader-election              leader-election                     100                  ByUser                117m   False
workload-leader-election            leader-election                     200                  ByUser                117m   False
openshift-ovn-kubernetes            system                              500                  ByUser                108m   False
system-nodes                        system                              500                  ByUser                117m   False
kube-controller-manager             workload-high                       800                  ByNamespace           117m   False
kube-scheduler                      workload-high                       800                  ByNamespace           117m   False
kube-system-service-accounts        workload-high                       900                  ByNamespace           117m   False
openshift-apiserver                 workload-high                       1000                 ByUser                98m    False
openshift-controller-manager        workload-high                       1000                 ByUser                116m   False
openshift-oauth-apiserver           workload-high                       1000                 ByUser                84m    False
openshift-oauth-server              workload-high                       1000                 ByUser                84m    False
openshift-apiserver-operator        openshift-control-plane-operators   2000                 ByUser                98m    False
openshift-authentication-operator   openshift-control-plane-operators   2000                 ByUser                84m    False
openshift-etcd-operator             openshift-control-plane-operators   2000                 ByUser                103m   False
openshift-kube-apiserver-operator   openshift-control-plane-operators   2000                 ByUser                98m    False
openshift-monitoring-metrics        workload-high                       2000                 ByUser                100m   False
service-accounts                    workload-low                        9000                 ByUser                117m   False
global-default                      global-default                      9900                 ByUser                117m   False
catch-all                           catch-all                           10000                ByUser                117m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100

$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20

All are as expected.
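As an additional optional cross-check (not part of the original verification), the apiserver also exposes the derived concurrency limit per priority level as a metric; on a 1.20-based cluster the metric should be apiserver_flowcontrol_request_concurrency_limit, e.g.:

$ oc get --raw /metrics | grep 'apiserver_flowcontrol_request_concurrency_limit{priority_level="workload-low"}'

(Metric name as of Kubernetes 1.20; reading the apiserver /metrics endpoint requires the appropriate permission.)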
Checked with an OCP 4.7 cluster upgraded from 4.6.9:

$ oc get clusterversion -o json|jq ".items[0].status.history"
[
  {
    "completionTime": "2021-01-06T07:57:24Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:bf3a3f8bb1690ba1fa33e0a75437f377dde00a3dd19fe0398573f6bed48cfe04",
    "startedTime": "2021-01-06T06:47:37Z",
    "state": "Completed",
    "verified": false,
    "version": "4.7.0-fc.1"
  },
  {
    "completionTime": "2021-01-06T04:59:04Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:43d5c84169a4b3ff307c29d7374f6d69a707de15e9fa90ad352b432f77c0cead",
    "startedTime": "2021-01-06T04:26:32Z",
    "state": "Completed",
    "verified": false,
    "version": "4.6.9"
  }
]

$ oc get FlowSchema
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE     MISSINGPL
exempt                              exempt                              1                    <none>                3h46m   False
openshift-apiserver-sar             exempt                              2                    ByUser                3h41m   False
openshift-oauth-apiserver-sar       exempt                              2                    ByUser                3h41m   False
system-leader-election              leader-election                     100                  ByUser                3h46m   False
workload-leader-election            leader-election                     200                  ByUser                3h46m   False
openshift-sdn                       system                              500                  ByUser                43m     False
system-nodes                        system                              500                  ByUser                3h46m   False
kube-controller-manager             workload-high                       800                  ByNamespace           3h46m   False
kube-scheduler                      workload-high                       800                  ByNamespace           3h46m   False
kube-system-service-accounts        workload-high                       900                  ByNamespace           3h46m   False
openshift-apiserver                 workload-high                       1000                 ByUser                3h41m   False
openshift-controller-manager        workload-high                       1000                 ByUser                3h41m   False
openshift-oauth-apiserver           workload-high                       1000                 ByUser                3h41m   False
openshift-oauth-server              workload-high                       1000                 ByUser                3h41m   False
openshift-apiserver-operator        openshift-control-plane-operators   2000                 ByUser                3h41m   False
openshift-authentication-operator   openshift-control-plane-operators   2000                 ByUser                3h41m   False
openshift-etcd-operator             openshift-control-plane-operators   2000                 ByUser                3h41m   False
openshift-kube-apiserver-operator   openshift-control-plane-operators   2000                 ByUser                3h41m   False
openshift-monitoring-metrics        workload-high                       2000                 ByUser                3h41m   False
service-accounts                    workload-low                        9000                 ByUser                3h46m   False
global-default                      global-default                      9900                 ByUser                3h46m   False
catch-all                           catch-all                           10000                ByUser                3h46m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20

$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100

After the upgrade to 4.7, the prioritylevelconfiguration values of workload-low and global-default were not changed to the expected values, so I am assigning the bug back.
kewang, the p&f post-start hook just does a create, so if the p&f object already exists it returns without issuing an update. I am going to fix it upstream.
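To illustrate the create-only behavior from the client side (this is not the hook code itself, just a rough approximation of what it amounts to): a create against an existing object fails with AlreadyExists, and the hook treats that as "nothing to do", so updated default shares are never written:

$ oc get prioritylevelconfiguration workload-low -o yaml > workload-low.yaml
$ oc create -f workload-low.yaml
Error from server (AlreadyExists): error when creating "workload-low.yaml": prioritylevelconfigurations.flowcontrol.apiserver.k8s.io "workload-low" already exists

(The exact error text may differ slightly depending on the client version.)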
akashem, if so, is there another PR for this, and do I need to wait for the new PR to be ready?
kewang, we discussed this upstream and this behavior is by design - the apiserver does not change the suggested priority level configuration objects once they are created. That's why we don't see the changes taking effect. We are changing this and a fix will land soon - https://github.com/kubernetes/kubernetes/pull/98028. For a new install it works, as you have verified. Can you move this BZ to VERIFIED based on the fresh install working, and open a new BZ for the upgrade case?
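Until that reconciliation fix lands, clusters that upgraded with the old values will keep them, so the manual patch from comment #0 is still needed there. A small idempotent sketch an admin could run (my own convenience wrapper around the same patch commands; it only patches when workload-low still has the old default of 20):

current=$(oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}')
if [ "$current" = "20" ]; then
  oc patch prioritylevelconfiguration workload-low --type=merge -p '{"spec":{"limited":{"assuredConcurrencyShares": 100}}}'
  oc patch prioritylevelconfiguration global-default --type=merge -p '{"spec":{"limited":{"assuredConcurrencyShares": 20}}}'
fi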
Checked a fresh OCP 4.7 install with the latest payload:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-02-09-024347   True        False         70m     Cluster version is 4.7.0-0.nightly-2021-02-09-024347

$ oc get FlowSchema
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE    MISSINGPL
exempt                              exempt                              1                    <none>                106m   False
openshift-apiserver-sar             exempt                              2                    ByUser                88m    False
openshift-oauth-apiserver-sar       exempt                              2                    ByUser                71m    False
system-leader-election              leader-election                     100                  ByUser                106m   False
workload-leader-election            leader-election                     200                  ByUser                106m   False
openshift-ovn-kubernetes            system                              500                  ByUser                98m    False
system-nodes                        system                              500                  ByUser                106m   False
kube-controller-manager             workload-high                       800                  ByNamespace           106m   False
kube-scheduler                      workload-high                       800                  ByNamespace           106m   False
kube-system-service-accounts        workload-high                       900                  ByNamespace           106m   False
openshift-apiserver                 workload-high                       1000                 ByUser                88m    False
openshift-controller-manager        workload-high                       1000                 ByUser                105m   False
openshift-oauth-apiserver           workload-high                       1000                 ByUser                71m    False
openshift-oauth-server              workload-high                       1000                 ByUser                70m    False
openshift-apiserver-operator        openshift-control-plane-operators   2000                 ByUser                88m    False
openshift-authentication-operator   openshift-control-plane-operators   2000                 ByUser                70m    False
openshift-etcd-operator             openshift-control-plane-operators   2000                 ByUser                94m    False
openshift-kube-apiserver-operator   openshift-control-plane-operators   2000                 ByUser                88m    False
openshift-monitoring-metrics        workload-high                       2000                 ByUser                88m    False
service-accounts                    workload-low                        9000                 ByUser                106m   False
global-default                      global-default                      9900                 ByUser                106m   False
catch-all                           catch-all                           10000                ByUser                106m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100

$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20

The results are as expected. Per https://bugzilla.redhat.com/show_bug.cgi?id=1891108#c11, I will move the bug to VERIFIED and file a new bug to track the upgrade case.

Hi akashem, could you please move the status to ON_QA so that I can move it to VERIFIED? Per the bug process, I am unable to move the bug from ASSIGNED to VERIFIED directly.
Filed new bug for upgrade: https://bugzilla.redhat.com/show_bug.cgi?id=1926724
moving it to ON_QA as advised
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633