+++ This bug was initially created as a clone of Bug #1891106 +++

priority & fairness: Increase the concurrency share of workload-low priority level

Carries upstream PR: https://github.com/kubernetes/kubernetes/pull/95259

All workloads running under a service account (except for the ones distinguished by p&f with a logically higher matching precedence) match the `service-accounts` flow schema and are assigned to the `workload-low` priority level, which has only `20` concurrency shares (~10% of the total). On the other hand, the `global-default` flow schema is assigned to the `global-default` priority level configuration and thus has `100` concurrency shares (~50% of the total). If I am not mistaken, `global-default` goes pretty much unused, since only workloads running as a user (not a service account) fall into this category, which is not very common.

Workloads with service accounts do not get enough concurrency shares and may starve. Increase the concurrency shares of `workload-low` from `20` to `100` and reduce those of `global-default` from `100` to `20`.

We have been asking customers to apply the patch manually: https://bugzilla.redhat.com/show_bug.cgi?id=1883589#c56

> oc patch prioritylevelconfiguration workload-low --type=merge -p '{"spec":{"limited":{"assuredConcurrencyShares": 100}}}'
> oc patch prioritylevelconfiguration global-default --type=merge -p '{"spec":{"limited":{"assuredConcurrencyShares": 20}}}'

This fix removes the need for the manual patch.
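For context, here is a minimal sketch of the arithmetic behind the ~10% / ~50% figures. It assumes the upstream default shares at the time for the other built-in priority levels (system 30, leader-election 10, workload-high 40, catch-all 5; OpenShift-specific levels omitted) and a hypothetical server concurrency limit of 600; those inputs are assumptions, only the formula shape follows the APF docs:

```python
import math

def assured_concurrency_values(shares, server_concurrency_limit):
    """APF divides the server's concurrency limit among priority levels
    in proportion to their shares: ACV(pl) = ceil(SCL * ACS(pl) / sum(ACS))."""
    total = sum(shares.values())
    return {name: math.ceil(server_concurrency_limit * acs / total)
            for name, acs in shares.items()}

# Assumed default shares before the fix (other levels per upstream
# defaults at the time; OpenShift adds more levels, omitted here).
before = {"system": 30, "leader-election": 10, "workload-high": 40,
          "workload-low": 20, "global-default": 100, "catch-all": 5}
# After the fix, workload-low and global-default swap their shares.
after = dict(before, **{"workload-low": 100, "global-default": 20})

acv_before = assured_concurrency_values(before, 600)  # workload-low gets ~10% of 600
acv_after = assured_concurrency_values(after, 600)    # workload-low gets ~49% of 600
```

With these assumed inputs, `workload-low` goes from 59 to 293 assured concurrency out of 600, while `global-default` drops from 293 to 59, matching the intent of the swap.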
4.6 PR is open: https://github.com/openshift/kubernetes/pull/423
4.6 PR: https://github.com/openshift/kubernetes/pull/428 Hi kewang, can you start verifying this on 4.6 and then 4.5? (the master/4.7 does not have the change yet)
Hi kewang, can we start testing on 4.6 first and then 4.5?
Hi kewang, the 4.6 PR hasn't merged yet :(
Hi akashem, I did a quick pre-merge verification test; here are some checkpoints. Could you confirm whether the results are as expected? Also, do I need to try the upgrade-from-4.5-to-4.6 test scenario?

$ oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.ci.test-2020-12-10-093009-ci-ln-sggv83t   True        False         45m     Cluster version is 4.6.0-0.ci.test-2020-12-10-093009-ci-ln-sggv83t

$ oc get FlowSchema
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE   MISSINGPL
exempt                              exempt                              1                    <none>                75m   False
openshift-apiserver-sar             exempt                              2                    ByUser                65m   False
openshift-oauth-apiserver-sar       exempt                              2                    ByUser                65m   False
system-leader-election              leader-election                     100                  ByUser                75m   False
workload-leader-election            leader-election                     200                  ByUser                75m   False
system-nodes                        system                              500                  ByUser                75m   False
kube-controller-manager             workload-high                       800                  ByNamespace           75m   False
kube-scheduler                      workload-high                       800                  ByNamespace           75m   False
kube-system-service-accounts        workload-high                       900                  ByNamespace           75m   False
openshift-apiserver                 workload-high                       1000                 ByUser                65m   False
openshift-controller-manager        workload-high                       1000                 ByUser                65m   False
openshift-oauth-apiserver           workload-high                       1000                 ByUser                65m   False
openshift-oauth-server              workload-high                       1000                 ByUser                65m   False
openshift-apiserver-operator        openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-authentication-operator   openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-etcd-operator             openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-kube-apiserver-operator   openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-monitoring-metrics        workload-high                       2000                 ByUser                65m   False
service-accounts                    workload-low                        9000                 ByUser                75m   False
global-default                      global-default                      9900                 ByUser                75m   False
catch-all                           catch-all                           10000                ByUser                75m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100
$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20
This bug's PR is dev-approved but not yet merged, so I'm following issue DPTP-660 to do pre-merge verification toward the QE pre-merge verification goal of issue OCPQE-815, using the bot to launch a cluster with the open PR. The verification steps are above: https://bugzilla.redhat.com/show_bug.cgi?id=1891107#c6. The upgrade case is not easy to run in a cluster-bot env, so I will try it on a nightly build; with that, the bug is pre-merge verified. After the PR gets merged, the bug will be moved to VERIFIED by the bot automatically or, if that doesn't work, by me manually.
$ oc get FlowSchema
NAME               PRIORITYLEVEL    MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE   MISSINGPL
exempt             exempt           1                    <none>                75m   False
service-accounts   workload-low     9000                 ByUser                75m   False
global-default     global-default   9900                 ByUser                75m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100
$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20

kewang, this looks good!

> And upgrade from 4.5 to 4.6 test scenario, do I need to try

You can try an upgrade. I don't expect any failure, since these are built-in objects managed by the apiserver, not by the CVO, but it will be a good test. Thanks for doing this!
Checked with the latest OCP 4.6 nightly payload:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-02-17-215814   True        False         40m     Cluster version is 4.6.0-0.nightly-2021-02-17-215814

$ oc get FlowSchema
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE   MISSINGPL
exempt                              exempt                              1                    <none>                75m   False
openshift-apiserver-sar             exempt                              2                    ByUser                65m   False
openshift-oauth-apiserver-sar       exempt                              2                    ByUser                65m   False
system-leader-election              leader-election                     100                  ByUser                75m   False
workload-leader-election            leader-election                     200                  ByUser                75m   False
system-nodes                        system                              500                  ByUser                75m   False
kube-controller-manager             workload-high                       800                  ByNamespace           75m   False
kube-scheduler                      workload-high                       800                  ByNamespace           75m   False
kube-system-service-accounts        workload-high                       900                  ByNamespace           75m   False
openshift-apiserver                 workload-high                       1000                 ByUser                65m   False
openshift-controller-manager        workload-high                       1000                 ByUser                65m   False
openshift-oauth-apiserver           workload-high                       1000                 ByUser                65m   False
openshift-oauth-server              workload-high                       1000                 ByUser                65m   False
openshift-apiserver-operator        openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-authentication-operator   openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-etcd-operator             openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-kube-apiserver-operator   openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-monitoring-metrics        workload-high                       2000                 ByUser                65m   False
service-accounts                    workload-low                        9000                 ByUser                75m   False
global-default                      global-default                      9900                 ByUser                75m   False
catch-all                           catch-all                           10000                ByUser                75m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100
$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20

The above results are as expected, so moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.18 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0510