Bug 1891107 - p&f: Increase the concurrency share of workload-low priority level
Summary: p&f: Increase the concurrency share of workload-low priority level
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.6.z
Assignee: Abu Kashem
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On: 1891108
Blocks: 1891106
 
Reported: 2020-10-23 19:28 UTC by Abu Kashem
Modified: 2024-03-25 16:48 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1891106
Cloned To: 1891108
Environment:
Last Closed: 2021-02-22 13:54:32 UTC
Target Upstream Version:
Embargoed:


Links
- Github openshift/kubernetes pull 428 (closed): BUG 1891107: UPSTREAM: 95259: allocate service-account flowschema to global-default (last updated 2021-02-16 05:29:49 UTC)
- Red Hat Knowledge Base (Solution) 5448851 (last updated 2020-12-22 07:22:12 UTC)
- Red Hat Product Errata RHBA-2021:0510 (last updated 2021-02-22 13:54:49 UTC)

Description Abu Kashem 2020-10-23 19:28:58 UTC
+++ This bug was initially created as a clone of Bug #1891106 +++

priority & fairness: Increase the concurrency share of workload-low priority level

Carries upstream PR: https://github.com/kubernetes/kubernetes/pull/95259


All workloads running under a service account (except those matched by a p&f flow schema with a logically higher matching precedence) match the `service-accounts` flow schema and are assigned to the `workload-low` priority level, and thus get only `20` concurrency shares (~10% of the total).
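For reference, the matching comes from the built-in `service-accounts` FlowSchema. An abridged sketch (field names per the flowcontrol v1alpha1 API in Kubernetes 1.19 / OCP 4.6; the rule body here is illustrative, not copied from a cluster):

  apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
  kind: FlowSchema
  metadata:
    name: service-accounts
  spec:
    matchingPrecedence: 9000            # schemas with lower numbers win first
    priorityLevelConfiguration:
      name: workload-low                # hence only 20 concurrency shares today
    distinguisherMethod:
      type: ByUser
    rules:
    - subjects:
      - kind: Group
        group:
          name: system:serviceaccounts  # any SA not matched at higher precedence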

On the other hand, the `global-default` flow schema is assigned to the `global-default` priority level configuration and thus gets `100` concurrency shares (~50% of the total). If I am not mistaken, `global-default` goes pretty much unused, since only workloads running as a user (not a service account) fall into this category, which is not very common.
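The percentages follow from how p&f splits the server's total concurrency: each Limited priority level l is assured roughly ceil(SCL * ACS(l) / sum over k of ACS(k)), where SCL is the apiserver's combined max-requests-in-flight limits and ACS(l) is the level's assuredConcurrencyShares; the ~10% and ~50% figures imply the Limited levels summed to roughly 200 shares at the time. A quick way to eyeball the split on a live cluster (a sketch; assumes jq is available):

$ oc get prioritylevelconfigurations -o json | jq -r '
    # keep only the Limited levels (exempt carries no shares)
    [.items[] | select(.spec.type == "Limited")] as $levels
    | ([$levels[].spec.limited.assuredConcurrencyShares] | add) as $total
    | $levels[]
    | "\(.metadata.name): \(.spec.limited.assuredConcurrencyShares)/\($total) shares"'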

Workloads with service accounts do not have enough concurrency shares and may starve. Increase the concurrency shares of `workload-low` from `20` to `100` and reduce those of `global-default` from `100` to `20`.
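In manifest form, the intended end state for `workload-low` would look roughly like this (a sketch only; the object is a built-in reconciled by the apiserver, and the apiVersion/limitResponse fields are assumptions based on the v1alpha1 flowcontrol API):

  apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
  kind: PriorityLevelConfiguration
  metadata:
    name: workload-low
  spec:
    type: Limited
    limited:
      assuredConcurrencyShares: 100     # raised from 20
      limitResponse:
        type: Queue                     # queue rather than reject on overload

with `global-default` dropping to `assuredConcurrencyShares: 20` correspondingly.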


We have been asking customers to apply the patch manually: https://bugzilla.redhat.com/show_bug.cgi?id=1883589#c56
> oc patch prioritylevelconfiguration workload-low --type=merge -p '{"spec":{"limited":{"assuredConcurrencyShares": 100}}}'
> oc patch prioritylevelconfiguration global-default --type=merge -p '{"spec":{"limited":{"assuredConcurrencyShares": 20}}}'


This will remove the need for the manual patch.

Comment 1 Abu Kashem 2020-10-25 15:35:30 UTC
4.6 PR is open: https://github.com/openshift/kubernetes/pull/423

Comment 2 Abu Kashem 2020-11-13 16:22:41 UTC
4.6 PR: https://github.com/openshift/kubernetes/pull/428

Hi kewang,
can you start verifying this on 4.6 and then 4.5? (master/4.7 does not have the change yet)

Comment 3 Abu Kashem 2020-12-04 15:15:39 UTC
Hi kewang,
can we start testing on 4.6 first and then 4.5?

Comment 4 Abu Kashem 2020-12-04 15:23:57 UTC
Hi kewang, 
the 4.6 PR hasn't merged yet :(

Comment 6 Ke Wang 2020-12-10 11:15:26 UTC
Hi akashem, I did a quick test using the pre-merge verification approach; here are some checkpoints. Could you confirm whether the results are as expected? Also, do I need to try the upgrade scenario from 4.5 to 4.6?

$ oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.ci.test-2020-12-10-093009-ci-ln-sggv83t   True        False         45m     Cluster version is 4.6.0-0.ci.test-2020-12-10-093009-ci-ln-sggv83t

$ oc get FlowSchema
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE   MISSINGPL
exempt                              exempt                              1                    <none>                75m   False
openshift-apiserver-sar             exempt                              2                    ByUser                65m   False
openshift-oauth-apiserver-sar       exempt                              2                    ByUser                65m   False
system-leader-election              leader-election                     100                  ByUser                75m   False
workload-leader-election            leader-election                     200                  ByUser                75m   False
system-nodes                        system                              500                  ByUser                75m   False
kube-controller-manager             workload-high                       800                  ByNamespace           75m   False
kube-scheduler                      workload-high                       800                  ByNamespace           75m   False
kube-system-service-accounts        workload-high                       900                  ByNamespace           75m   False
openshift-apiserver                 workload-high                       1000                 ByUser                65m   False
openshift-controller-manager        workload-high                       1000                 ByUser                65m   False
openshift-oauth-apiserver           workload-high                       1000                 ByUser                65m   False
openshift-oauth-server              workload-high                       1000                 ByUser                65m   False
openshift-apiserver-operator        openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-authentication-operator   openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-etcd-operator             openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-kube-apiserver-operator   openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-monitoring-metrics        workload-high                       2000                 ByUser                65m   False
service-accounts                    workload-low                        9000                 ByUser                75m   False
global-default                      global-default                      9900                 ByUser                75m   False
catch-all                           catch-all                           10000                ByUser                75m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100

$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20

Comment 7 Ke Wang 2020-12-11 10:42:06 UTC
This bug's PR is dev-approved but not yet merged, so I'm following issue DPTP-660 to do pre-merge verification (for the QE pre-merge verification goal of issue OCPQE-815), using the bot to launch a cluster with the open PR. The verification steps are above: https://bugzilla.redhat.com/show_bug.cgi?id=1891107#c6. The upgrade case is not easy to run in a cluster-bot environment, so I will try it on a nightly build. The bug is therefore pre-merge verified. After the PR gets merged, the bug will be moved to VERIFIED by the bot automatically or, if that does not work, by me manually.

Comment 8 Abu Kashem 2020-12-11 14:31:32 UTC
$ oc get FlowSchema
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE   MISSINGPL
exempt                              exempt                              1                    <none>                75m   False
service-accounts                    workload-low                        9000                 ByUser                75m   False
global-default                      global-default                      9900                 ByUser                75m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100

$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20


kewang,
This looks good! 

> Also, do I need to try the upgrade scenario from 4.5 to 4.6?
You can try an upgrade; I don't expect any failures since these are built-in objects managed by the apiserver, not by the CVO. But it will be a good test.
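A rough outline of that upgrade check (a sketch; the payload image reference is a placeholder, not a real pullspec):

$ # upgrade the 4.5 cluster to a fixed 4.6 payload (hypothetical image ref)
$ oc adm upgrade --to-image=<4.6-release-payload-image>
$ # once the upgrade settles, the apiserver should have reconciled the shares
$ oc get prioritylevelconfiguration workload-low global-default \
    -o custom-columns=NAME:.metadata.name,SHARES:.spec.limited.assuredConcurrencyShares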

Thanks for doing this!

Comment 12 Ke Wang 2021-02-18 07:10:11 UTC
Checked with latest OCP 4.6 nightly payload,

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-02-17-215814   True        False         40m     Cluster version is 4.6.0-0.nightly-2021-02-17-215814

$ oc get FlowSchema
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE   MISSINGPL
exempt                              exempt                              1                    <none>                75m   False
openshift-apiserver-sar             exempt                              2                    ByUser                65m   False
openshift-oauth-apiserver-sar       exempt                              2                    ByUser                65m   False
system-leader-election              leader-election                     100                  ByUser                75m   False
workload-leader-election            leader-election                     200                  ByUser                75m   False
system-nodes                        system                              500                  ByUser                75m   False
kube-controller-manager             workload-high                       800                  ByNamespace           75m   False
kube-scheduler                      workload-high                       800                  ByNamespace           75m   False
kube-system-service-accounts        workload-high                       900                  ByNamespace           75m   False
openshift-apiserver                 workload-high                       1000                 ByUser                65m   False
openshift-controller-manager        workload-high                       1000                 ByUser                65m   False
openshift-oauth-apiserver           workload-high                       1000                 ByUser                65m   False
openshift-oauth-server              workload-high                       1000                 ByUser                65m   False
openshift-apiserver-operator        openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-authentication-operator   openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-etcd-operator             openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-kube-apiserver-operator   openshift-control-plane-operators   2000                 ByUser                65m   False
openshift-monitoring-metrics        workload-high                       2000                 ByUser                65m   False
service-accounts                    workload-low                        9000                 ByUser                75m   False
global-default                      global-default                      9900                 ByUser                75m   False
catch-all                           catch-all                           10000                ByUser                75m   False

$ oc get prioritylevelconfiguration workload-low -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
100

$ oc get prioritylevelconfiguration global-default -o jsonpath='{.spec.limited.assuredConcurrencyShares}'
20

The above results are as expected, so moving the bug to VERIFIED.

Comment 14 errata-xmlrpc 2021-02-22 13:54:32 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.18 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0510

