Bug 1906916

Summary: Teach CVO about flowcontrol.apiserver.k8s.io/v1beta1
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: Cluster Version Operator Assignee: Jack Ottofaro <jack.ottofaro>
Status: CLOSED ERRATA QA Contact: Johnny Liu <jialiu>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.7 CC: aos-bugs, jokerman
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Pivot to Kubernetes 1.20. Consequence: Unable to apply manifests requiring flowcontrol. Fix: Pick up flowcontrol.apiserver.k8s.io/v1beta1 to support the api-server Kubernetes bump to 1.20.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:43:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description W. Trevor King 2020-12-11 18:45:06 UTC
CVO already understands v1alpha1:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.6.8-x86_64
Extracted release payload from digest sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf created at 2020-12-09T11:35:37Z
$ grep -ir flowcontrol manifests 
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1

With the pivot to Kubernetes 1.20 [1], components need to push (or handle, or something) v1beta1 forms.  Currently the CVO chokes on that with:

2020-12-11T11:55:54.565747078Z E1211 11:55:54.565682       1 task.go:81] error running apply for flowschema "openshift-etcd-operator" (72 of 670): no kind "FlowSchema" is registered for version "flowcontrol.apiserver.k8s.io/v1beta1" in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30"

This bug is about avoiding that error, by bumping our vendored client-go and registering the v1beta1 handler.
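
To illustrate the failure mode, here is a minimal, stdlib-only Go sketch of a type registry. It is a toy model, not the CVO or k8s.io/apimachinery code: the Scheme type, AddKnownKinds, Recognize, and ErrNotRegistered are hypothetical names invented for this example. It only shows why a v1beta1 manifest is rejected while only v1alpha1 is registered, and why registering v1beta1 clears the error.

```go
package main

import (
	"errors"
	"fmt"
)

// GroupVersionKind identifies a manifest type, for example
// flowcontrol.apiserver.k8s.io / v1beta1 / FlowSchema.
type GroupVersionKind struct {
	Group, Version, Kind string
}

// ErrNotRegistered models the "no kind ... is registered" failure.
var ErrNotRegistered = errors.New("kind not registered")

// Scheme is a toy stand-in for a type registry. It is NOT the real
// k8s.io/apimachinery runtime.Scheme; all names here are hypothetical.
type Scheme struct {
	known map[GroupVersionKind]bool
}

// NewScheme returns an empty registry.
func NewScheme() *Scheme {
	return &Scheme{known: map[GroupVersionKind]bool{}}
}

// AddKnownKinds registers each kind under the given group and version.
func (s *Scheme) AddKnownKinds(group, version string, kinds ...string) {
	for _, k := range kinds {
		s.known[GroupVersionKind{group, version, k}] = true
	}
}

// Recognize returns an error if the exact group/version/kind was never registered.
func (s *Scheme) Recognize(gvk GroupVersionKind) error {
	if !s.known[gvk] {
		return fmt.Errorf("no kind %q is registered for version %q: %w",
			gvk.Kind, gvk.Group+"/"+gvk.Version, ErrNotRegistered)
	}
	return nil
}

func main() {
	const group = "flowcontrol.apiserver.k8s.io"
	s := NewScheme()

	// State before the fix: only v1alpha1 is known.
	s.AddKnownKinds(group, "v1alpha1", "FlowSchema", "PriorityLevelConfiguration")
	beta := GroupVersionKind{group, "v1beta1", "FlowSchema"}
	fmt.Println("before:", s.Recognize(beta))

	// State after the fix: v1beta1 is registered as well.
	s.AddKnownKinds(group, "v1beta1", "FlowSchema", "PriorityLevelConfiguration")
	fmt.Println("after:", s.Recognize(beta))
}
```

Registration is keyed on the full group, version, and kind, so knowing v1alpha1 FlowSchema does nothing for v1beta1 FlowSchema until that version is added explicitly.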

We have no flowcontrol-specific handling today, so no promises that this will actually help with things like:

* CVO appropriately merges diverging in-cluster flowcontrol specs with manifest specs.
* CVO notices when the controller fails to reconcile and sets the Dangling=False .status.condition.
* CVO notices when the controller is progressing, and blocks update-graph reconciliation until the in-cluster object is level.

But API-server folks are currently blocked on rebasing by this, and want to see if a naive bump to accept v1beta1 is sufficient to unblock them.

[1]: https://github.com/openshift/kubernetes/pull/471

Comment 3 Johnny Liu 2020-12-22 04:24:55 UTC
Verified this bug with 4.7.0-0.nightly-2020-12-20-055006, passed.

[root@preserve-jialiu-ansible ~]# oc get node
NAME                                                            STATUS   ROLES    AGE   VERSION
qe-metering-1221-9gn5p-master-0.c.openshift-qe.internal         Ready    master   15h   v1.20.0+87544c5
qe-metering-1221-9gn5p-master-1.c.openshift-qe.internal         Ready    master   15h   v1.20.0+87544c5
qe-metering-1221-9gn5p-master-2.c.openshift-qe.internal         Ready    master   15h   v1.20.0+87544c5
qe-metering-1221-9gn5p-worker-a-nq4pr.c.openshift-qe.internal   Ready    worker   14h   v1.20.0+87544c5
qe-metering-1221-9gn5p-worker-b-6sbtn.c.openshift-qe.internal   Ready    worker   14h   v1.20.0+87544c5
qe-metering-1221-9gn5p-worker-c-rfmzp.c.openshift-qe.internal   Ready    worker   14h   v1.20.0+87544c5

Kubernetes is now at version 1.20.


[root@preserve-jialiu-ansible demo5]# oc adm release extract --to manifests registry.svc.ci.openshift.org/ocp/release@sha256:ea2d954b1ac4b2818c419055afdb9ff87c5b95fa03c3258a26dc542f4ecab5d8
Extracted release payload created at 2020-12-20T06:04:57Z

[root@preserve-jialiu-ansible demo5]# grep -ir flowcontrol manifests 
manifests/0000_50_cluster-authentication-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_50_cluster-authentication-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_50_cluster-authentication-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_50_cluster-authentication-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_12_etcd-operator_10_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_20_kube-apiserver-operator_08_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_90_kube-apiserver-operator_05_api_performance_dashboard.yaml:              "expr": "histogram_quantile(0.99, sum(rate(apiserver_flowcontrol_request_wait_duration_seconds_bucket{apiserver=\"$apiserver\",execute=\"true\"}[$period])) by(flowSchema, priorityLevel, le))",
manifests/0000_90_kube-apiserver-operator_05_api_performance_dashboard.yaml:              "expr": "sum(rate(apiserver_flowcontrol_rejected_requests_total{apiserver=\"$apiserver\"}[$period])) by (flowSchema,priorityLevel,reason)",
manifests/0000_90_kube-apiserver-operator_05_api_performance_dashboard.yaml:              "expr": "sum(rate(apiserver_flowcontrol_dispatched_requests_total{apiserver=\"$apiserver\"}[$period])) by(flowSchema,priorityLevel)",
manifests/0000_90_kube-apiserver-operator_05_api_performance_dashboard.yaml:              "expr": "histogram_quantile(0.99, sum(rate(apiserver_flowcontrol_request_queue_length_after_enqueue_bucket{apiserver=\"$apiserver\"}[$period])) by(flowSchema, priorityLevel, le))",
manifests/0000_90_kube-apiserver-operator_05_api_performance_dashboard.yaml:              "expr": "sum(apiserver_flowcontrol_current_executing_requests{apiserver=\"$apiserver\"}) by (priorityLevel,flowSchema)",
manifests/0000_90_kube-apiserver-operator_05_api_performance_dashboard.yaml:              "expr": "histogram_quantile(0.99, sum(rate(apiserver_flowcontrol_request_execution_seconds_bucket{apiserver=\"$apiserver\"}[$period])) by(flowSchema, priorityLevel, le) ) ",
manifests/0000_90_kube-apiserver-operator_05_api_performance_dashboard.yaml:              "expr": "sum(apiserver_flowcontrol_current_inqueue_requests{apiserver=\"$apiserver\"}) by (flowSchema,priorityLevel)",
manifests/0000_90_kube-apiserver-operator_05_api_performance_dashboard.yaml:              "expr": "sum(apiserver_flowcontrol_request_concurrency_limit{apiserver=\"$apiserver\"}) by (priorityLevel)",
manifests/0000_70_cluster-network-operator_04_kubeapiserver_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_30_openshift-apiserver-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_30_openshift-apiserver-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_30_openshift-apiserver-operator_09_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
manifests/0000_50_cluster-openshift-controller-manager-operator_10_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
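
The grep output above can be tallied to see which flowcontrol API versions a payload ships. The following is a minimal Go sketch under that assumption; CountVersions and marker are hypothetical names, and the sample input is a shortened, partly made-up excerpt rather than real payload data.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// marker is the prefix that precedes the version in a manifest's apiVersion line.
const marker = "apiVersion: flowcontrol.apiserver.k8s.io/"

// CountVersions tallies flowcontrol API versions found in grep-style output,
// where each line looks like "<file>:apiVersion: flowcontrol.apiserver.k8s.io/<version>".
func CountVersions(grepOutput string) map[string]int {
	counts := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(grepOutput))
	for sc.Scan() {
		line := sc.Text()
		i := strings.Index(line, marker)
		if i < 0 {
			// Skip lines that mention flowcontrol without declaring an
			// apiVersion, such as dashboard metric expressions.
			continue
		}
		version := strings.TrimSpace(line[i+len(marker):])
		counts[version]++
	}
	return counts
}

func main() {
	// Illustrative input only; the second and third lines are invented.
	sample := strings.Join([]string{
		"manifests/0000_12_etcd-operator_10_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1",
		"manifests/example_flowschema.yaml:apiVersion: flowcontrol.apiserver.k8s.io/v1beta1",
		`manifests/example_dashboard.yaml:  "expr": "sum(apiserver_flowcontrol_current_inqueue_requests)"`,
	}, "\n")
	fmt.Println(CountVersions(sample))
}
```

Matching on the full apiVersion prefix rather than the bare word "flowcontrol" keeps the dashboard metric lines out of the count.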


[root@preserve-jialiu-ansible ~]#  oc get FlowSchema openshift-etcd-operator
NAME                      PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE   MISSINGPL
openshift-etcd-operator   openshift-control-plane-operators   2000                 ByUser                15h   False


[root@preserve-jialiu-ansible demo5]# oc get FlowSchema openshift-etcd-operator -o yaml|grep apiVersion
apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
  - apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
  - apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1

[root@preserve-jialiu-ansible ~]# oc get FlowSchema openshift-kube-apiserver-operator
NAME                                PRIORITYLEVEL                       MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE   MISSINGPL
openshift-kube-apiserver-operator   openshift-control-plane-operators   2000                 ByUser                14h   False

[root@preserve-jialiu-ansible demo5]# oc get FlowSchema openshift-kube-apiserver-operator -o yaml|grep apiVersion
apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
  - apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
  - apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1

[root@preserve-jialiu-ansible ~]# oc  -n openshift-cluster-version logs cluster-version-operator-dcbc59f47-dsk48|grep flowschema|grep openshift-etcd-operator
I1222 00:30:48.582871       1 sync_worker.go:729] Running sync for flowschema "openshift-etcd-operator" (73 of 663)
I1222 00:30:48.675622       1 request.go:591] Throttling request took 92.457046ms, request: GET:https://api-int.qe-metering-1221.qe.gcp.devcluster.openshift.com:6443/apis/flowcontrol.apiserver.k8s.io/v1alpha1/flowschemas/openshift-etcd-operator
I1222 00:30:48.775636       1 request.go:591] Throttling request took 95.755092ms, request: PUT:https://api-int.qe-metering-1221.qe.gcp.devcluster.openshift.com:6443/apis/flowcontrol.apiserver.k8s.io/v1alpha1/flowschemas/openshift-etcd-operator
I1222 00:30:48.784113       1 sync_worker.go:741] Done syncing for flowschema "openshift-etcd-operator" (73 of 663)
I1222 00:34:20.505797       1 sync_worker.go:729] Running sync for flowschema "openshift-etcd-operator" (73 of 663)
I1222 00:34:20.599696       1 request.go:591] Throttling request took 93.711034ms, request: GET:https://api-int.qe-metering-1221.qe.gcp.devcluster.openshift.com:6443/apis/flowcontrol.apiserver.k8s.io/v1alpha1/flowschemas/openshift-etcd-operator
I1222 00:34:20.699669       1 request.go:591] Throttling request took 92.355015ms, request: PUT:https://api-int.qe-metering-1221.qe.gcp.devcluster.openshift.com:6443/apis/flowcontrol.apiserver.k8s.io/v1alpha1/flowschemas/openshift-etcd-operator
I1222 00:34:20.706856       1 sync_worker.go:741] Done syncing for flowschema "openshift-etcd-operator" (73 of 663)

[root@preserve-jialiu-ansible ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-20-055006   True        False         14h     Cluster version is 4.7.0-0.nightly-2020-12-20-055006

If the above verification is not enough, please let me know.

Comment 5 errata-xmlrpc 2021-02-24 15:43:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633