Bug 1966410

Summary: kube-controller-manager should not trigger APIRemovedInNextReleaseInUse alert
Product: OpenShift Container Platform Reporter: Stefan Schimanski <sttts>
Component: kube-controller-managerAssignee: Maciej Szulik <maszulik>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: high Docs Contact:
Priority: high    
Version: 4.8CC: alegrand, anpicker, aos-bugs, erooth, hongyli, juzhao, kakkoyun, kewang, lcosic, mbukatov, mfojtik, pkrupa, surbania, xxia, yinzhou
Target Milestone: ---Keywords: Reopened
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1947719 Environment:
Last Closed: 2021-07-27 23:10:38 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1947719    

Description Stefan Schimanski 2021-06-01 06:05:24 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1952049#c9 identifies kcm as offender triggering the APIRemovedInNextReleaseInUse alert. This is wrong. kcm with its dynamic informers for quota, gc and namespace deletion has to be excluded from the alert.

+++ This bug was initially created as a clone of Bug #1947719 +++

Created attachment 1770482 [details]
alert screen shot

Created attachment 1770482 [details]
alert screen shot

Description of problem:
8 DeprecatedAPIInUse info alerts display

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-08-200632

How reproducible:
always

Steps to Reproduce:
1. open console-monitoring-alerts
2.
3.

Actual results:
8 DeprecatedAPIInUse info alerts display

Expected results:
No other alerts display except watchdog

Additional info:

alert rule metrics:
group by(group, version, resource) (apiserver_requested_deprecated_apis{removed_release="1.22"}) and (sum by(group, version, resource) (rate(apiserver_request_total[10m]))) > 0

Element	Value:
{group="rbac.authorization.k8s.io",resource="roles",version="v1beta1"}	1
{group="admissionregistration.k8s.io",resource="mutatingwebhookconfigurations",version="v1beta1"}	1
{group="admissionregistration.k8s.io",resource="validatingwebhookconfigurations",version="v1beta1"}	1
{group="apiextensions.k8s.io",resource="customresourcedefinitions",version="v1beta1"}	1
{group="certificates.k8s.io",resource="certificatesigningrequests",version="v1beta1"}	1
{group="extensions",resource="ingresses",version="v1beta1"}	1
{group="rbac.authorization.k8s.io",resource="clusterrolebindings",version="v1beta1"}	1
{group="rbac.authorization.k8s.io",resource="rolebindings",version="v1beta1"}	1

----------------
# for i in roles mutatingwebhookconfigurations validatingwebhookconfigurations customresourcedefinitions certificatesigningrequests ingresses clusterrolebindings rolebindings; do oc api-resources | grep $i; echo -e "\n"; done
clusterroles                                           authorization.openshift.io/v1                 false        ClusterRole
roles                                                  authorization.openshift.io/v1                 true         Role
clusterroles                                           rbac.authorization.k8s.io/v1                  false        ClusterRole
roles                                                  rbac.authorization.k8s.io/v1                  true         Role
mutatingwebhookconfigurations                          admissionregistration.k8s.io/v1               false        MutatingWebhookConfiguration
validatingwebhookconfigurations                        admissionregistration.k8s.io/v1               false        ValidatingWebhookConfiguration
customresourcedefinitions             crd,crds         apiextensions.k8s.io/v1                       false        CustomResourceDefinition
certificatesigningrequests            csr              certificates.k8s.io/v1                        false        CertificateSigningRequest
ingresses                                              config.openshift.io/v1                        false        Ingress
ingresses                             ing              extensions/v1beta1                            true         Ingress
ingresses                             ing              networking.k8s.io/v1                          true         Ingress
clusterrolebindings                                    authorization.openshift.io/v1                 false        ClusterRoleBinding
clusterrolebindings                                    rbac.authorization.k8s.io/v1                  false        ClusterRoleBinding
clusterrolebindings                                    authorization.openshift.io/v1                 false        ClusterRoleBinding
rolebindings                                           authorization.openshift.io/v1                 true         RoleBinding
clusterrolebindings                                    rbac.authorization.k8s.io/v1                  false        ClusterRoleBinding
rolebindings                                           rbac.authorization.k8s.io/v1                  true         RoleBinding

--- Additional comment from Junqi Zhao on 2021-04-09 05:28:56 CEST ---

alert details
alert:DeprecatedAPIInUse
expr:group by(group, version, resource) (apiserver_requested_deprecated_apis{removed_release="1.22"}) and (sum by(group, version, resource) (rate(apiserver_request_total[10m]))) > 0
for: 1h
labels:
  severity: info
annotations:
  message: Deprecated API that will be removed in the next version is being used. Removing the workload that is using the {{"{{$labels.group}}"}}.{{"{{$labels.version}}"}}/{{"{{$labels.resource}}"}} API might be necessary for a successful upgrade to the next cluster version. Refer to the audit logs to identify the workload.

--- Additional comment from hongyan li on 2021-04-09 05:37:17 CEST ---



--- Additional comment from hongyan li on 2021-04-09 05:44:46 CEST ---

Different issue from bug 1932165 which is about variable not translated to value

--- Additional comment from Junqi Zhao on 2021-04-09 06:04:30 CEST ---

# oc version
Client Version: 4.8.0-0.nightly-2021-04-08-200632
Server Version: 4.8.0-0.nightly-2021-04-08-200632
Kubernetes Version: v1.21.0-rc.0+6d27558

checked from prometheus, query parameter:
count(apiserver_requested_deprecated_apis{removed_release="1.22"}) by(instance,version,group,resource)
version is v1beta1
{group="certificates.k8s.io", instance="10.0.160.188:6443", resource="certificatesigningrequests", version="v1beta1"} 1
{group="extensions", instance="10.0.160.188:6443", resource="ingresses", version="v1beta1"} 1
{group="rbac.authorization.k8s.io", instance="10.0.160.188:6443", resource="clusterrolebindings", version="v1beta1"} 1
{group="rbac.authorization.k8s.io", instance="10.0.160.188:6443", resource="rolebindings", version="v1beta1"} 1
{group="rbac.authorization.k8s.io", instance="10.0.160.188:6443", resource="roles", version="v1beta1"} 1
{group="admissionregistration.k8s.io", instance="10.0.160.188:6443", resource="mutatingwebhookconfigurations", version="v1beta1"} 1
{group="admissionregistration.k8s.io", instance="10.0.160.188:6443", resource="validatingwebhookconfigurations", version="v1beta1"} 1
{group="apiextensions.k8s.io", instance="10.0.160.188:6443", resource="customresourcedefinitions", version="v1beta1"} 1

but the api versions are all actually v1, which means apiserver_requested_deprecated_apis may post the wrong result
# for i in certificatesigningrequests ingresses clusterrolebindings rolebindings roles mutatingwebhookconfigurations validatingwebhookconfigurations customresourcedefinitions; do oc api-resources | grep $i; echo -e "\n"; done
certificatesigningrequests            csr              certificates.k8s.io/v1                        false        CertificateSigningRequest


ingresses                                              config.openshift.io/v1                        false        Ingress
ingresses                             ing              extensions/v1beta1                            true         Ingress
ingresses                             ing              networking.k8s.io/v1                          true         Ingress


clusterrolebindings                                    authorization.openshift.io/v1                 false        ClusterRoleBinding
clusterrolebindings                                    rbac.authorization.k8s.io/v1                  false        ClusterRoleBinding


clusterrolebindings                                    authorization.openshift.io/v1                 false        ClusterRoleBinding
rolebindings                                           authorization.openshift.io/v1                 true         RoleBinding
clusterrolebindings                                    rbac.authorization.k8s.io/v1                  false        ClusterRoleBinding
rolebindings                                           rbac.authorization.k8s.io/v1                  true         RoleBinding


clusterroles                                           authorization.openshift.io/v1                 false        ClusterRole
roles                                                  authorization.openshift.io/v1                 true         Role
clusterroles                                           rbac.authorization.k8s.io/v1                  false        ClusterRole
roles                                                  rbac.authorization.k8s.io/v1                  true         Role


mutatingwebhookconfigurations                          admissionregistration.k8s.io/v1               false        MutatingWebhookConfiguration


validatingwebhookconfigurations                        admissionregistration.k8s.io/v1               false        ValidatingWebhookConfiguration


customresourcedefinitions             crd,crds         apiextensions.k8s.io/v1                       false        CustomResourceDefinition

--- Additional comment from Stefan Schimanski on 2021-05-18 16:06:30 CEST ---



--- Additional comment from Martin Bukatovic on 2021-05-19 17:18:07 CEST ---

The alert this bug talks about is APIRemovedInNextEUSReleaseInUse. I'm mentioning it here so that it's possible to find this bug when one searches by content of bugzilla comments.

Comment 1 Xingxing Xia 2021-06-02 10:56:30 UTC
In QE CI env that runs many cases (created many projects for the cases), checked not only KCM, but also below kube-system:namespace-controller:
$ MASTERS=`oc get no | grep master | grep -o '^[^ ]*'`
$ for i in $MASTERS; do oc debug no/$i -- chroot /host bash -c "grep -hE '"'"k8s.io/removed-release":"[^"]+"'"' /var/log/kube-apiserver/audit*.log" ; done > all.log
$ grep '"k8s.io/removed-release":"1.22"' all.log > 1.22.log
$ jq -r '.user.username+": "+.requestURI' 1.22.log | sed 's/=[0-9][^&]*/=***/g' | sort | uniq -c | sort -n > 1.22.removed.apis
$ cat 1.22.removed.apis | grep -v kube-controller-manager
      2 system:serviceaccount:kube-system:namespace-controller: /apis/extensions/v1beta1/namespaces/01ptm/ingresses
      2 system:serviceaccount:kube-system:namespace-controller: /apis/extensions/v1beta1/namespaces/05mpl/ingresses
      2 system:serviceaccount:kube-system:namespace-controller: /apis/extensions/v1beta1/namespaces/0hqei/ingresses
      2 system:serviceaccount:kube-system:namespace-controller: /apis/extensions/v1beta1/namespaces/0t7dk/ingresses
...snipped, totally 352 lines...

Comment 3 Xingxing Xia 2021-06-15 04:11:57 UTC
Checked the PR code and verified in 4.8.0-0.nightly-2021-06-14-145150:
Positive testing:
Checking env, comment 1 still exists, and kube-controller-manager still accesses ingresses.v1beta1.extensions. Given this, the Alerting page does not show APIRemovedInNextReleaseInUse alert in Firing or Pending state. This means the alert already excludes KCM.

Negative testing:
OAS_SA_TOKEN=`oc sa get-token openshift-apiserver-sa -n openshift-apiserver`
oc login --token "$OAS_SA_TOKEN"
for i in {1..100}; do oc get ingresses.v1beta1.extensions; done
And checking metrics: sum by(system_client) (rate(apiserver_request_total{resource="ingresses",version="v1beta1"}[4h])) , there is one item with empty system_client. Given this, the Alerting page now shows APIRemovedInNextReleaseInUse alert in Firing or Pending state. This means the alert still works.

Comment 6 errata-xmlrpc 2021-07-27 23:10:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438