Bug 2030955

Summary: cluster-machine-approver reports incorrect mapi_current_pending_csr metric
Product: OpenShift Container Platform
Reporter: dofinn
Component: Cloud Compute
Assignee: Joel Speed <jspeed>
Cloud Compute sub component: Other Providers
QA Contact: sunzhaohua <zhsun>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: medium
CC: aabhishe
Version: 4.8
Keywords: ServiceDeliveryImpact
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-12-10 13:07:23 UTC
Type: Bug

Description dofinn 2021-12-10 04:27:49 UTC
Description of problem:

SRE uses `mapi_current_pending_csr` to alert on specific conditions. We are seeing false positives because the metric is present and reports a positive integer even when no CSRs exist on the cluster.


How reproducible:
We have encountered a few false positives across our fleet.


Steps to Reproduce:
1. TBA

Actual results:

```
[~ {production} (cluster:default)]$ oc get csr -A
No resources found
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- curl http://127.0.0.1:9191/metrics | grep mapi
Defaulting container name to kube-rbac-proxy.
Use 'oc describe pod/machine-approver-757975f664-rhq5x -n openshift-cluster-machine-approver' to see all of the containers in this pod.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 3
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 107
100  195k    0  195k    0     0  63.4M      0 --:--:-- --:--:-- --:--:-- 63.4M

```
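
A minimal sketch of how the mismatch above could be detected automatically, assuming the metrics text has already been captured (for example from the `curl` call shown). The function names `parse_mapi_gauges` and `is_stale` are hypothetical, and the expected count of 0 is taken from the `oc get csr -A` output ("No resources found"). The parser skips the curl progress line that is fused onto the first `# HELP` line.

```python
# Sample copied from the "Actual results" output, including the curl progress noise.
SAMPLE = """\
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 3
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 107
"""


def parse_mapi_gauges(metrics_text):
    """Return unlabelled mapi_* samples as {name: value}; hypothetical helper."""
    gauges = {}
    for line in metrics_text.splitlines():
        parts = line.split()
        # A sample line looks like "<name> <value>"; comments and noise are skipped.
        if len(parts) != 2 or not parts[0].startswith("mapi_"):
            continue
        try:
            gauges[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return gauges


def is_stale(metrics_text, actual_pending_csrs):
    """True when the reported gauge disagrees with the observed CSR count."""
    reported = parse_mapi_gauges(metrics_text).get("mapi_current_pending_csr")
    return reported is not None and int(reported) != actual_pending_csrs


if __name__ == "__main__":
    print(parse_mapi_gauges(SAMPLE))
    print(is_stale(SAMPLE, actual_pending_csrs=0))
```

With the sample above, the gauge reads 3 while the cluster has no CSRs, so `is_stale` reports a mismatch.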

Expected results:
```
[~ {production} (cluster:default)]$ oc get csr -A
No resources found
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- curl http://127.0.0.1:9191/metrics | grep mapi
Defaulting container name to kube-rbac-proxy.
Use 'oc describe pod/machine-approver-757975f664-rhq5x -n openshift-cluster-machine-approver' to see all of the containers in this pod.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 0
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 107
100  195k    0  195k    0     0  63.4M      0 --:--:-- --:--:-- --:--:-- 63.4M
```


Additional info:
This may be caused by caching in the controller: the metric becomes accurate after the pod is deleted and recreated.

```
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- curl http://127.0.0.1:9191/metrics | grep mapi
Defaulting container name to kube-rbac-proxy.
Use 'oc describe pod/machine-approver-757975f664-rhq5x -n openshift-cluster-machine-approver' to see all of the containers in this pod.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 3
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 107
100  195k    0  195k    0     0  63.4M      0 --:--:-- --:--:-- --:--:-- 63.4M
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver get pods
NAME                                READY   STATUS    RESTARTS   AGE
machine-approver-757975f664-rhq5x   2/2     Running   5          44h
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver delete pod machine-approver-757975f664-rhq5x
pod "machine-approver-757975f664-rhq5x" deleted
^C
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver get pods
NAME                                READY   STATUS              RESTARTS   AGE
machine-approver-757975f664-94gjz   0/2     ContainerCreating   0          8s
machine-approver-757975f664-rhq5x   0/2     Terminating         5          45h
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver get pods
NAME                                READY   STATUS    RESTARTS   AGE
machine-approver-757975f664-94gjz   2/2     Running   0          18s
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- curl http://127.0.0.1:9191/metrics | grep mapi
Defaulting container name to kube-rbac-proxy.
Use 'oc describe pod/machine-approver-757975f664-94gjz -n openshift-cluster-machine-approver' to see all of the containers in this pod.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 0
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 0
100  188k    0  188k    0     0  30.7M      0 --:--:-- --:--:-- --:--:-- 30.7M
```
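
To illustrate the caching hypothesis above, here is a toy model of a gauge that only changes when an event reaches it. This is not the cluster-machine-approver code, and this report does not confirm the root cause; the class `EventDrivenGauge` and the function `simulate` are hypothetical names used only for illustration.

```python
class EventDrivenGauge:
    """Toy gauge whose value only changes when an event reaches it."""

    def __init__(self, initial_csrs=()):
        # A freshly started process seeds its view from a full listing.
        self._pending = set(initial_csrs)

    def on_pending(self, name):
        self._pending.add(name)

    def on_removed(self, name):
        self._pending.discard(name)

    @property
    def value(self):
        return len(self._pending)


def simulate():
    cluster = set()  # authoritative state, i.e. what `oc get csr` would list
    gauge = EventDrivenGauge(cluster)

    # Three CSRs appear and the gauge observes all of them.
    for name in ("csr-a", "csr-b", "csr-c"):
        cluster.add(name)
        gauge.on_pending(name)

    # The CSRs go away, but the removal events never reach the gauge.
    cluster.clear()
    stale_value = gauge.value

    # Deleting the pod corresponds to building a new gauge from a fresh listing.
    restarted_value = EventDrivenGauge(cluster).value
    return stale_value, restarted_value, len(cluster)


if __name__ == "__main__":
    print(simulate())
```

In this model the stale gauge keeps its old value while a freshly built one matches the cluster, which has the same shape as the before and after readings in the log above.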

Comment 1 Joel Speed 2021-12-10 13:07:23 UTC
We have already fixed this in 4.9 and 4.10. The backport to 4.8 has started, but it is being tracked in another bug, so I will mark this one as a duplicate.

*** This bug has been marked as a duplicate of bug 2019754 ***