Bug 2030955 - cluster-machine-approver reports incorrect mapi_current_pending_csr metric
Keywords:
Status: CLOSED DUPLICATE of bug 2019754
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-10 04:27 UTC by dofinn
Modified: 2021-12-10 13:07 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-10 13:07:23 UTC
Target Upstream Version:
Embargoed:



Description dofinn 2021-12-10 04:27:49 UTC
Description of problem:

SRE uses `mapi_current_pending_csr` to alert on specific conditions. We are seeing false positives because the metric reports a positive integer even when no CSRs exist on the cluster.


How reproducible:
We have encountered a few false positives across our fleet.


Steps to Reproduce:
1. TBA

Actual results:

```
[~ {production} (cluster:default)]$ oc get csr -A
No resources found
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- curl http://127.0.0.1:9191/metrics | grep mapi
Defaulting container name to kube-rbac-proxy.
Use 'oc describe pod/machine-approver-757975f664-rhq5x -n openshift-cluster-machine-approver' to see all of the containers in this pod.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 3
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 107
100  195k    0  195k    0     0  63.4M      0 --:--:-- --:--:-- --:--:-- 63.4M

```
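
The comparison above can be scripted as a rough cross-check. This is a minimal sketch, not part of the original report: the helper names `metric_value` and `count_pending` are hypothetical, and it assumes the controller's notion of "pending" matches the `Pending` condition shown in the last column of `oc get csr`, which may not hold exactly. Passing `-s` to curl suppresses the progress meter seen in the output above.

```shell
#!/bin/sh
# Print the value of one unlabelled metric from Prometheus text output on stdin.
# (Hypothetical helper; comment lines start with "#", so they never match.)
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Count rows whose last column (CONDITION) is exactly "Pending" in
# `oc get csr --no-headers` output read from stdin. Prints 0 on empty input.
count_pending() {
  awk '$NF == "Pending" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (assumed, not run here):
#   reported=$(oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- \
#     curl -s http://127.0.0.1:9191/metrics | metric_value mapi_current_pending_csr)
#   actual=$(oc get csr --no-headers 2>/dev/null | count_pending)
#   [ "$reported" = "$actual" ] || echo "mismatch: metric=$reported actual=$actual"
```

On the cluster shown above, this check would be expected to flag a mismatch, since the metric reads 3 while no CSRs exist.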

Expected results:
```
[~ {production} (cluster:default)]$ oc get csr -A
No resources found
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- curl http://127.0.0.1:9191/metrics | grep mapi
Defaulting container name to kube-rbac-proxy.
Use 'oc describe pod/machine-approver-757975f664-rhq5x -n openshift-cluster-machine-approver' to see all of the containers in this pod.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 0
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 107
100  195k    0  195k    0     0  63.4M      0 --:--:-- --:--:-- --:--:-- 63.4M
```

Additional info:
This may be caused by caching in the controller. The metric becomes accurate again after the pod is deleted.

```
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- curl http://127.0.0.1:9191/metrics | grep mapi
Defaulting container name to kube-rbac-proxy.
Use 'oc describe pod/machine-approver-757975f664-rhq5x -n openshift-cluster-machine-approver' to see all of the containers in this pod.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 3
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 107
100  195k    0  195k    0     0  63.4M      0 --:--:-- --:--:-- --:--:-- 63.4M
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver get pods
NAME                                READY   STATUS    RESTARTS   AGE
machine-approver-757975f664-rhq5x   2/2     Running   5          44h
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver delete pod machine-approver-757975f664-rhq5x
pod "machine-approver-757975f664-rhq5x" deleted
^C
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver get pods
NAME                                READY   STATUS              RESTARTS   AGE
machine-approver-757975f664-94gjz   0/2     ContainerCreating   0          8s
machine-approver-757975f664-rhq5x   0/2     Terminating         5          45h
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver get pods
NAME                                READY   STATUS    RESTARTS   AGE
machine-approver-757975f664-94gjz   2/2     Running   0          18s
[~ {production} (cluster:default)]$ oc -n openshift-cluster-machine-approver exec deploy/machine-approver -- curl http://127.0.0.1:9191/metrics | grep mapi
Defaulting container name to kube-rbac-proxy.
Use 'oc describe pod/machine-approver-757975f664-94gjz -n openshift-cluster-machine-approver' to see all of the containers in this pod.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP mapi_current_pending_csr Count of pending CSRs at the cluster level
# TYPE mapi_current_pending_csr gauge
mapi_current_pending_csr 0
# HELP mapi_max_pending_csr Threshold value of the pending CSRs beyond which any new CSR requests will be ignored
# TYPE mapi_max_pending_csr gauge
mapi_max_pending_csr 0
100  188k    0  188k    0     0  30.7M      0 --:--:-- --:--:-- --:--:-- 30.7M
```

Comment 1 Joel Speed 2021-12-10 13:07:23 UTC
We have already fixed this in 4.9 and 4.10. We have started the backport process for 4.8, but that is being tracked in another bug, so I will mark this one as a duplicate.

*** This bug has been marked as a duplicate of bug 2019754 ***

