Bug 1952576 - csv_succeeded metric not present in olm-operator for all successful CSVs
Summary: csv_succeeded metric not present in olm-operator for all successful CSVs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: tflannag
QA Contact: xzha
URL:
Whiteboard:
Duplicates: 1964716
Depends On:
Blocks: 2072995
Reported: 2021-04-22 15:06 UTC by Arjun Naik
Modified: 2022-04-12 19:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The value of the "csv_succeeded" metric was lost between pod restarts for the OLM Operator container as that metric was only emitted when a CSV's status sub-resource was changed.
Consequence: The "csv_succeeded" metric is not always present for successfully installed CSVs.
Fix: Emit the "csv_succeeded" metric at the beginning of the OLM Operator's startup logic.
Result: The value of the "csv_succeeded" metric is correctly persisted during pod restarts.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:03:07 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID                                                       Private  Priority  Status  Summary                                  Last Updated
Github                  openshift operator-framework-olm pull 239                0        None      open    Bug 1952576: Emit CSV metric on startup  2022-01-05 19:32:54 UTC
Github                  operator-framework operator-lifecycle-manager pull 2216  0        None      open    Bug 1952576: Emit CSV metric on startup  2021-06-28 12:59:52 UTC
Red Hat Product Errata  RHSA-2022:0056                                           0        None      None    None                                     2022-03-10 16:03:35 UTC

Description Arjun Naik 2021-04-22 15:06:51 UTC
Description of problem: The "csv_succeeded" metric is not always present for successfully installed CSVs


Version-Release number of selected component (if applicable):


How reproducible: Not easily reproducible because it's not present in all our clusters. 

Example: In one of our clusters, all CSVs excluding the "Copied" ones

❯ oc get csv -A | grep -v route-monitor | grep -v configure-alert                             
NAMESPACE                                          NAME                                               DISPLAY                           VERSION           REPLACES                                           PHASE
default                                            nginx-ingress-operator.v0.1.0                      Nginx Ingress Operator            0.1.0             nginx-ingress-operator.v0.0.7                      Succeeded
openshift-cloud-ingress-operator                   cloud-ingress-operator.v0.1.364-9a6ba79            cloud-ingress-operator            0.1.364-9a6ba79   cloud-ingress-operator.v0.1.362-2033e68            Succeeded
openshift-custom-domains-operator                  custom-domains-operator.v0.1.74-d2fee83            custom-domains-operator           0.1.74-d2fee83    custom-domains-operator.v0.1.59-5a6af11            Succeeded
openshift-managed-upgrade-operator                 managed-upgrade-operator.v0.1.586-9eec1d6          managed-upgrade-operator          0.1.586-9eec1d6   managed-upgrade-operator.v0.1.579-fd1e7aa          Succeeded
openshift-must-gather-operator                     must-gather-operator.v0.1.134-5fad973              must-gather-operator              0.1.134-5fad973   must-gather-operator.v0.1.119-d066240              Succeeded
openshift-operator-lifecycle-manager               packageserver                                      Package Server                    0.17.0                                                               Succeeded
openshift-osd-metrics                              osd-metrics-exporter.v0.1.102-e388934              osd-metrics-exporter              0.1.102-e388934   osd-metrics-exporter.v0.1.99-55bc4fb               Succeeded
openshift-rbac-permissions                         rbac-permissions-operator.v0.1.164-4876c68         rbac-permissions-operator         0.1.164-4876c68   rbac-permissions-operator.v0.1.152-f0cfa43         Succeeded
openshift-splunk-forwarder-operator                splunk-forwarder-operator.v0.1.217-a5cba25         splunk-forwarder-operator         0.1.217-a5cba25   splunk-forwarder-operator.v0.1.212-c4a4681         Succeeded
openshift-velero

After port-forwarding to the olm-operator pod and curling the metrics endpoint:

curl -k https://localhost:8081/metrics | grep csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6909    0  6909    0     0  83240      0 --:--:-- --:--:-- --:--:-- 83240
# HELP csv_count Number of CSVs successfully registered
# TYPE csv_count gauge
csv_count 146
# HELP csv_succeeded Successful CSV install
# TYPE csv_succeeded gauge
csv_succeeded{name="configure-alertmanager-operator.v0.1.313-3955894",namespace="openshift-monitoring",version="0.1.313-3955894"} 1
csv_succeeded{name="managed-velero-operator.v0.2.257-e395241",namespace="openshift-velero",version="0.2.257-e395241"} 1
csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.17.0"} 1
# HELP csv_upgrade_count Monotonic count of CSV upgrades
# TYPE csv_upgrade_count counter
csv_upgrade_count 3


Actual results:


Expected results:
The csv_succeeded metric should be present for all successfully installed CSVs.
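
The manual comparison above (CSV list versus the csv_succeeded lines) could be automated roughly as follows. This is a minimal Python sketch, not part of OLM: the helper names are hypothetical, the metrics text is assumed to come from the curl command shown earlier, and the CSV list is assumed to be the output of `oc get csv -A -o json`. The filter on `status.phase` and `status.reason` simply mirrors the checks done by hand in this report.

```python
import json
import re

# One csv_succeeded sample line in the Prometheus text format, e.g.
# csv_succeeded{name="packageserver",namespace="...",version="0.17.0"} 1
SAMPLE_RE = re.compile(r'^csv_succeeded\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)\s*$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def reported_csvs(metrics_text):
    """Return the (namespace, name) pairs that have a csv_succeeded sample."""
    found = set()
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # comments (# HELP / # TYPE) and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        found.add((labels.get("namespace"), labels.get("name")))
    return found


def expected_csvs(csv_list_json):
    """Return (namespace, name) for every Succeeded CSV that is not a copy."""
    expected = set()
    for item in json.loads(csv_list_json).get("items", []):
        status = item.get("status", {})
        # Assumption: copied CSVs are identified by status.reason == "Copied".
        if status.get("phase") == "Succeeded" and status.get("reason") != "Copied":
            meta = item["metadata"]
            expected.add((meta["namespace"], meta["name"]))
    return expected


def missing_csvs(metrics_text, csv_list_json):
    """Return the sorted Succeeded CSVs that have no csv_succeeded sample."""
    return sorted(expected_csvs(csv_list_json) - reported_csvs(metrics_text))
```

An empty list from `missing_csvs` would correspond to the expected result; on the cluster described here it would presumably list CSVs such as splunk-forwarder-operator.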


Additional info:

Comment 1 Arjun Naik 2021-04-22 15:08:54 UTC
Also verified that the missing CSVs are not "Copied"

oc get csv -n openshift-splunk-forwarder-operator splunk-forwarder-operator.v0.1.217-a5cba25 -o json | jq -r '.status.reason'
InstallSucceeded

Comment 2 Ben Luddy 2021-04-22 17:10:26 UTC
> oc get pods olm-operator-d97f6b57-bx4pl -o json | jq -r '.status.startTime' 
> 2021-04-22T06:33:18Z

> oc get csv -n openshift-splunk-forwarder-operator splunk-forwarder-operator.v0.1.217-a5cba25 -o json | jq -r '.status.lastUpdateTime'
> 2021-04-21T09:46:14Z

Could be due to the fact that this metric is only updated when a CSV status changes.
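
The timestamp comparison behind this hypothesis can be written down as a small check. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes both values use the `YYYY-MM-DDTHH:MM:SSZ` form shown in the jq output above.

```python
from datetime import datetime, timezone


def parse_k8s_time(value):
    """Parse a UTC timestamp such as 2021-04-22T06:33:18Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def updated_before_pod_start(csv_last_update, pod_start):
    """True if the CSV status last changed before the olm-operator pod started.

    If the metric is only set on a CSV status change, such a CSV would have
    no csv_succeeded sample in the current pod.
    """
    return parse_k8s_time(csv_last_update) < parse_k8s_time(pod_start)
```

With the values above, `updated_before_pod_start("2021-04-21T09:46:14Z", "2021-04-22T06:33:18Z")` is true, which is consistent with the splunk-forwarder-operator CSV having no sample.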

Comment 3 Anik 2021-06-07 15:37:49 UTC
*** Bug 1964716 has been marked as a duplicate of this bug. ***

Comment 10 xzha 2022-01-12 07:02:32 UTC
[root@preserve-olm-agent-test ~]# oc version
Client Version: 4.10.0-0.nightly-2022-01-11-065245
Server Version: 4.10.0-0.nightly-2022-01-11-065245
Kubernetes Version: v1.22.1+6859754
[root@preserve-olm-agent-test ~]# oc exec catalog-operator-67f5bfd4f9-2g79c  -- olm --version
OLM version: 0.19.0
git commit: 79c782526c3c1c2da88f63b34707b23fb04f7da5

1. Install some operators.
[root@preserve-olm-agent-test ~]# oc get csv -A | grep -v elasticsearch | grep -v nginx-ingress | grep -v must-gather
NAMESPACE                                          NAME                              DISPLAY                            VERSION    REPLACES                        PHASE
default                                            ditto-operator.v0.3.1             Eclipse Ditto                      0.3.1      ditto-operator.v0.2.0           Succeeded
openshift-logging                                  cluster-logging.5.3.2-26          Red Hat OpenShift Logging          5.3.2-26                                   Succeeded
openshift-operator-lifecycle-manager               packageserver                     Package Server                     0.19.0                                     Succeeded
test-1                                             anzo-operator.v2.0.101            Anzo Operator                      2.0.0                                      Succeeded
test-3                                             cockroachdb.v5.0.4                CockroachDB Helm Operator          5.0.4      cockroachdb.v5.0.3              Succeeded

2. Port-forward to the olm-operator pod and curl the metrics endpoint.

[root@preserve-olm-agent-test ~]# oc get pod
NAME                                      READY   STATUS      RESTARTS   AGE
catalog-operator-67f5bfd4f9-2g79c         1/1     Running     0          128m
collect-profiles-27366135--1-t7rh9        0/1     Completed   0          42m
collect-profiles-27366150--1-27jsg        0/1     Completed   0          27m
collect-profiles-27366165--1-f9n4d        0/1     Completed   0          12m
olm-operator-68bfb9479b-j72b5             1/1     Running     0          128m
package-server-manager-66b87fbcc9-95qt5   1/1     Running     0          128m
packageserver-7566c94648-ldktx            1/1     Running     0          122m
packageserver-7566c94648-sbmn2            1/1     Running     0          122m
[root@preserve-olm-agent-test ~]# oc  port-forward olm-operator-68bfb9479b-j72b5 8443

[root@preserve-olm-agent-test ~]# curl -k https://localhost:8443/metrics | grep csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  7315    0  7315    0     0   8191      0 --:--:-- --:--:-- --:--:--  8182
# HELP csv_count Number of CSVs successfully registered
# TYPE csv_count gauge
csv_count 8
# HELP csv_succeeded Successful CSV install
# TYPE csv_succeeded gauge
csv_succeeded{name="anzo-operator.v2.0.101",namespace="test-1",version="2.0.0"} 1
csv_succeeded{name="cluster-logging.5.3.2-26",namespace="openshift-logging",version="5.3.2-26"} 1
csv_succeeded{name="cockroachdb.v5.0.4",namespace="test-3",version="5.0.4"} 1
csv_succeeded{name="ditto-operator.v0.3.1",namespace="default",version="0.3.1"} 1
csv_succeeded{name="elasticsearch-operator.5.3.2-26",namespace="openshift-operators-redhat",version="5.3.2-26"} 1
csv_succeeded{name="must-gather-operator.v1.1.2",namespace="global-load-balancer-operator",version="1.1.2"} 1
csv_succeeded{name="nginx-ingress-operator.v0.4.0",namespace="openshift-operators",version="0.4.0"} 1
csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.19.0"} 1
# HELP csv_upgrade_count Monotonic count of CSV upgrades
# TYPE csv_upgrade_count counter
csv_upgrade_count 0
[root@preserve-olm-agent-test ~]# 

LGTM, verified.

Comment 13 errata-xmlrpc 2022-03-10 16:03:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

