Bug 2072995 - csv_succeeded metric not present in olm-operator for all successful CSVs
Summary: csv_succeeded metric not present in olm-operator for all successful CSVs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.9
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.z
Assignee: tflannag
QA Contact: xzha
URL:
Whiteboard:
: 2072923 (view as bug list)
Depends On: 1952576
Blocks: 2074680
Reported: 2022-04-07 12:34 UTC by OpenShift BugZilla Robot
Modified: 2022-05-03 21:15 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2074680 (view as bug list)
Environment:
Last Closed: 2022-04-20 14:49:49 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift operator-framework-olm pull 280 0 None open [release-4.9] Bug 2072995: Emit CSV metric on startup 2022-04-07 12:35:01 UTC
Red Hat Product Errata RHSA-2022:1363 0 None None None 2022-04-20 14:50:03 UTC

Comment 1 Per da Silva 2022-04-07 12:39:04 UTC
*** Bug 2072923 has been marked as a duplicate of this bug. ***

Comment 2 Per da Silva 2022-04-07 12:40:31 UTC
I'm creating a cherry-pick PR from the 4.10 fix. We should probably start a discussion about these metrics, what they mean, and whether they meet your requirements as SRE.
The original intention of the metrics was to provide information for PM. They were never built to be a resilient cluster health status metric.


https://github.com/openshift/operator-framework-olm/pull/280

Comment 3 apahim 2022-04-07 13:03:12 UTC
We should backport it to all supported versions, not only 4.9.

Comment 7 Bruno Andrade 2022-04-11 19:05:29 UTC
oc version  
Client Version: 4.9.0-0.nightly-2022-04-11-10570
Server Version: 4.9.0-0.nightly-2022-04-11-10570

OLM version: 0.18.3
git commit: bbf220a2021fd01eb335a2a4e2a59d2dd2459b87

1. Install some operators

oc get csv -A | grep -v elasticsearch | grep -v nginx-ingress | grep -v must-gather
NAMESPACE                                          NAME                                   DISPLAY                            VERSION    REPLACES                               PHASE
openshift-logging                                  cluster-logging.5.3.6-42               Red Hat OpenShift Logging          5.3.6-42                                          Succeeded
openshift-operator-lifecycle-manager               packageserver                          Package Server                     0.18.3                                            Succeeded
test-1                                             businessautomation-operator.7.12.1-1   Business Automation                7.12.1-1   businessautomation-operator.7.12.0-2   Succeeded
test-2                                             eap-operator.v2.3.0                    JBoss EAP                          2.3.0      eap-operator.v2.2.2                    Succeeded
test-3                                             jws-operator.v1.2.3                    JBoss Web Server Operator          1.2.3      jws-operator.v1.2.3

2. Port-forward to the olm-operator pod and curl the metrics endpoint

 oc port-forward olm-operator-546b6fb5fd-cg675 8443 -n openshift-operator-lifecycle-manager
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443




curl -k https://localhost:8443/metrics | grep csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7119    0  7119    0     0  11984      0 --:--:-- --:--:-- --:--:-- 11964
# HELP csv_count Number of CSVs successfully registered
# TYPE csv_count gauge
csv_count 6
# HELP csv_succeeded Successful CSV install
# TYPE csv_succeeded gauge
csv_succeeded{name="businessautomation-operator.7.12.1-1",namespace="test-1",version="7.12.1-1"} 1
csv_succeeded{name="cluster-logging.5.3.6-42",namespace="openshift-logging",version="5.3.6-42"} 1
csv_succeeded{name="eap-operator.v2.3.0",namespace="test-2",version="2.3.0"} 1
csv_succeeded{name="elasticsearch-operator.5.3.6-42",namespace="openshift-operators-redhat",version="5.3.6-42"} 1
csv_succeeded{name="jws-operator.v1.2.3",namespace="test-3",version="1.2.3"} 1
csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.18.3"} 1
# HELP csv_upgrade_count Monotonic count of CSV upgrades
# TYPE csv_upgrade_count counter
csv_upgrade_count 0


LGTM, verified.
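
The manual comparison above (Succeeded CSVs versus csv_succeeded series) could be scripted roughly as follows. This is only a sketch, not part of the original verification: the function names are made up for illustration, the sed pattern assumes the label order name, namespace, version seen in the output above, and the CSV list is assumed to come from a three-column custom-columns query rather than the default table.

```shell
#!/usr/bin/env bash
# Sketch: find Succeeded CSVs that have no csv_succeeded series with value 1.
# All function names here are hypothetical helpers, not OLM or oc features.

# Reads Prometheus text exposition on stdin and prints "namespace/name"
# for every csv_succeeded series whose value is 1.
# Assumption: labels appear in the order name, namespace, version.
metric_csvs() {
  sed -n -E 's#^csv_succeeded\{name="([^"]*)",namespace="([^"]*)",.*\} 1$#\2/\1#p' | sort -u
}

# Reads "NAMESPACE NAME PHASE" rows on stdin and prints "namespace/name"
# for the rows whose phase is Succeeded.
succeeded_csvs() {
  awk '$3 == "Succeeded" { print $1 "/" $2 }' | sort -u
}

# Prints entries listed in file $1 (cluster) that are absent from file $2 (metrics).
# Both files must be sorted, which the two helpers above take care of.
missing_metrics() {
  comm -23 "$1" "$2"
}

# Possible usage against a live cluster with the port-forward already running:
#   oc get csv -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
#     | succeeded_csvs > cluster.txt
#   curl -sk https://localhost:8443/metrics | metric_csvs > metrics.txt
#   missing_metrics cluster.txt metrics.txt
```

An empty result from missing_metrics would mean every Succeeded CSV in the list has a matching series. The sketch does not reproduce the grep -v filtering used in the comment above, so rows excluded there may show up as missing.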

Comment 8 apahim 2022-04-11 22:10:44 UTC
Hi bandrade, to fully verify this issue, you have to recycle the OLM pod and check the metrics after the recycling. Please see https://bugzilla.redhat.com/show_bug.cgi?id=2072923#c0.

Comment 9 Bruno Andrade 2022-04-11 22:31:18 UTC
Hi,

Thanks, I have the same cluster running and I executed the steps that you mentioned in the other bug description:


1) Delete the olm operator pod
oc delete pod olm-operator-546b6fb5fd-cg675 -n openshift-operator-lifecycle-manager                                                                   
pod "olm-operator-546b6fb5fd-cg675" deleted


2) Check that another pod is running:

oc get pods -n openshift-operator-lifecycle-manager                                
NAME                                      READY   STATUS      RESTARTS        AGE
catalog-operator-64d4d4f75b-6nb5d         1/1     Running     0               5h40m
collect-profiles-27495225--1-6gbqs        0/1     Completed   0               31m
collect-profiles-27495240--1-kg55q        0/1     Completed   0               16m
collect-profiles-27495255--1-9fgbz        0/1     Completed   0               79s
olm-operator-546b6fb5fd-fwpl6             1/1     Running     0               15s
package-server-manager-55c798c568-sjnrk   1/1     Running     4 (5h25m ago)   5h40m
packageserver-5bfb8845bf-pk6nm            1/1     Running     0               5h32m
packageserver-5bfb8845bf-rttlg            1/1     Running     0               5h32m

3) Port-forward to the olm-operator pod and curl the metrics endpoint


oc port-forward olm-operator-546b6fb5fd-fwpl6 8443 -n openshift-operator-lifecycle-manager                                                            
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443


curl -k https://localhost:8443/metrics | grep csv                                                                                                     
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7110    0  7110    0     0  11830      0 --:--:-- --:--:-- --:--:-- 11830
# HELP csv_count Number of CSVs successfully registered
# TYPE csv_count gauge
csv_count 6
# HELP csv_succeeded Successful CSV install
# TYPE csv_succeeded gauge
csv_succeeded{name="businessautomation-operator.7.12.1-1",namespace="test-1",version="7.12.1-1"} 1
csv_succeeded{name="cluster-logging.5.3.6-42",namespace="openshift-logging",version="5.3.6-42"} 1
csv_succeeded{name="eap-operator.v2.3.0",namespace="test-2",version="2.3.0"} 1
csv_succeeded{name="elasticsearch-operator.5.3.6-42",namespace="openshift-operators-redhat",version="5.3.6-42"} 1
csv_succeeded{name="jws-operator.v1.2.3",namespace="test-3",version="1.2.3"} 1
csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.18.3"} 1

# HELP csv_upgrade_count Monotonic count of CSV upgrades
# TYPE csv_upgrade_count counter
csv_upgrade_count 0
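
The recycle check in the steps above amounts to confirming that the csv_succeeded series are the same before and after the olm-operator pod is replaced. A rough sketch of that comparison follows. The helper names are hypothetical, and the sketch assumes the deployment is named olm-operator and serves metrics on port 8443, as the pod name and port-forward in this comment suggest.

```shell
#!/usr/bin/env bash
# Sketch: compare csv_succeeded series before and after recycling the pod.
# Helper names are illustrative; they are not part of oc or OLM.
NS=openshift-operator-lifecycle-manager

# Prints the sorted csv_succeeded series from a metrics dump on stdin.
csv_series() {
  grep '^csv_succeeded{' | sort
}

# Succeeds when the two dump files contain the same csv_succeeded series.
same_series() {
  [ "$(csv_series < "$1")" = "$(csv_series < "$2")" ]
}

# Writes the olm-operator metrics to file $1 through a temporary port-forward.
# Assumption: deployment/olm-operator exists and exposes metrics on 8443.
dump_metrics() {
  oc port-forward deployment/olm-operator 8443 -n "$NS" >/dev/null &
  local pf=$!
  sleep 3   # crude wait for the tunnel; a retry loop would be more robust
  curl -sk https://localhost:8443/metrics > "$1"
  kill "$pf"
}

# Deletes the pod named in $1 and waits for the deployment to be ready again.
recycle_pod() {
  oc delete pod "$1" -n "$NS"
  oc rollout status deployment/olm-operator -n "$NS"
}

# Possible usage:
#   dump_metrics before.txt
#   recycle_pod olm-operator-546b6fb5fd-cg675
#   dump_metrics after.txt
#   same_series before.txt after.txt && echo "series preserved"
```

Comparing whole series lines rather than only names also catches a value or version label that changes across the restart.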

Comment 14 errata-xmlrpc 2022-04-20 14:49:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.29 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1363

