Bug 2072995

Summary: csv_succeeded metric not present in olm-operator for all successful CSVs
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: OLM
Assignee: tflannag
OLM sub component: OLM
QA Contact: xzha
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: anbhatta, aos-bugs, apahim, bandrade, bluddy, davegord, dsover, krizza, pegoncal, tflannag
Version: 4.9
Keywords: Triaged
Target Milestone: ---
Target Release: 4.9.z
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Last Closed: 2022-04-20 14:49:49 UTC
Bug Depends On: 1952576    
Bug Blocks: 2074680    

Comment 1 Per da Silva 2022-04-07 12:39:04 UTC
*** Bug 2072923 has been marked as a duplicate of this bug. ***

Comment 2 Per da Silva 2022-04-07 12:40:31 UTC
I'm creating a cherry-pick PR from the 4.10 fix. We should probably start a discussion about the metrics, their meaning, and whether they meet your requirements as SREs.
The original intention of the metrics was to provide information for PM. They were never built to be a resilient cluster health status metric.


https://github.com/openshift/operator-framework-olm/pull/280

Comment 3 apahim 2022-04-07 13:03:12 UTC
We should backport it to all supported versions, not only 4.9.

Comment 7 Bruno Andrade 2022-04-11 19:05:29 UTC
oc version  
Client Version: 4.9.0-0.nightly-2022-04-11-10570
Server Version: 4.9.0-0.nightly-2022-04-11-10570

OLM version: 0.18.3
git commit: bbf220a2021fd01eb335a2a4e2a59d2dd2459b87

1. Install some operators

oc get csv -A | grep -v elasticsearch | grep -v nginx-ingress | grep -v must-gather
NAMESPACE                                          NAME                                   DISPLAY                            VERSION    REPLACES                               PHASE
openshift-logging                                  cluster-logging.5.3.6-42               Red Hat OpenShift Logging          5.3.6-42                                          Succeeded
openshift-operator-lifecycle-manager               packageserver                          Package Server                     0.18.3                                            Succeeded
test-1                                             businessautomation-operator.7.12.1-1   Business Automation                7.12.1-1   businessautomation-operator.7.12.0-2   Succeeded
test-2                                             eap-operator.v2.3.0                    JBoss EAP                          2.3.0      eap-operator.v2.2.2                    Succeeded
test-3                                             jws-operator.v1.2.3                    JBoss Web Server Operator          1.2.3      jws-operator.v1.2.3

2. Port-forward to the olm-operator pod and curl the metrics endpoint

 oc port-forward olm-operator-546b6fb5fd-cg675 8443 -n openshift-operator-lifecycle-manager
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443




curl -k https://localhost:8443/metrics | grep csv
# HELP csv_count Number of CSVs successfully registered
# TYPE csv_count gauge
csv_count 6
# HELP csv_succeeded Successful CSV install
# TYPE csv_succeeded gauge
csv_succeeded{name="businessautomation-operator.7.12.1-1",namespace="test-1",version="7.12.1-1"} 1
csv_succeeded{name="cluster-logging.5.3.6-42",namespace="openshift-logging",version="5.3.6-42"} 1
csv_succeeded{name="eap-operator.v2.3.0",namespace="test-2",version="2.3.0"} 1
csv_succeeded{name="elasticsearch-operator.5.3.6-42",namespace="openshift-operators-redhat",version="5.3.6-42"} 1
csv_succeeded{name="jws-operator.v1.2.3",namespace="test-3",version="1.2.3"} 1
csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.18.3"} 1
# HELP csv_upgrade_count Monotonic count of CSV upgrades
# TYPE csv_upgrade_count counter
csv_upgrade_count 0


LGTM, verified.
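
The check above (every CSV in phase Succeeded should have a matching csv_succeeded series) can also be done mechanically. The following is only a minimal sketch, assuming the scrape has been saved as text in the Prometheus text format shown above; the function names `succeeded_series` and `missing_series` are hypothetical helpers and not part of OLM or `oc`.

```python
import re

# Matches one csv_succeeded sample line, e.g.
# csv_succeeded{name="...",namespace="...",version="..."} 1
SERIES = re.compile(r'^csv_succeeded\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def succeeded_series(metrics_text):
    """Return {(namespace, name)} for every csv_succeeded sample with value 1."""
    found = set()
    for line in metrics_text.splitlines():
        match = SERIES.match(line.strip())
        if not match or float(match.group("value")) != 1:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        found.add((labels.get("namespace"), labels.get("name")))
    return found


def missing_series(succeeded_csvs, metrics_text):
    """Return the (namespace, name) pairs that have no csv_succeeded series."""
    return sorted(set(succeeded_csvs) - succeeded_series(metrics_text))


# Hypothetical usage with two of the CSVs listed in this comment.
sample = '''# TYPE csv_succeeded gauge
csv_succeeded{name="eap-operator.v2.3.0",namespace="test-2",version="2.3.0"} 1
'''
expected = [("test-2", "eap-operator.v2.3.0"), ("test-3", "jws-operator.v1.2.3")]
print(missing_series(expected, sample))
```

An empty list means every expected CSV is represented in the scrape. The expected pairs have to be supplied by hand, for example from the `oc get csv -A` listing.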

Comment 8 apahim 2022-04-11 22:10:44 UTC
Hi bandrade, to fully verify this issue, you have to recycle the OLM pod and check the metrics after the recycling. Please see https://bugzilla.redhat.com/show_bug.cgi?id=2072923#c0.

Comment 9 Bruno Andrade 2022-04-11 22:31:18 UTC
Hi,

Thanks, I have the same cluster running and I executed the steps that you mentioned in the other bug description:


1) Delete the olm operator pod
oc delete pod olm-operator-546b6fb5fd-cg675 -n openshift-operator-lifecycle-manager                                                                   
pod "olm-operator-546b6fb5fd-cg675" deleted


2) Check that another pod is running:

oc get pods -n openshift-operator-lifecycle-manager                                
NAME                                      READY   STATUS      RESTARTS        AGE
catalog-operator-64d4d4f75b-6nb5d         1/1     Running     0               5h40m
collect-profiles-27495225--1-6gbqs        0/1     Completed   0               31m
collect-profiles-27495240--1-kg55q        0/1     Completed   0               16m
collect-profiles-27495255--1-9fgbz        0/1     Completed   0               79s
olm-operator-546b6fb5fd-fwpl6             1/1     Running     0               15s
package-server-manager-55c798c568-sjnrk   1/1     Running     4 (5h25m ago)   5h40m
packageserver-5bfb8845bf-pk6nm            1/1     Running     0               5h32m
packageserver-5bfb8845bf-rttlg            1/1     Running     0               5h32m

3) Port-forward to the olm-operator pod and curl the metrics endpoint


oc port-forward olm-operator-546b6fb5fd-fwpl6 8443 -n openshift-operator-lifecycle-manager                                                            
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443


curl -k https://localhost:8443/metrics | grep csv                                                                                                     
# HELP csv_count Number of CSVs successfully registered
# TYPE csv_count gauge
csv_count 6
# HELP csv_succeeded Successful CSV install
# TYPE csv_succeeded gauge
csv_succeeded{name="businessautomation-operator.7.12.1-1",namespace="test-1",version="7.12.1-1"} 1
csv_succeeded{name="cluster-logging.5.3.6-42",namespace="openshift-logging",version="5.3.6-42"} 1
csv_succeeded{name="eap-operator.v2.3.0",namespace="test-2",version="2.3.0"} 1
csv_succeeded{name="elasticsearch-operator.5.3.6-42",namespace="openshift-operators-redhat",version="5.3.6-42"} 1
csv_succeeded{name="jws-operator.v1.2.3",namespace="test-3",version="1.2.3"} 1
csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.18.3"} 1

# HELP csv_upgrade_count Monotonic count of CSV upgrades
# TYPE csv_upgrade_count counter
csv_upgrade_count 0
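
Since the verification depends on the csv_succeeded series surviving a restart of the olm-operator pod, the scrapes taken before and after the pod deletion can be diffed. This is a rough sketch only, assuming both scrapes were saved as text; `csv_succeeded_lines` and `diff_scrapes` are hypothetical names chosen for illustration.

```python
def csv_succeeded_lines(metrics_text):
    """Collect the csv_succeeded sample lines from one /metrics scrape."""
    return {
        line.strip()
        for line in metrics_text.splitlines()
        if line.strip().startswith("csv_succeeded{")
    }


def diff_scrapes(before_text, after_text):
    """Return (lost, gained): series only in the first or only in the second scrape."""
    before = csv_succeeded_lines(before_text)
    after = csv_succeeded_lines(after_text)
    return sorted(before - after), sorted(after - before)


# Hypothetical usage: the second scrape has lost one series.
before = '''csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.18.3"} 1
csv_succeeded{name="eap-operator.v2.3.0",namespace="test-2",version="2.3.0"} 1
'''
after = '''csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.18.3"} 1
'''
lost, gained = diff_scrapes(before, after)
print("lost after restart:", lost)
print("new after restart:", gained)
```

With the two scrapes pasted in this bug, both lists would be expected to come back empty, since the same six series appear before and after the pod was deleted.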

Comment 14 errata-xmlrpc 2022-04-20 14:49:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.29 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1363