*** Bug 2072923 has been marked as a duplicate of this bug. ***
I'm creating a cherry-pick PR from the 4.10 fix. We should probably initiate some discussion around the metrics, their meaning, and whether they meet your requirements as SRE. The original intention of the metrics was to provide information for PM. They were never built to be a resilient cluster health status metric. https://github.com/openshift/operator-framework-olm/pull/280
We should backport it to all supported versions, not only 4.9.
oc version
Client Version: 4.9.0-0.nightly-2022-04-11-10570
Server Version: 4.9.0-0.nightly-2022-04-11-10570

OLM version: 0.18.3
git commit: bbf220a2021fd01eb335a2a4e2a59d2dd2459b87

1. Install some operators:

oc get csv -A | grep -v elasticsearch | grep -v nginx-ingress | grep -v must-gather
NAMESPACE                              NAME                                    DISPLAY                     VERSION    REPLACES                                PHASE
openshift-logging                      cluster-logging.5.3.6-42                Red Hat OpenShift Logging   5.3.6-42                                           Succeeded
openshift-operator-lifecycle-manager   packageserver                           Package Server              0.18.3                                             Succeeded
test-1                                 businessautomation-operator.7.12.1-1    Business Automation         7.12.1-1   businessautomation-operator.7.12.0-2    Succeeded
test-2                                 eap-operator.v2.3.0                     JBoss EAP                   2.3.0      eap-operator.v2.2.2                     Succeeded
test-3                                 jws-operator.v1.2.3                     JBoss Web Server Operator   1.2.3      jws-operator.v1.2.3

2. Port-forward to the olm-operator pod and curl the metrics endpoint:

oc port-forward olm-operator-546b6fb5fd-cg675 8443 -n openshift-operator-lifecycle-manager
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443

curl -k https://localhost:8443/metrics | grep csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7119    0  7119    0     0  11984      0 --:--:-- --:--:-- --:--:-- 11964
# HELP csv_count Number of CSVs successfully registered
# TYPE csv_count gauge
csv_count 6
# HELP csv_succeeded Successful CSV install
# TYPE csv_succeeded gauge
csv_succeeded{name="businessautomation-operator.7.12.1-1",namespace="test-1",version="7.12.1-1"} 1
csv_succeeded{name="cluster-logging.5.3.6-42",namespace="openshift-logging",version="5.3.6-42"} 1
csv_succeeded{name="eap-operator.v2.3.0",namespace="test-2",version="2.3.0"} 1
csv_succeeded{name="elasticsearch-operator.5.3.6-42",namespace="openshift-operators-redhat",version="5.3.6-42"} 1
csv_succeeded{name="jws-operator.v1.2.3",namespace="test-3",version="1.2.3"} 1
csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.18.3"} 1
# HELP csv_upgrade_count Monotonic count of CSV upgrades
# TYPE csv_upgrade_count counter
csv_upgrade_count 0

LGTM, verified.
Hi bandrade, to fully verify this issue, you have to recycle the OLM pod and check the metrics again after the new pod comes up. Please see https://bugzilla.redhat.com/show_bug.cgi?id=2072923#c0.
Hi, thanks. I still have the same cluster running, and I executed the steps from the other bug's description:

1) Delete the olm-operator pod:

oc delete pod olm-operator-546b6fb5fd-cg675 -n openshift-operator-lifecycle-manager
pod "olm-operator-546b6fb5fd-cg675" deleted

2) Check that another pod is running:

oc get pods -n openshift-operator-lifecycle-manager
NAME                                      READY   STATUS      RESTARTS        AGE
catalog-operator-64d4d4f75b-6nb5d         1/1     Running     0               5h40m
collect-profiles-27495225--1-6gbqs        0/1     Completed   0               31m
collect-profiles-27495240--1-kg55q        0/1     Completed   0               16m
collect-profiles-27495255--1-9fgbz        0/1     Completed   0               79s
olm-operator-546b6fb5fd-fwpl6             1/1     Running     0               15s
package-server-manager-55c798c568-sjnrk   1/1     Running     4 (5h25m ago)   5h40m
packageserver-5bfb8845bf-pk6nm            1/1     Running     0               5h32m
packageserver-5bfb8845bf-rttlg            1/1     Running     0               5h32m

3) Port-forward to the new olm-operator pod and curl the metrics endpoint:

oc port-forward olm-operator-546b6fb5fd-fwpl6 8443 -n openshift-operator-lifecycle-manager
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443

curl -k https://localhost:8443/metrics | grep csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7110    0  7110    0     0  11830      0 --:--:-- --:--:-- --:--:-- 11830
# HELP csv_count Number of CSVs successfully registered
# TYPE csv_count gauge
csv_count 6
# HELP csv_succeeded Successful CSV install
# TYPE csv_succeeded gauge
csv_succeeded{name="businessautomation-operator.7.12.1-1",namespace="test-1",version="7.12.1-1"} 1
csv_succeeded{name="cluster-logging.5.3.6-42",namespace="openshift-logging",version="5.3.6-42"} 1
csv_succeeded{name="eap-operator.v2.3.0",namespace="test-2",version="2.3.0"} 1
csv_succeeded{name="elasticsearch-operator.5.3.6-42",namespace="openshift-operators-redhat",version="5.3.6-42"} 1
csv_succeeded{name="jws-operator.v1.2.3",namespace="test-3",version="1.2.3"} 1
csv_succeeded{name="packageserver",namespace="openshift-operator-lifecycle-manager",version="0.18.3"} 1
# HELP csv_upgrade_count Monotonic count of CSV upgrades
# TYPE csv_upgrade_count counter
csv_upgrade_count 0
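For anyone repeating this check, the before/after comparison can be scripted instead of eyeballed. The following is only a minimal sketch, assuming the two `curl ... | grep csv` outputs were saved as text; the helper names `parse_csv_succeeded` and `diff_after_restart` are hypothetical and not part of OLM or `oc`. It reports which `csv_succeeded` series disappeared or changed value after the olm-operator pod was recycled.

```python
import re

# Matches sample lines such as:
#   csv_succeeded{name="packageserver",namespace="...",version="0.18.3"} 1
SAMPLE_RE = re.compile(r'^csv_succeeded\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)\s*$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_csv_succeeded(metrics_text):
    """Return {(namespace, name, version): value} for each csv_succeeded sample."""
    samples = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # skip HELP/TYPE comments and unrelated metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (
            labels.get("namespace", ""),
            labels.get("name", ""),
            labels.get("version", ""),
        )
        samples[key] = float(match.group("value"))
    return samples


def diff_after_restart(before_text, after_text):
    """Return (missing, changed) series when comparing the two metric dumps."""
    before = parse_csv_succeeded(before_text)
    after = parse_csv_succeeded(after_text)
    # Series that were reported before the restart but are gone afterwards.
    missing = sorted(key for key in before if key not in after)
    # Series still present whose value differs from the pre-restart value.
    changed = sorted(
        key for key in before if key in after and before[key] != after[key]
    )
    return missing, changed
```

Both lists being empty corresponds to the result seen above: the same six `csv_succeeded` series, each with value 1, before and after the pod was deleted.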
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.29 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1363