Bug 1743808 - Report metrics on installed operators
Summary: Report metrics on installed operators
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Evan Cordell
QA Contact: Bruno Andrade
URL:
Whiteboard:
Depends On:
Blocks: 1737156
Reported: 2019-08-20 17:56 UTC by Evan Cordell
Modified: 2019-10-16 06:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1737156
Environment:
Last Closed: 2019-10-16 06:36:40 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 951 | 0 | None | closed | feat(metrics): record sync count for Subscriptions, labeled with name and installedCSV | 2020-04-01 07:41:48 UTC
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:36:53 UTC

Description Evan Cordell 2019-08-20 17:56:05 UTC
+++ This bug was initially created as a clone of Bug #1737156 +++

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Jian Zhang on 2019-08-05 01:44:48 UTC ---

Hi, Evan

Could you give more details about this bug, such as a basic description and reproduction steps?
Also, it would be better to fill in the Priority and Severity fields.

--- Additional comment from Jian Zhang on 2019-08-05 06:20:34 UTC ---

Bruno,

Please help verify this bug once it is in ON_QA status. Basic info: a synchronization metric for the Subscription object was added to the Prometheus metrics.
For more details, you can ask for the reporter's help. Thanks!

--- Additional comment from Bruno Andrade on 2019-08-08 04:13:05 UTC ---

Jian, I'll keep an eye on it.

Evan, from my understanding, the verification steps would be:

1) Check that the metrics are available. To validate that, I should follow [1] and search for the '{__name__="subscription_sync_total"}' metric in the Prometheus UI.

2) Check that the metric count increases: upgrade an operator and check subscription_sync_total again.

Can you please validate that?

[1] https://docs.openshift.com/container-platform/4.1/telemetry/showing-data-collected-by-telemetry.html
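
The second check (the count increasing) can be sketched as a comparison of two readings of subscription_sync_total, one taken before the upgrade and one after. This is only a minimal sketch: counter_increased is a hypothetical helper name, not part of oc or OLM, and the readings are assumed to be collected by hand.

```shell
# counter_increased BEFORE AFTER
# Exits 0 when AFTER is strictly greater than BEFORE, 1 otherwise.
# Hypothetical helper; both arguments are counter readings such as 3.0 or 13.
counter_increased() {
  awk -v before="$1" -v after="$2" 'BEGIN { exit !((after + 0) > (before + 0)) }'
}

# Example with made-up readings taken before and after an upgrade:
# counter_increased 3.0 4.0 && echo "subscription_sync_total increased"
```

A Prometheus counter restarts from zero when the exporting process restarts, so a lower second reading most likely points to a pod restart rather than to syncing having stopped.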

--- Additional comment from Evan Cordell on 2019-08-20 16:50:43 UTC ---

Hi Bruno,

Yes, this is correct. Thanks!

Comment 2 Bruno Andrade 2019-08-21 16:19:44 UTC
LGTM, marking as VERIFIED

Steps used to validate:

1) Create a subscription for the etcd operator in the default project

2) Check subscription_sync_total metrics on catalog operator:

oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-7bd48b5c85-lhzk2   1/1     Running   0          79m
olm-operator-86c74cd669-dqvml       1/1     Running   0          79m
packageserver-67dc576656-mlg6c      1/1     Running   0          77m
packageserver-67dc576656-t5j55      1/1     Running   0          77m

oc port-forward catalog-operator-7bd48b5c85-lhzk2 8081  -n openshift-operator-lifecycle-manager
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081


curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://localhost:8081/metrics | grep subs
# HELP subscription_count Number of subscriptions
# TYPE subscription_count gauge
subscription_count 1.0
# HELP subscription_sync_total Monotonic count of subscription syncs
# TYPE subscription_sync_total counter
subscription_sync_total{installed="",name="etcd"} 10.0
subscription_sync_total{installed="etcdoperator.v0.9.4",name="etcd"} 3.0
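
The per-label samples above can be totaled for one subscription with a small filter over the metrics text. This is a minimal sketch that assumes the exposition format shown in the output above; sum_subscription_syncs is a hypothetical helper name, not something shipped with oc or OLM.

```shell
# sum_subscription_syncs NAME
# Reads Prometheus text exposition format on stdin and prints the sum of all
# subscription_sync_total samples whose name label equals NAME.
# Hypothetical helper for illustration only.
sum_subscription_syncs() {
  awk -v name="$1" '
    # Only sample lines for this metric; "# HELP" and "# TYPE" lines are skipped.
    index($0, "subscription_sync_total{") == 1 && index($0, "name=\"" name "\"") {
      total += $NF   # the sample value is the last field on the line
    }
    END { printf "%d\n", total }
  '
}

# Possible usage against the port-forwarded endpoint from the step above:
# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" \
#   https://localhost:8081/metrics | sum_subscription_syncs etcd
```

Summing across the installed label gives the total number of syncs for the subscription regardless of which CSV was installed at the time.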

3) Query for the '{__name__="subscription_sync_total"}' metric in the Prometheus UI.

http://pics.osci.redhat.com/w6keoq.png
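
The same selector could also be sent to the Prometheus HTTP API (/api/v1/query) instead of the UI. The sketch below only builds the percent-encoded query string. urlencode is a hypothetical helper, and PROM_URL and TOKEN are assumed placeholders for the cluster's Prometheus route and a bearer token; none of them come from this report.

```shell
# urlencode STRING
# Percent-encodes every character outside the RFC 3986 unreserved set.
# Hypothetical helper; handles ASCII input only.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

selector='{__name__="subscription_sync_total"}'
query=$(urlencode "$selector")

# PROM_URL and TOKEN are assumed placeholders, not values from this bug:
# curl -k -H "Authorization: Bearer $TOKEN" "$PROM_URL/api/v1/query?query=$query"
```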


Cluster Details:

	Cluster Version:
	oc get clusterversion -o json|jq ".items[0].status.history[0].version"
		"4.2.0-0.nightly-2019-08-21-115505"

	OLM Version:
	oc exec catalog-operator-7bd48b5c85-lhzk2 -n openshift-operator-lifecycle-manager -- olm -version
		OLM version: 0.11.0
		git commit: 772aaa018dd9d7fd5f8940e997fb91bbdf10b527

Comment 3 errata-xmlrpc 2019-10-16 06:36:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

