Bug 1743808

Summary: Report metrics on installed operators
Product: OpenShift Container Platform
Reporter: Evan Cordell <ecordell>
Component: OLM
Assignee: Evan Cordell <ecordell>
OLM sub component: OLM
QA Contact: Bruno Andrade <bandrade>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: bandrade, chezhang, chuo, jfan, scolange
Version: 4.2.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Clone Of: 1737156
Last Closed: 2019-10-16 06:36:40 UTC
Bug Blocks: 1737156

Description Evan Cordell 2019-08-20 17:56:05 UTC
+++ This bug was initially created as a clone of Bug #1737156 +++

--- Additional comment from Jian Zhang on 2019-08-05 01:44:48 UTC ---

Hi, Evan

Could you give more details about this bug, such as a basic description and steps to reproduce?
It would also be better to fill in the Priority and Severity fields.

--- Additional comment from Jian Zhang on 2019-08-05 06:20:34 UTC ---

Bruno,

Please verify this bug once it is in ON_QA status. Basic info: a synchronization metric for the Subscription object has been added to the Prometheus metrics.
For more details, you can ask the reporter. Thanks!

--- Additional comment from Bruno Andrade on 2019-08-08 04:13:05 UTC ---

Jian, I'll keep an eye on it.

Evan, from my understanding, the verification steps would be:

1) Check whether the metrics are available. To validate that, I would follow [1] and search for '{__name__="subscription_sync_total"}' in the Prometheus UI.

2) Check that the metric count increases: upgrade an operator and confirm that subscription_sync_total goes up.

Can you please validate that?

[1] https://docs.openshift.com/container-platform/4.1/telemetry/showing-data-collected-by-telemetry.html
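Step 2 amounts to comparing a counter before and after an upgrade. A minimal sketch, assuming the Prometheus text format that the catalog operator's /metrics endpoint returns (as in the curl output in Comment 2); `sync_total` is a hypothetical helper, not part of oc or OLM, that adds up every subscription_sync_total sample for one subscription name across all `installed` label values.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or OLM): sum every
# subscription_sync_total sample for the subscription named in $1.
# Input: Prometheus text-format metrics on stdin. Assumes sample lines
# carry no trailing timestamp, so the last field is the value.
sync_total() {
  awk -v want="name=\"$1\"" '
    # "# HELP" / "# TYPE" lines do not match; only sample lines start
    # with the metric name immediately followed by a label set.
    /^subscription_sync_total[{]/ && index($0, want) > 0 {
      sum += $NF   # sample value, e.g. 10.0
    }
    END { printf "%d\n", sum }
  '
}
```

Piping the authenticated curl of /metrics into `sync_total etcd` once before and once after the upgrade, and checking that the second number is larger, covers step 2. Both readings should come from the same catalog-operator pod, because the counter lives in that process and starts again from zero when the pod restarts.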

--- Additional comment from Evan Cordell on 2019-08-20 16:50:43 UTC ---

Hi Bruno,

Yes, this is correct. Thanks!

Comment 2 Bruno Andrade 2019-08-21 16:19:44 UTC
LGTM, marking as VERIFIED

Steps used to validate:

1) Create a subscription for the etcd operator in the default project.

2) Check the subscription_sync_total metric on the catalog operator:

oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-7bd48b5c85-lhzk2   1/1     Running   0          79m
olm-operator-86c74cd669-dqvml       1/1     Running   0          79m
packageserver-67dc576656-mlg6c      1/1     Running   0          77m
packageserver-67dc576656-t5j55      1/1     Running   0          77m

oc port-forward catalog-operator-7bd48b5c85-lhzk2 8081  -n openshift-operator-lifecycle-manager
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081


curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://localhost:8081/metrics | grep subs
# HELP subscription_count Number of subscriptions
# TYPE subscription_count gauge
subscription_count 1.0
# HELP subscription_sync_total Monotonic count of subscription syncs
# TYPE subscription_sync_total counter
subscription_sync_total{installed="",name="etcd"} 10.0
subscription_sync_total{installed="etcdoperator.v0.9.4",name="etcd"} 3.0

3) Query '{__name__="subscription_sync_total"}' in the Prometheus UI:

http://pics.osci.redhat.com/w6keoq.png
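Steps 1 and 2 can be sketched as a small script. This is an illustration under stated assumptions, not what was actually run here: the OperatorGroup name `default-og`, the `community-operators` catalog in `openshift-marketplace`, and the `singlenamespace-alpha` channel are all assumed, and `etcd_subscription_manifest` and `installed_csvs` are hypothetical helpers. The second one reads the same metrics text as the curl command and prints the non-empty `installed` label values, which here would show that the etcd subscription reached etcdoperator.v0.9.4.

```shell
#!/bin/sh
# Step 1 (assumed values): print an OperatorGroup plus a Subscription for
# etcd in the "default" namespace. Catalog source, channel and the
# OperatorGroup name are assumptions; pipe to `oc apply -f -` to use it.
etcd_subscription_manifest() {
  cat <<'EOF'
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: default-og
  namespace: default
spec:
  targetNamespaces:
  - default
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd
  namespace: default
spec:
  channel: singlenamespace-alpha
  name: etcd
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF
}

# Step 2: print the non-empty installed="..." label values reported by
# subscription_sync_total for subscription $1 (metrics text on stdin).
installed_csvs() {
  awk -v want="name=\"$1\"" '
    /^subscription_sync_total[{]/ && index($0, want) > 0 {
      if (match($0, /installed="[^"]*"/)) {
        # Drop the leading installed=" (11 chars) and the closing quote.
        v = substr($0, RSTART + 11, RLENGTH - 12)
        if (v != "") print v
      }
    }
  '
}
```

Printing the manifest instead of applying it directly makes it easy to review before running `etcd_subscription_manifest | oc apply -f -`. An empty `installed` label, like the first sample line in step 2, simply means those syncs happened before a CSV was installed, so the helper skips it.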


Cluster Details:

	Cluster Version:
	oc get clusterversion -o json|jq ".items[0].status.history[0].version"
		"4.2.0-0.nightly-2019-08-21-115505"

	OLM Version:
	oc exec catalog-operator-7bd48b5c85-lhzk2 -n openshift-operator-lifecycle-manager -- olm -version
		OLM version: 0.11.0
		git commit: 772aaa018dd9d7fd5f8940e997fb91bbdf10b527

Comment 3 errata-xmlrpc 2019-10-16 06:36:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922