Bug 1737156 - Report metrics on installed operators
Summary: Report metrics on installed operators
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.z
Assignee: Evan Cordell
QA Contact: Bruno Andrade
URL:
Whiteboard:
Depends On: 1743808
Blocks: 1737164
Reported: 2019-08-02 21:44 UTC by Evan Cordell
Modified: 2019-09-10 15:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1743808 (view as bug list)
Environment:
Last Closed: 2019-09-10 15:59:27 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github operator-framework operator-lifecycle-manager pull 976 0 None closed Bug 1737156: feat(metrics): record sync count for Subscriptions, labeled with name and installedCSV 2020-04-30 01:24:43 UTC
Red Hat Product Errata RHSA-2019:2594 0 None None None 2019-09-10 15:59:38 UTC

Description Evan Cordell 2019-08-02 21:44:52 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Jian Zhang 2019-08-05 01:44:48 UTC
Hi, Evan

Could you give more details about this bug, such as a basic description and the steps to reproduce?
Also, it would be better to fill in the Priority and Severity fields.

Comment 2 Jian Zhang 2019-08-05 06:20:34 UTC
Bruno,

Please help verify this bug once it is in ON_QA status. Basic info: a synchronization metric for the Subscription object was added to the Prometheus metrics.
For more details, you can ask for the reporter's help. Thanks!

Comment 3 Bruno Andrade 2019-08-08 04:13:05 UTC
Jian, I'll keep an eye on it.

Evan, from my understanding, the verification steps would be:

1) Check that the metrics are available. To validate that, I would follow [1] and search for the '{__name__="subscription_sync_total"}' metric in the Prometheus UI.

2) Check that the metric count increases. Upgrade an operator and check subscription_sync_total again.

Can you please validate that?

[1] https://docs.openshift.com/container-platform/4.1/telemetry/showing-data-collected-by-telemetry.html
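
The two checks above could also be scripted against the raw metrics text rather than the Prometheus UI. The following is a minimal sketch, not part of OLM or of the fix: the helper names `metric_total` and `sync_count_increased` and the sample scrapes `BEFORE` and `AFTER` are made up for illustration, and the parser assumes sample lines carry no trailing timestamp.

```python
from typing import Optional

METRIC = "subscription_sync_total"


def metric_total(exposition: str, metric: str = METRIC) -> Optional[float]:
    """Sum every sample of `metric` in Prometheus text-format output.

    Returns None when the metric is absent (check 1 would fail).
    Assumption: sample lines end with the value, with no timestamp after it.
    """
    total = None
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is either `name value` or `name{labels} value`.
        if not (line.startswith(metric + "{") or line.startswith(metric + " ")):
            continue
        value = float(line.rsplit(" ", 1)[1])
        total = value if total is None else total + value
    return total


def sync_count_increased(before: str, after: str) -> bool:
    """Check 2: the summed counter is present in both scrapes and grew."""
    b, a = metric_total(before), metric_total(after)
    return b is not None and a is not None and a > b


# Hypothetical scrapes taken before and after an operator upgrade.
BEFORE = """\
# TYPE subscription_sync_total counter
subscription_sync_total{installed="",name="example"} 2.0
subscription_count 2.0
"""
AFTER = """\
# TYPE subscription_sync_total counter
subscription_sync_total{installed="",name="example"} 3.0
subscription_sync_total{installed="example.v1",name="example"} 1.0
subscription_count 2.0
"""

print(sync_count_increased(BEFORE, AFTER))
```

Summing across all label sets is a deliberate simplification: it only shows that some subscription synced between the two scrapes, not which one.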

Comment 4 Evan Cordell 2019-08-20 16:50:43 UTC
Hi Bruno,

Yes, this is correct. Thanks!

Comment 6 Bruno Andrade 2019-08-26 20:19:15 UTC
Verification Failed

Steps used to validate:

1) Create a subscription for the etcd operator in the default project

2) Check the subscription_sync_total metric on the catalog operator

oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-58cfd7cc84-ktdfh   1/1     Running   0          29m
olm-operator-8999bd5fd-2wc4p        1/1     Running   0          29m
olm-operators-8vbmm                 1/1     Running   0          27m
packageserver-67f857985f-gkgrq      1/1     Running   0          26m
packageserver-67f857985f-tqwd6      1/1     Running   0          26m

oc port-forward catalog-operator-58cfd7cc84-ktdfh 8081  -n openshift-operator-lifecycle-manager
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081

curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://localhost:8081/metrics | grep subs
# HELP subscription_count Number of subscriptions
# TYPE subscription_count gauge
subscription_count 2.0

As shown above, the 'subscription_sync_total' metric appears neither in the Prometheus UI nor in the catalog operator metrics.

Cluster Details:

	Cluster Version:
	oc get clusterversion -o json|jq ".items[0].status.history[0].version"
		"4.1.0-0.nightly-2019-08-26-164941"

	OLM Version:
	oc exec catalog-operator-58cfd7cc84-ktdfh -n openshift-operator-lifecycle-manager -- olm -version
	OLM version: 0.9.0
	git commit: afc7402

Comment 9 Bruno Andrade 2019-08-27 18:56:07 UTC
LGTM, marking as VERIFIED

Steps used to validate:

1) Create a subscription for the etcd operator in the default project

2) Check the subscription_sync_total metric on the catalog operator

 oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-5f68dfb696-kk6vr   1/1     Running   1          20m
olm-operator-588cb66f54-m8h59       1/1     Running   1          20m
olm-operators-ph5nk                 1/1     Running   0          17m
packageserver-54f9598d56-2zqpw      1/1     Running   0          16m
packageserver-54f9598d56-rvclf      1/1     Running   0          16m


oc port-forward catalog-operator-5f68dfb696-kk6vr 8081  -n openshift-operator-lifecycle-manager
Forwarding from 127.0.0.1:8081 -> 8081
Forwarding from [::1]:8081 -> 8081
Handling connection for 8081

curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://localhost:8081/metrics | grep subs
# HELP subscription_count Number of subscriptions
# TYPE subscription_count gauge
subscription_count 2.0
# HELP subscription_sync_total Monotonic count of subscription syncs
# TYPE subscription_sync_total counter
subscription_sync_total{installed="",name="etcd"} 2.0
subscription_sync_total{installed="etcdoperator.v0.9.4",name="etcd"} 2.0
subscription_sync_total{installed="packageserver.v0.9.0",name="packageserver"} 2.0

3) Query for {__name__="subscription_sync_total"} in the Prometheus UI and check that all metrics are shown:
http://pics.osci.redhat.com/5chjce.png


Cluster Details:

	Cluster Version:
	oc get clusterversion -o json|jq ".items[0].status.history[0].version"
		"4.1.0-0.nightly-2019-08-27-070548"

	OLM Version:
	oc exec catalog-operator-5f68dfb696-kk6vr -n openshift-operator-lifecycle-manager -- olm -version
	OLM version: 0.9.0
	git commit: b28fc94
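
For reference, the labeled samples in the step 2 output can be grouped per subscription with a short script. This is only a sketch for reading the curl output: the function name `syncs_by_subscription` is hypothetical, the regular expressions cover just the simple `name{labels} value` form shown in the output, and `SCRAPE` repeats the three sample lines from that output.

```python
import re

# Matches `subscription_sync_total{...} <value>`; assumes no trailing timestamp.
SAMPLE = re.compile(r'^subscription_sync_total\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
# Matches one `key="value"` pair, allowing backslash escapes inside the value.
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def syncs_by_subscription(exposition):
    """Return {subscription name: {installed CSV: sync count}}."""
    result = {}
    for line in exposition.splitlines():
        match = SAMPLE.match(line.strip())
        if match is None:
            continue  # HELP/TYPE comments and other metrics are ignored
        labels = dict(LABEL.findall(match.group("labels")))
        name = labels.get("name", "")
        installed = labels.get("installed", "")
        result.setdefault(name, {})[installed] = float(match.group("value"))
    return result


# The three sample lines from the step 2 output.
SCRAPE = """\
subscription_sync_total{installed="",name="etcd"} 2.0
subscription_sync_total{installed="etcdoperator.v0.9.4",name="etcd"} 2.0
subscription_sync_total{installed="packageserver.v0.9.0",name="packageserver"} 2.0
"""

for name, by_csv in syncs_by_subscription(SCRAPE).items():
    print(name, by_csv)
```

Grouped this way, the etcd subscription shows two series: one with an empty `installed` label and one with `etcdoperator.v0.9.4`. The empty label presumably covers syncs recorded before a CSV was installed, though the output alone does not confirm that.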

Comment 11 errata-xmlrpc 2019-09-10 15:59:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2594

