Bug 1768483 - [marketplace] Default OpSrc metrics cardinality is too great
Summary: [marketplace] Default OpSrc metrics cardinality is too great
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Alexander Greene
QA Contact: Fan Jia
URL:
Whiteboard:
Duplicates: 1768482
Depends On:
Blocks:
 
Reported: 2019-11-04 14:46 UTC by Alexander Greene
Modified: 2020-01-23 11:10 UTC (History)
0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:10:30 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github operator-framework operator-marketplace pull 259 0 'None' closed Bug 1768483: Limit Cardinality of Marketplace quay metrics 2020-04-01 07:27:29 UTC
Red Hat Product Errata RHBA-2020:0062 0 None None None 2020-01-23 11:10:40 UTC

Description Alexander Greene 2019-11-04 14:46:51 UTC
Description of problem:
The marketplace operator is configured to expose metrics when it attempts to connect to a default AppRegistry. Because each HTTP response code can create a new time series, the Telemeter team has requested that we limit the potential number of time series by using a recording rule and exposing the recorded metric to Telemeter instead.


Version-Release number of selected component (if applicable): 4.3.x

How reproducible: Always

Steps to Reproduce:
1. Create an OpenShift 4.3.x cluster
2. Visit the /metrics endpoint on the marketplace-operator

Actual results:
Each time series tracks a single response code, and the series are not grouped by a recording rule.

Example:
app_registry_request_total{code="200",opsrc="community-operators"} 1


Expected results:

app_registry_request_total{code="200",opsrc="community-operators"} 1
app_registry:community_operators:1xx_response 0
app_registry:community_operators:2xx_response 1
app_registry:community_operators:3xx_response 0
app_registry:community_operators:4xx_response 0
app_registry:community_operators:5xx_response 0
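
The grouping requested above can be sketched in Python. This only illustrates the bucketing idea; it is not the recording rule that was merged. The helper names `response_class` and `group_by_class` and the sample values are hypothetical, and status classes with no requests are simply absent from the result rather than reported as 0.

```python
from collections import defaultdict


def response_class(code: str) -> str:
    """Map an HTTP status code such as "404" to its class, e.g. "4xx"."""
    return f"{code[0]}xx"


def group_by_class(samples):
    """Collapse per-code counters into one series per opsrc and status class.

    `samples` is an iterable of (opsrc, code, value) tuples taken from
    app_registry_request_total. Series names follow the pattern shown in
    the expected results above.
    """
    totals = defaultdict(float)
    for opsrc, code, value in samples:
        name = f"app_registry:{opsrc.replace('-', '_')}:{response_class(code)}_response"
        totals[name] += value
    return dict(totals)


# Hypothetical samples: three distinct codes collapse into two series.
samples = [
    ("community-operators", "200", 1),
    ("community-operators", "404", 2),
    ("community-operators", "403", 1),
]
grouped = group_by_class(samples)
for name in sorted(grouped):
    print(name, grouped[name])
```

However many distinct response codes the registry returns, this grouping yields at most five series per opsrc, which is the bound the request is after.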

Additional info:

Comment 1 Alexander Greene 2019-11-05 18:25:21 UTC
*** Bug 1768482 has been marked as a duplicate of this bug. ***

Comment 3 Fan Jia 2019-11-08 07:38:43 UTC
test env:
4.3.0-0.nightly-2019-11-07-172437

test result:
the metrics of the marketplace-operator do not include "app_registry:community_operators:2xx_response"
#oc -n openshift-monitoring exec -c prometheus prometheus-k8s-1  -- curl -k -H "Authorization: Bearer $token" http://10.128.0.26:8383/metrics | grep app_

`
# HELP app_registry_request_duration_seconds A histogram of AppRegistry request latencies.
# TYPE app_registry_request_duration_seconds histogram
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="0.005"} 0
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="0.01"} 0
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="0.025"} 0
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="0.05"} 0
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="0.1"} 0
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="0.25"} 0
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="0.5"} 4
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="1"} 6
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="2.5"} 6
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="5"} 6
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="10"} 6
app_registry_request_duration_seconds_bucket{opsrc="certified-operators",le="+Inf"} 6
app_registry_request_duration_seconds_sum{opsrc="certified-operators"} 2.8043666599999995
app_registry_request_duration_seconds_count{opsrc="certified-operators"} 6
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="0.005"} 0
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="0.01"} 0
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="0.025"} 0
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="0.05"} 0
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="0.1"} 0
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="0.25"} 0
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="0.5"} 1
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="1"} 6
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="2.5"} 6
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="5"} 6
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="10"} 6
app_registry_request_duration_seconds_bucket{opsrc="community-operators",le="+Inf"} 6
app_registry_request_duration_seconds_sum{opsrc="community-operators"} 3.6850265260000006
app_registry_request_duration_seconds_count{opsrc="community-operators"} 6
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="0.005"} 0
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="0.01"} 0
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="0.025"} 0
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="0.05"} 0
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="0.1"} 0
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="0.25"} 0
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="0.5"} 2
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="1"} 7
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="2.5"} 7
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="5"} 7
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="10"} 7
app_registry_request_duration_seconds_bucket{opsrc="non-default-opsrc",le="+Inf"} 7
app_registry_request_duration_seconds_sum{opsrc="non-default-opsrc"} 3.648026931
app_registry_request_duration_seconds_count{opsrc="non-default-opsrc"} 7
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="0.005"} 0
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="0.01"} 0
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="0.025"} 0
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="0.05"} 0
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="0.1"} 0
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="0.25"} 1
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="0.5"} 6
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="1"} 6
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="2.5"} 6
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="5"} 6
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="10"} 6
app_registry_request_duration_seconds_bucket{opsrc="redhat-operators",le="+Inf"} 6
app_registry_request_duration_seconds_sum{opsrc="redhat-operators"} 1.765426227
app_registry_request_duration_seconds_count{opsrc="redhat-operators"} 6
# HELP app_registry_request_total A counter that stores the results of reaching out to an AppRegistry.
# TYPE app_registry_request_total counter
app_registry_request_total{code="200",method="get",opsrc="certified-operators"} 6
app_registry_request_total{code="200",method="get",opsrc="community-operators"} 6
app_registry_request_total{code="200",method="get",opsrc="non-default-opsrc"} 7
app_registry_request_total{code="200",method="get",opsrc="redhat-operators"} 6

`
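
To check how many time series each metric contributes, output like the above can be tallied with a short script. This is a rough sketch that assumes the standard Prometheus text exposition layout (one sample per line, `#` for HELP/TYPE comments); `series_per_metric` is a hypothetical helper, and the embedded scrape is a shortened copy of the counter lines shown above.

```python
import re
from collections import Counter

# Matches the metric name at the start of a sample line, e.g.
# app_registry_request_total{code="200",...} 6
SAMPLE_RE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)")


def series_per_metric(exposition: str) -> Counter:
    """Count sample lines (one per time series) for each metric name."""
    counts = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


scrape = """\
# TYPE app_registry_request_total counter
app_registry_request_total{code="200",method="get",opsrc="certified-operators"} 6
app_registry_request_total{code="200",method="get",opsrc="community-operators"} 6
app_registry_request_total{code="200",method="get",opsrc="non-default-opsrc"} 7
app_registry_request_total{code="200",method="get",opsrc="redhat-operators"} 6
"""
counts = series_per_metric(scrape)
print(counts["app_registry_request_total"])
```

Each additional response code or method seen for an opsrc would add another line, and so another series, to this count.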

Comment 4 Alexander Greene 2019-11-08 11:10:46 UTC
My apologies. I should not have written that the metrics are available at the marketplace metrics endpoint.

The time series are in fact available in the console UI by clicking on the Monitoring -> Metrics tab and searching for `app_registry:community_operators:2xx_response`.

Note: the `app_registry:community_operators:xxx_response` series will only be available if the reported value is greater than or equal to 1.
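
Outside the console, the same lookup could be scripted against the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The sketch below only builds the URL and parses a response body; the host is a placeholder, the two JSON bodies are hand-written examples in the API's documented shape rather than captured output, and `build_query_url` and `first_value` are hypothetical helper names. Authentication and the actual HTTP request are left out.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def first_value(response_body: str):
    """Return the first sample value from an instant-query response, or None.

    An empty result vector means the series is not present, which is the
    case the note above describes for values below 1.
    """
    payload = json.loads(response_body)
    result = payload.get("data", {}).get("result", [])
    if not result:
        return None
    # Each vector element carries "value": [<unix timestamp>, "<value as string>"].
    return float(result[0]["value"][1])


url = build_query_url(
    "https://prometheus.example.invalid",  # placeholder host, not a real route
    "app_registry:community_operators:2xx_response",
)

# Hand-written example bodies (assumed shape, not captured from a cluster).
present = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1573210246,"1"]}]}}'
absent = '{"status":"success","data":{"resultType":"vector","result":[]}}'
print(url)
print(first_value(present), first_value(absent))
```

Treating an empty result as `None` keeps "series missing" distinct from "series present with a value", which matters when verifying a metric that only appears once its value reaches 1.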

Comment 5 Fan Jia 2019-11-11 05:56:37 UTC
test env:
4.3.0-0.nightly-2019-11-10-165138

test result:
the metrics for the default opsrcs are now available in the console UI under Monitoring -> Metrics:
"app_registry:community_operators:2xx_response" 
"app_registry:redhat_operators:2xx_response" 
"app_registry:certify_operators:2xx_response"

Comment 7 errata-xmlrpc 2020-01-23 11:10:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

