Bug 1757547 - Metering fails to import pod cpu/memory usage metrics due to many-to-many matching error
Summary: Metering fails to import pod cpu/memory usage metrics due to many-to-many matching error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Metering Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Brett Tofel
QA Contact: Peter Ruan
URL:
Whiteboard:
Depends On:
Blocks: 1757551
 
Reported: 2019-10-01 19:26 UTC by Chance Zibolski
Modified: 2020-05-13 21:26 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1757551 (view as bug list)
Environment:
Last Closed: 2020-05-13 21:26:21 UTC
Target Upstream Version:
Embargoed:




Links:
  GitHub: operator-framework/operator-metering pull 958 (closed) - "bug 1757547: charts/openshift-metering: Update cpu/memory usage promqueries to handle many to many matching error" - last updated 2020-08-20 22:14:17 UTC
  Red Hat Product Errata: RHBA-2020:0062 - last updated 2020-05-13 21:26:23 UTC

Description Chance Zibolski 2019-10-01 19:26:47 UTC
Description of problem: The following error sometimes occurs when metering is importing CPU or memory usage metrics in the pod-usage-cpu-cores or pod-usage-memory-bytes ReportDataSources.

time="2019-10-01T19:16:51Z" level=error msg="error collecting metrics" app=metering chunkSize=5m0s component=PrometheusImporter endTime="2019-10-01 19:16:39.238555324 +0000 UTC" error="failed to perform Prometheus query: execution: many-to-many matching not allowed: matching labels must be unique on one side" logID=OBju7Ykcm2 namespace=metering-chancez2 reportDataSource=pod-usage-cpu-cores startTime="2019-09-23 23:12:00 +0000 UTC" stepSize=1m0s tableName=hive.metering.datasource_metering_chancez2_pod_usage_cpu_cores


Version-Release number of selected component (if applicable): 4.3.0


How reproducible: This seems to depend heavily on the metrics present in Prometheus. I have not yet determined exactly which series trigger it, but the error comes from the group_left join between the container usage metrics and the kube_pod_info metric (see the sketch under Additional info below).


Steps to Reproduce: Unknown

Actual results: The Prometheus query fails with the error above, so no metrics are imported.


Expected results: Prometheus queries in ReportDataSources do not error when using group_left, and metrics are imported.


Additional info:
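For illustration only, a query of roughly the following shape reproduces the failure mode; the metric names and matching labels here are assumptions, and the exact ReportDataSource queries are in the linked pull request. With group_left, the right-hand side of the join is the "one" side, so if kube_pod_info returns more than one series for the same matching label set (for example, duplicate kube-state-metrics scrapes, or a pod recreated with the same name within the staleness window), Prometheus rejects the query with the error above.

  # Hypothetical sketch of the failing pattern, not the exact ReportDataSource query.
  # group_left(node) requires kube_pod_info to be unique per (pod, namespace);
  # duplicate series on the right-hand side produce
  # "many-to-many matching not allowed: matching labels must be unique on one side".
  sum by (pod, namespace) (
    rate(container_cpu_usage_seconds_total{container!="POD", container!=""}[5m])
  )
  * on (pod, namespace) group_left(node)
    kube_pod_info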

Comment 1 Chance Zibolski 2019-10-01 19:34:16 UTC
I wasn't able to determine the exact underlying issue, but found a suitable fix: removing the group_left we used to get the node name for a pod, since the container-level metrics already carry the node as a label. A sketch of that shape of fix follows.
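A minimal sketch of that shape of fix, assuming the same hypothetical metric and labels as above (the exact updated queries are in the linked pull request 958): read the node label directly from the container-level metric and drop the join to kube_pod_info entirely.

  # Hypothetical sketch of the fix described above, not the exact query from PR 958.
  # The container-level metrics already carry a node label, so no
  # group_left join to kube_pod_info is needed.
  sum by (pod, namespace, node) (
    rate(container_cpu_usage_seconds_total{container!="POD", container!=""}[5m])
  )

With no vector matching involved, the query cannot hit the many-to-many matching error, regardless of duplicate kube_pod_info series.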

Comment 4 Peter Ruan 2019-10-22 21:53:19 UTC
verified with 4.3.0-0.nightly-2019-10-22-101148 and `metering` master branch

Comment 7 errata-xmlrpc 2020-05-13 21:26:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

