Bug 1757159 - Metering operator using old container metric labels for container cpu/memory usage metrics in 4.2
Summary: Metering operator using old container metric labels for container cpu/memory ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Metering Operator
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.2.z
Assignee: Chance Zibolski
QA Contact: Peter Ruan
URL:
Whiteboard:
Depends On: 1756548
Blocks:
Reported: 2019-09-30 17:35 UTC by Chance Zibolski
Modified: 2019-11-19 13:49 UTC (History)

Fixed In Version:
Doc Text:
Clone Of: 1756548
Environment:
Last Closed: 2019-11-19 13:49:01 UTC
Target Upstream Version:
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Github operator-framework operator-metering pull 962 | 0 | None | closed | [release-4.2] bug 1757159: charts/openshift-metering: Fix kube 1.14 metrics queries | 2019-11-18 11:37:10 UTC
Red Hat Product Errata RHBA-2019:3869 | 0 | None | None | None | 2019-11-19 13:49:14 UTC

Description Chance Zibolski 2019-09-30 17:35:10 UTC
+++ This bug was initially created as a clone of Bug #1756548 +++

Description of problem: The metering reporting-operator uses the container_name label instead of container in its Prometheus queries for pod memory/CPU usage. The container_name label was deprecated in Kubernetes 1.14 and removed in 1.16, which is what OpenShift 4.3 will be based on. Because the metering operator may be running against a 4.3 cluster before the operator itself has been upgraded, it could still be issuing queries with the removed labels on 4.3. Metering would therefore be broken during the 4.2 to 4.3 upgrade unless we backport this fix to 4.2, so that 4.2 already uses the labels that remain in Kubernetes 1.16.


Version-Release number of selected component (if applicable): 4.2.x


How reproducible: Always


Steps to Reproduce:
1. Query Prometheus directly; it shows the same behavior. The queries

sum(rate(container_cpu_usage_seconds_total{container_name!="POD",container_name!="",pod!=""}[1m])) BY (pod, namespace) + on (pod, namespace) group_left(node) (sum(kube_pod_info{pod_ip!="",node!="",host_ip!=""}) by (pod, namespace, node) * 0)

and

sum(container_memory_usage_bytes{container_name!="POD", container_name!="",pod!=""}) by (pod, namespace) + on (pod, namespace) group_left(node) (sum(kube_pod_info{pod_ip!="",node!="",host_ip!=""}) by (pod, namespace, node) * 0)

both return no metrics in 4.3, but work in 4.2 and 4.1.
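
For anyone who wants to script the direct Prometheus check, here is a minimal Python sketch that builds a request URL for the Prometheus HTTP API instant-query endpoint (/api/v1/query). The host name and the short example query are placeholders, not values from this bug. Authentication is not handled, and a cluster will normally require it.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    # urlencode percent-encodes braces, quotes and operators in the PromQL text.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical host; substitute the route of the cluster's Prometheus.
url = instant_query_url("https://prometheus.example.com/", 'up{job!=""}')
print(url)
```

An empty "result" list in the JSON response for either query above would correspond to the "no metrics" behavior described here.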

After investigation, the cause is that the container_name metric label was renamed to container in Kube 1.14, and in 1.16 the old metric labels such as container_name and pod_name were removed. We need to update our metrics queries to use container instead of container_name.
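
The label rename can be illustrated with a small Python sketch that rewrites the legacy label names in a PromQL string. This is a naive textual substitution written for illustration, not the change made in the operator's charts. The helper name migrate_labels is hypothetical, and the pod_name to pod mapping is included on the assumption that it follows the same Kubernetes 1.14 rename.

```python
import re

# Legacy label -> replacement label. pod_name -> pod is an assumption based on
# the same Kubernetes 1.14 rename that turned container_name into container.
LEGACY_LABELS = {"container_name": "container", "pod_name": "pod"}

# \b keeps the match to whole identifiers, so metric names such as
# container_memory_usage_bytes are left untouched.
_LEGACY_RE = re.compile(r"\b(" + "|".join(map(re.escape, LEGACY_LABELS)) + r")\b")


def migrate_labels(query: str) -> str:
    """Replace legacy label names in a PromQL string (purely textual)."""
    return _LEGACY_RE.sub(lambda m: LEGACY_LABELS[m.group(1)], query)


old = ('sum(container_memory_usage_bytes{container_name!="POD", '
       'container_name!="",pod!=""}) by (pod, namespace)')
new = migrate_labels(old)
print(new)
```

Because the substitution does not parse PromQL, it would also rewrite a legacy name that appears inside a quoted label value, so the output should be reviewed by hand.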

Comment 1 Chance Zibolski 2019-09-30 17:45:15 UTC
Once https://github.com/operator-framework/operator-metering/pull/960 is cherry-picked into release-4.2 and built, the verification steps will be different from 4.3. In 4.2 this isn't breaking anything, so we just need to verify that the ReportDataSources pod-usage-cpu-cores and pod-usage-memory-bytes no longer reference container_name in their Prometheus queries and use "container" instead. Also, metrics should be importing for both of these data sources.
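
The query check described above could be automated along these lines. This is a minimal Python sketch: the helper name legacy_labels_in is hypothetical, and it assumes the query text has already been extracted from the ReportDataSource (for example from its YAML).

```python
import re

# Label names that must no longer appear in the ReportDataSource queries.
LEGACY_LABELS = ("container_name", "pod_name")


def legacy_labels_in(query):
    """Return the legacy label names that still appear in a PromQL string."""
    return [name for name in LEGACY_LABELS if re.search(rf"\b{name}\b", query)]


fixed = ('sum(rate(container_cpu_usage_seconds_total'
         '{container!="POD",container!="",pod!=""}[1m])) BY (pod, namespace)')
stale = ('sum(container_memory_usage_bytes'
         '{container_name!="POD", container_name!="",pod!=""}) by (pod, namespace)')

print(legacy_labels_in(fixed))
print(legacy_labels_in(stale))
```

An empty list means the query passes this check. It does not confirm that metrics are being imported, which still has to be verified separately.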

Comment 4 Peter Ruan 2019-11-06 18:15:19 UTC
Verified with release 4.2
pruan@fedora-vm ~/workspace/gocode/src/github.com/operator-framework/operator-metering (fix_mirroring_registry_4.2●)$ oc get reportdatasource pod-usage-cpu-cores -o yaml | grep container
      sum(rate(container_cpu_usage_seconds_total{container!="POD",container!="",pod!=""}[1m])) BY (pod, namespace) + on (pod, namespace) group_left(node) (sum(kube_pod_info{pod_ip!="",node!="",host_ip!=""}) by (pod, namespace, node) * 0)
pruan@fedora-vm ~/workspace/gocode/src/github.com/operator-framework/operator-metering (fix_mirroring_registry_4.2●)$ oc get reportdatasource pod-usage-memory-bytes -o yaml | grep container
      sum(container_memory_usage_bytes{container!="POD", container!="",pod!=""}) by (pod, namespace) + on (pod, namespace) group_left(node) (sum(kube_pod_info{pod_ip!="",node!="",host_ip!=""}) by (pod, namespace, node) * 0)

Comment 6 errata-xmlrpc 2019-11-19 13:49:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3869

