+++ This bug was initially created as a clone of Bug #1756548 +++

Description of problem:
The metering reporting-operator uses the container_name label instead of container in its Prometheus queries for pod memory/CPU usage. The container_name label was deprecated in Kubernetes 1.14 and removed in 1.16, which will be OpenShift 4.3. Because metering may be running against a 4.3 cluster before the metering operator itself is upgraded, it may still be using the deprecated labels on 4.3 until the operator is upgraded. This means metering would break during the 4.2 to 4.3 upgrade unless we backport this fix to 4.2, so that 4.2 already uses the Kubernetes 1.16 metric labels.

Version-Release number of selected component (if applicable):
4.2.x

How reproducible:
Always

Steps to Reproduce:
1. Query Prometheus directly; it shows the same behavior. The CPU query

sum(rate(container_cpu_usage_seconds_total{container_name!="POD",container_name!="",pod!=""}[1m])) BY (pod, namespace) + on (pod, namespace) group_left(node) (sum(kube_pod_info{pod_ip!="",node!="",host_ip!=""}) by (pod, namespace, node) * 0)

and the memory query

sum(container_memory_usage_bytes{container_name!="POD", container_name!="",pod!=""}) by (pod, namespace) + on (pod, namespace) group_left(node) (sum(kube_pod_info{pod_ip!="",node!="",host_ip!=""}) by (pod, namespace, node) * 0)

both return no metrics in 4.3, but work in 4.2 and 4.1.

Investigation showed the cause: the container_name metric label was renamed to container in Kubernetes 1.14, and in 1.16 the old metric labels such as container_name and pod_name were removed. We need to update our metrics queries to use container instead of container_name.
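The rename described above comes down to replacing the deprecated label names inside the query text. A minimal Python sketch of that idea follows; it is not the operator's actual fix. The helper name `migrate_labels` and the rename table are hypothetical, and the pod_name to pod mapping is an assumption based on the same Kubernetes rename (this report's queries only use container_name).

```python
import re

# Deprecated label -> replacement. container_name -> container comes from this
# report; pod_name -> pod is an assumption based on the same Kubernetes rename.
RENAMES = {"container_name": "container", "pod_name": "pod"}

# Match the deprecated names only as whole words, so metric names such as
# container_memory_usage_bytes are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_labels(query: str) -> str:
    """Return the query with deprecated label names replaced (hypothetical helper)."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], query)


old_query = (
    'sum(container_memory_usage_bytes{container_name!="POD", '
    'container_name!="",pod!=""}) by (pod, namespace)'
)
print(migrate_labels(old_query))
```

Note that this is a purely textual rename: it would also rewrite a label value that happens to contain one of the old names, so the result should still be reviewed by hand.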
Once https://github.com/operator-framework/operator-metering/pull/960 is cherry-picked into release-4.2 and built, the verification steps will differ from those for 4.3. In 4.2 nothing is actually broken, so we only need to verify that the ReportDataSources pod-usage-cpu-cores and pod-usage-memory-bytes no longer reference container_name in their Prometheus queries and use "container" instead. Metrics should also be importing for both of these ReportDataSources.
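The query check described above could be automated along these lines. This is only a sketch: the function name `deprecated_labels` is hypothetical, and it assumes the query text has already been extracted from the ReportDataSource (for example from `oc get reportdatasource ... -o yaml` output).

```python
import re

# Labels removed in Kubernetes 1.16, as named in this report.
DEPRECATED = ("container_name", "pod_name")


def deprecated_labels(query: str) -> list:
    """Return the deprecated label names that still appear in the query."""
    return [
        label
        for label in DEPRECATED
        if re.search(rf"\b{re.escape(label)}\b", query)
    ]


fixed_query = (
    'sum(rate(container_cpu_usage_seconds_total{container!="POD",'
    'container!="",pod!=""}[1m])) BY (pod, namespace)'
)
# An empty list means the query no longer references the old labels.
print(deprecated_labels(fixed_query))
```

This only covers the query text; whether metrics are actually importing still has to be checked against the cluster.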
Verified with release 4.2:

pruan@fedora-vm ~/workspace/gocode/src/github.com/operator-framework/operator-metering (fix_mirroring_registry_4.2●)$ oc get reportdatasource pod-usage-cpu-cores -o yaml | grep container
sum(rate(container_cpu_usage_seconds_total{container!="POD",container!="",pod!=""}[1m])) BY (pod, namespace) + on (pod, namespace) group_left(node) (sum(kube_pod_info{pod_ip!="",node!="",host_ip!=""}) by (pod, namespace, node) * 0)

pruan@fedora-vm ~/workspace/gocode/src/github.com/operator-framework/operator-metering (fix_mirroring_registry_4.2●)$ oc get reportdatasource pod-usage-memory-bytes -o yaml | grep container
sum(container_memory_usage_bytes{container!="POD", container!="",pod!=""}) by (pod, namespace) + on (pod, namespace) group_left(node) (sum(kube_pod_info{pod_ip!="",node!="",host_ip!=""}) by (pod, namespace, node) * 0)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3869