The container scopes carry the pod_name and namespace labels, but the pod slice does not, which I think means we can't see build CPU metrics. I noticed this at https://prometheus-openshift-monitoring.svc.ci.openshift.org/graph?g0.range_input=6h&g0.expr=rate(haproxy_server_bytes_in_total%5B5m%5D)&g0.tab=0&g1.range_input=1h&g1.expr=sort_desc(container_cpu_usage_rate%7Bnode_role_kubernetes_io_master%3D%22true%22%7D)&g1.tab=0 while looking at the master static pod metrics. It appears to have been broken since at least 3.9, maybe earlier. Without these labels we cannot query CPU or memory by pod.
Pod usage stats collected from the pod-level cgroup have been reported in the kubelet stats summary API since 1.9: https://github.com/kubernetes/kubernetes/pull/55969
After further discussion, it is clear that the issue is the /metrics/cadvisor endpoint, which is missing the pod name and namespace labels when called from a Kubernetes context. It may be possible to wrap the label decorator func so that it can do an efficient sub-container lookup and add pod_name and pod_namespace to pod cgroups.
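The wrapping idea above can be illustrated with a rough sketch. This is not the cadvisor or kubelet code (which is Go) and not what the upstream PR does; it only shows the shape of the approach under stated assumptions. The names `pod_uid_from_cgroup`, `wrap_label_func`, and `pod_index` are hypothetical, the cgroup layout is assumed to be the systemd-style one seen in this cluster (pod UID embedded in the slice name with underscores), and the label names follow the `pod_name`/`namespace` pair seen on the container scopes.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Assumption: systemd-style pod cgroup names, e.g.
# ".../kubepods-besteffort-pod<uid_with_underscores>.slice".
_POD_SLICE_RE = re.compile(r"-pod([0-9a-f_]+)\.slice")


def pod_uid_from_cgroup(cgroup_id: str) -> Optional[str]:
    """Return the pod UID embedded in a cgroup path, or None if there is none."""
    match = _POD_SLICE_RE.search(cgroup_id)
    if match is None:
        return None
    # The slice name uses underscores where the pod UID has dashes.
    return match.group(1).replace("_", "-")


def wrap_label_func(
    base_label_func: Callable[[str], Dict[str, str]],
    pod_index: Dict[str, Tuple[str, str]],
) -> Callable[[str], Dict[str, str]]:
    """Wrap a label function so pod cgroups also get pod_name and namespace.

    pod_index is a hypothetical in-memory map: pod UID -> (namespace, pod name).
    """

    def labels_for(cgroup_id: str) -> Dict[str, str]:
        labels = dict(base_label_func(cgroup_id))
        uid = pod_uid_from_cgroup(cgroup_id)
        pod = pod_index.get(uid) if uid is not None else None
        if pod is not None:
            namespace, pod_name = pod
            # Keep any labels the base function already set.
            labels.setdefault("namespace", namespace)
            labels.setdefault("pod_name", pod_name)
        return labels

    return labels_for
```

The dictionary lookup keeps the per-cgroup cost constant, which is the point of the "efficient sub-container lookup" remark; how the real implementation obtains the pod information is not covered here.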
Kube PR: https://github.com/kubernetes/kubernetes/pull/63406
Deferred to 3.11. The PR has merged upstream and will come in with the kube 1.11 rebase. Wait until 3.11 rebases onto 1.11 before moving to ON_QA.
Verified on OpenShift v3.11.0-0.25.0: the master static pod metrics are now available in the Prometheus console.

Query:
sum without (cpu) (rate(container_cpu_usage_seconds_total{node_role_kubernetes_io_master="true"}[5m]))

Sample result from the Prometheus console:
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="431ac1fb-1463-4527-b3d1-79245dd698e1",beta_kubernetes_io_os="linux",container_name="c",failure_domain_beta_kubernetes_io_region="regionOne",failure_domain_beta_kubernetes_io_zone="nova",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podfcaf46e6_afd8_11e8_a6a2_fa163ed26999.slice/docker-9f5182efa3260a7741a3592719dc0742c2a8df05535ecf786b50c4874f567056.scope",image="registry.dev.redhat.io/openshift3/ose-template-service-broker@sha256:f3f805a08103267155f3459885adc884985d77f3c56d11620433d50d16baa24c",instance="qe-juzhao-311-qeos-1-master-etcd-1",job="kubernetes-cadvisor",kubernetes_io_hostname="qe-juzhao-311-qeos-1-master-etcd-1",name="k8s_c_apiserver-rkqrw_openshift-template-service-broker_fcaf46e6-afd8-11e8-a6a2-fa163ed26999_0",namespace="openshift-template-service-broker",node_role_kubernetes_io_master="true",pod_name="apiserver-rkqrw"} 0.001801434737499985
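For anyone who wants to repeat this check per pod rather than per container, here is a small sketch that builds a per-pod variant of the query above and the matching URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The helper names and the base URL are placeholders, and grouping by `namespace` and `pod_name` is my assumption about how one would aggregate; depending on which series the endpoint exposes, extra label filters may be needed so that a cgroup is not counted more than once.

```python
from urllib.parse import urlencode


def per_pod_cpu_query(window: str = "5m") -> str:
    """Build a PromQL expression summing CPU usage rate per pod on master nodes."""
    return (
        "sum by (namespace, pod_name) "
        "(rate(container_cpu_usage_seconds_total"
        '{node_role_kubernetes_io_master="true"}[' + window + "]))"
    )


def instant_query_url(base_url: str, query: str) -> str:
    """Return the instant-query URL for a Prometheus server at base_url."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": query})


if __name__ == "__main__":
    # Placeholder host; substitute the real Prometheus route.
    print(instant_query_url("http://prometheus.example:9090", per_pod_cpu_query()))
```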
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652