Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1573717

Summary: No prometheus labels set for pod slices, which means we can't query by label selector for CPU attributed to a pod
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Node
Assignee: Derek Carr <decarr>
Status: CLOSED ERRATA
QA Contact: DeShuai Ma <dma>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.10.0
CC: aos-bugs, decarr, jokerman, kalexand, mmccomas, sjenning
Target Milestone: ---
Keywords: TestCaseNeeded
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-11 07:19:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Clayton Coleman 2018-05-02 06:28:28 UTC
The container scopes have pod_name/namespace, but the pod slice doesn't, which I think means that we can't see build CPU metrics. Noticed this on https://prometheus-openshift-monitoring.svc.ci.openshift.org/graph?g0.range_input=6h&g0.expr=rate(haproxy_server_bytes_in_total%5B5m%5D)&g0.tab=0&g1.range_input=1h&g1.expr=sort_desc(container_cpu_usage_rate%7Bnode_role_kubernetes_io_master%3D%22true%22%7D)&g1.tab=0 when I was looking at the master static pod metrics.

Appears to have been broken since at least 3.9, maybe earlier.

Without this, we cannot query for CPU or memory by pod.

Comment 1 Derek Carr 2018-05-02 17:20:40 UTC
Pod usage stats collected from the pod-level cgroup have been reported in the kubelet stats summary API since 1.9.

https://github.com/kubernetes/kubernetes/pull/55969

Comment 2 Derek Carr 2018-05-03 03:24:30 UTC
After further discussion, it is clear that the issue is the /metrics/cadvisor endpoint missing the required pod name and namespace labels when called from a Kubernetes context.  We may be able to wrap the label decorator func so that it can do an efficient sub-container lookup and decorate pod cgroups with pod_name and pod_namespace.
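
A rough sketch of the lookup described above, assuming the systemd cgroup layout seen in the id label of the verification output later in this bug. This is not the code from the upstream PR: podInfo, podUIDFromCgroup, podLabels and the in-memory pods map are hypothetical names used only for illustration, and the label keys follow the pod_name/namespace keys that appear in the verified output.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// podInfo is a hypothetical holder for the labels to attach to a pod cgroup.
type podInfo struct {
	Name      string
	Namespace string
}

// podSliceRE matches the last path element of a systemd pod slice, e.g.
// "kubepods-besteffort-podfcaf46e6_afd8_11e8_a6a2_fa163ed26999.slice".
var podSliceRE = regexp.MustCompile(`^kubepods(?:-[a-z]+)?-pod([0-9a-f_]+)\.slice$`)

// podUIDFromCgroup returns the pod UID when the cgroup path ends in a pod
// slice. Container scopes and QoS-level slices return false.
func podUIDFromCgroup(path string) (string, bool) {
	last := path[strings.LastIndex(path, "/")+1:]
	m := podSliceRE.FindStringSubmatch(last)
	if m == nil {
		return "", false
	}
	// systemd slice names use underscores where the pod UID has dashes.
	return strings.ReplaceAll(m[1], "_", "-"), true
}

// podLabels looks the UID up in a map of known pods (an assumption standing
// in for whatever pod source the kubelet would really consult) and returns
// the extra labels, or nil when the cgroup is not a known pod slice.
func podLabels(path string, pods map[string]podInfo) map[string]string {
	uid, ok := podUIDFromCgroup(path)
	if !ok {
		return nil
	}
	p, ok := pods[uid]
	if !ok {
		return nil
	}
	return map[string]string{"pod_name": p.Name, "namespace": p.Namespace}
}

func main() {
	pods := map[string]podInfo{
		"fcaf46e6-afd8-11e8-a6a2-fa163ed26999": {
			Name:      "apiserver-rkqrw",
			Namespace: "openshift-template-service-broker",
		},
	}
	slice := "/kubepods.slice/kubepods-besteffort.slice/" +
		"kubepods-besteffort-podfcaf46e6_afd8_11e8_a6a2_fa163ed26999.slice"
	fmt.Println(podLabels(slice, pods))
}
```

The point of the sketch is only that the pod UID is recoverable from the slice name, so a decorator with access to a UID-to-pod mapping could label the pod cgroup the same way container scopes are labeled.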

Comment 3 Seth Jennings 2018-05-07 00:39:06 UTC
Kube PR:
https://github.com/kubernetes/kubernetes/pull/63406

Comment 4 Seth Jennings 2018-05-11 05:26:30 UTC
Deferred to 3.11.  PR merged upstream and will come in on the kube 1.11 rebase.
Wait until 3.11 rebases on 1.11 before moving to ON_QA.

Comment 6 DeShuai Ma 2018-09-04 07:21:23 UTC
Verified on OpenShift v3.11.0-0.25.0.
We can get the master static pod metrics in the Prometheus console.

Query:
sum without (cpu) (rate(container_cpu_usage_seconds_total{node_role_kubernetes_io_master="true"}[5m]))

The Prometheus console returns data such as:
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="431ac1fb-1463-4527-b3d1-79245dd698e1",beta_kubernetes_io_os="linux",container_name="c",failure_domain_beta_kubernetes_io_region="regionOne",failure_domain_beta_kubernetes_io_zone="nova",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podfcaf46e6_afd8_11e8_a6a2_fa163ed26999.slice/docker-9f5182efa3260a7741a3592719dc0742c2a8df05535ecf786b50c4874f567056.scope",image="registry.dev.redhat.io/openshift3/ose-template-service-broker@sha256:f3f805a08103267155f3459885adc884985d77f3c56d11620433d50d16baa24c",instance="qe-juzhao-311-qeos-1-master-etcd-1",job="kubernetes-cadvisor",kubernetes_io_hostname="qe-juzhao-311-qeos-1-master-etcd-1",name="k8s_c_apiserver-rkqrw_openshift-template-service-broker_fcaf46e6-afd8-11e8-a6a2-fa163ed26999_0",namespace="openshift-template-service-broker",node_role_kubernetes_io_master="true",pod_name="apiserver-rkqrw"}	0.001801434737499985
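
For anyone repeating this check outside the console, the same query can be sent to the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below only builds the request URL; the base address http://localhost:9090 and the helper name instantQueryURL are assumptions for illustration, not details of the test environment.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// podCPUQuery is the query used for verification in this comment.
const podCPUQuery = `sum without (cpu) (rate(container_cpu_usage_seconds_total{node_role_kubernetes_io_master="true"}[5m]))`

// instantQueryURL builds an instant-query URL for the Prometheus HTTP API.
// base is the Prometheus address (hypothetical here); the query is
// URL-encoded so braces, quotes and brackets survive transport.
func instantQueryURL(base, query string) string {
	q := url.Values{}
	q.Set("query", query)
	return strings.TrimRight(base, "/") + "/api/v1/query?" + q.Encode()
}

func main() {
	// Assumed address; replace with the real Prometheus route.
	fmt.Println(instantQueryURL("http://localhost:9090", podCPUQuery))
}
```

Fetching that URL with any HTTP client that is authorized against the Prometheus route should return the matching series as JSON.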

Comment 9 errata-xmlrpc 2018-10-11 07:19:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652