Bug 1573717

Summary:	No prometheus labels set for pod slices, which means we can't query by label selector for CPU attributed to a pod
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	Node	Assignee:	Derek Carr <decarr>
Status:	CLOSED ERRATA	QA Contact:	DeShuai Ma <dma>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.10.0	CC:	aos-bugs, decarr, jokerman, kalexand, mmccomas, sjenning
Target Milestone:	---	Keywords:	TestCaseNeeded
Target Release:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:	undefined	Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-10-11 07:19:10 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Clayton Coleman 2018-05-02 06:28:28 UTC

The container scopes carry the pod_name and namespace labels, but the pod slice does not, which I think means that we can't see build CPU metrics. I noticed this on https://prometheus-openshift-monitoring.svc.ci.openshift.org/graph?g0.range_input=6h&g0.expr=rate(haproxy_server_bytes_in_total%5B5m%5D)&g0.tab=0&g1.range_input=1h&g1.expr=sort_desc(container_cpu_usage_rate%7Bnode_role_kubernetes_io_master%3D%22true%22%7D)&g1.tab=0 while looking at the master static pod metrics.

Appears to have been broken since at least 3.9, maybe earlier.

Without these labels we cannot query for CPU or memory by pod.

Comment 1 Derek Carr 2018-05-02 17:20:40 UTC

Pod usage stats collected from the pod-level cgroup have been reported in the kubelet stats summary API since 1.9.

https://github.com/kubernetes/kubernetes/pull/55969
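
For reference, a minimal sketch of reading those pod-level numbers out of a stats summary response. It assumes the JSON body has already been fetched from the kubelet and parsed into a dict. The field names used here (pods, podRef, cpu.usageNanoCores, memory.workingSetBytes) are my understanding of the summary API and should be checked against the cluster's version. The sample payload is hypothetical.

```python
def pod_usage(summary):
    """Map (namespace, name) to pod-level CPU and memory usage.

    Assumes `summary` is the parsed JSON body of a kubelet stats summary
    response. Missing fields come back as None rather than being guessed.
    """
    usage = {}
    for pod in summary.get("pods", []):
        ref = pod.get("podRef", {})
        key = (ref.get("namespace"), ref.get("name"))
        usage[key] = {
            "cpu_nanocores": pod.get("cpu", {}).get("usageNanoCores"),
            "memory_working_set_bytes": pod.get("memory", {}).get("workingSetBytes"),
        }
    return usage


# Hypothetical payload: names and values are made up for illustration.
sample_summary = {
    "pods": [
        {
            "podRef": {"name": "example-pod", "namespace": "example-ns"},
            "cpu": {"usageNanoCores": 1800000},
            "memory": {"workingSetBytes": 52428800},
        },
        # A pod with no memory stats yet: the value is reported as None.
        {"podRef": {"name": "new-pod", "namespace": "example-ns"}, "cpu": {}},
    ]
}

for (namespace, name), stats in pod_usage(sample_summary).items():
    print(namespace, name, stats)
```

This only covers the summary API path. It does not help Prometheus queries, which scrape a different endpoint.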

Comment 2 Derek Carr 2018-05-03 03:24:30 UTC

After further discussion, it is clear that the issue is the /metrics/cadvisor endpoint, which is missing the required pod name and namespace labels when called from a Kubernetes context. We may be able to wrap the label decorator func so that it can do an efficient sub-container lookup and decorate pod cgroups with pod_name and pod_namespace.
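
A rough sketch of the idea, in Python for brevity. This is not the kubelet's code and not what the eventual PR does. It assumes systemd-style cgroup paths in which the pod UID appears in the pod slice name with dashes written as underscores, and it assumes a hypothetical pods_by_uid lookup table stands in for the "efficient sub-container lookup". The function names are made up, and the namespace label is written as `namespace` to match the label that container scopes already carry.

```python
import re

# Matches a path that ends at the pod-level slice, e.g.
# ".../kubepods-besteffort-pod<uid_with_underscores>.slice".
# Container scopes end in ".scope" and are deliberately not matched.
POD_SLICE_RE = re.compile(r"-pod([0-9a-f_]+)\.slice$")


def pod_uid_from_cgroup(cgroup_id):
    """Return the pod UID encoded in a pod-slice path, or None."""
    match = POD_SLICE_RE.search(cgroup_id)
    if match is None:
        return None
    return match.group(1).replace("_", "-")


def decorate_labels(cgroup_id, base_labels, pods_by_uid):
    """Wrap an existing label set, adding pod labels for pod slices.

    `pods_by_uid` is a hypothetical mapping of pod UID to a dict with
    "name" and "namespace" keys. Existing labels are never overwritten.
    """
    labels = dict(base_labels)
    uid = pod_uid_from_cgroup(cgroup_id)
    pod = pods_by_uid.get(uid) if uid is not None else None
    if pod is not None:
        labels.setdefault("pod_name", pod["name"])
        labels.setdefault("namespace", pod["namespace"])
    return labels


# Hypothetical example data.
example_pods = {"0a1b2c3d-0000-1111-2222-333344445555": {"name": "example-pod", "namespace": "example-ns"}}
example_slice = "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0a1b2c3d_0000_1111_2222_333344445555.slice"
print(decorate_labels(example_slice, {"id": example_slice}, example_pods))
```

Keeping the lookup in a wrapper means the underlying label function stays unchanged for container scopes, which already have these labels.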

Comment 3 Seth Jennings 2018-05-07 00:39:06 UTC

Kube PR:
https://github.com/kubernetes/kubernetes/pull/63406

Comment 4 Seth Jennings 2018-05-11 05:26:30 UTC

Deferred to 3.11.  PR merged upstream and will come in on the kube 1.11 rebase.
Wait until 3.11 rebases on 1.11 before moving to ON_QA.

Comment 6 DeShuai Ma 2018-09-04 07:21:23 UTC

Verified on OpenShift v3.11.0-0.25.0.
The master static pod metrics are now available in the Prometheus console.

Query:
sum without (cpu) (rate(container_cpu_usage_seconds_total{node_role_kubernetes_io_master="true"}[5m]))

The Prometheus console returns data such as:
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="431ac1fb-1463-4527-b3d1-79245dd698e1",beta_kubernetes_io_os="linux",container_name="c",failure_domain_beta_kubernetes_io_region="regionOne",failure_domain_beta_kubernetes_io_zone="nova",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podfcaf46e6_afd8_11e8_a6a2_fa163ed26999.slice/docker-9f5182efa3260a7741a3592719dc0742c2a8df05535ecf786b50c4874f567056.scope",image="registry.dev.redhat.io/openshift3/ose-template-service-broker@sha256:f3f805a08103267155f3459885adc884985d77f3c56d11620433d50d16baa24c",instance="qe-juzhao-311-qeos-1-master-etcd-1",job="kubernetes-cadvisor",kubernetes_io_hostname="qe-juzhao-311-qeos-1-master-etcd-1",name="k8s_c_apiserver-rkqrw_openshift-template-service-broker_fcaf46e6-afd8-11e8-a6a2-fa163ed26999_0",namespace="openshift-template-service-broker",node_role_kubernetes_io_master="true",pod_name="apiserver-rkqrw"}	0.001801434737499985
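
With pod_name and namespace present, the same query shape can be narrowed to a single pod. Below is a small sketch that builds such a query and the matching URL for the Prometheus HTTP API instant-query endpoint (/api/v1/query). The helper names and the base URL are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode


def pod_cpu_query(namespace, pod_name, window="5m"):
    """Build a PromQL query for the CPU rate of one pod.

    Uses the pod_name and namespace labels seen in the result above.
    Values are not escaped; Kubernetes names do not contain quotes.
    """
    selector = 'namespace="{}",pod_name="{}"'.format(namespace, pod_name)
    return (
        "sum without (cpu) "
        "(rate(container_cpu_usage_seconds_total{{{}}}[{}]))".format(selector, window)
    )


def query_url(base_url, promql):
    """Return the instant-query URL for `promql` on a Prometheus server."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


promql = pod_cpu_query("openshift-template-service-broker", "apiserver-rkqrw")
print(promql)
# The base URL is a placeholder, not a real server.
print(query_url("http://prometheus.example:9090", promql))
```

Because the query aggregates only over the cpu label, each cgroup that carries the pod's labels is returned as its own series rather than being summed together.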

Comment 9 errata-xmlrpc 2018-10-11 07:19:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652