Bug 1573717 - No prometheus labels set for pod slices, which means we can't query by label selector for CPU attributed to a pod
Summary: No prometheus labels set for pod slices, which means we can't query by label ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 3.11.0
Assignee: Derek Carr
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-05-02 06:28 UTC by Clayton Coleman
Modified: 2018-10-11 07:19 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
undefined
Clone Of:
Environment:
Last Closed: 2018-10-11 07:19:10 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2652 0 None None None 2018-10-11 07:19:39 UTC

Description Clayton Coleman 2018-05-02 06:28:28 UTC
The container scopes have pod_name/namespace, but the pod slice doesn't, which I think means that we can't see build CPU metrics. Noticed this on https://prometheus-openshift-monitoring.svc.ci.openshift.org/graph?g0.range_input=6h&g0.expr=rate(haproxy_server_bytes_in_total%5B5m%5D)&g0.tab=0&g1.range_input=1h&g1.expr=sort_desc(container_cpu_usage_rate%7Bnode_role_kubernetes_io_master%3D%22true%22%7D)&g1.tab=0 when I was looking at the master static pod metrics.

Appears to have been broken since at least 3.9, maybe earlier.

Without this we cannot query for CPU or memory by pod

Comment 1 Derek Carr 2018-05-02 17:20:40 UTC
Pod usage stats collected from the pod level cgroup is reported in the kubelet stats summary API since 1.9.

https://github.com/kubernetes/kubernetes/pull/55969

Comment 2 Derek Carr 2018-05-03 03:24:30 UTC
After further discussion, it is clear the issue is with the /metrics/cadvisor endpoint missing the required labels for pod name and namespace when called from a Kubernetes context.  It's possible we can wrap the label decorator func so it has access to do efficient sub-container lookup so it can decorate pod_name and pod_namespace for pod cgroups.

Comment 3 Seth Jennings 2018-05-07 00:39:06 UTC
Kube PR:
https://github.com/kubernetes/kubernetes/pull/63406

Comment 4 Seth Jennings 2018-05-11 05:26:30 UTC
Deferred to 3.11.  PR merged upstream and will come in on the kube 1.11 rebase.
Wait until 3.11 rebases on 1.11 before moving to ON_QA.

Comment 6 DeShuai Ma 2018-09-04 07:21:23 UTC
Verify on openshift v3.11.0-0.25.0
we can get the master static metrics in prometheus console

query script: 
sum without (cpu) (rate(container_cpu_usage_seconds_total{node_role_kubernetes_io_master="true"}[5m]))

I can get data in prometheus console like:
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="431ac1fb-1463-4527-b3d1-79245dd698e1",beta_kubernetes_io_os="linux",container_name="c",failure_domain_beta_kubernetes_io_region="regionOne",failure_domain_beta_kubernetes_io_zone="nova",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podfcaf46e6_afd8_11e8_a6a2_fa163ed26999.slice/docker-9f5182efa3260a7741a3592719dc0742c2a8df05535ecf786b50c4874f567056.scope",image="registry.dev.redhat.io/openshift3/ose-template-service-broker@sha256:f3f805a08103267155f3459885adc884985d77f3c56d11620433d50d16baa24c",instance="qe-juzhao-311-qeos-1-master-etcd-1",job="kubernetes-cadvisor",kubernetes_io_hostname="qe-juzhao-311-qeos-1-master-etcd-1",name="k8s_c_apiserver-rkqrw_openshift-template-service-broker_fcaf46e6-afd8-11e8-a6a2-fa163ed26999_0",namespace="openshift-template-service-broker",node_role_kubernetes_io_master="true",pod_name="apiserver-rkqrw"}	0.001801434737499985

Comment 9 errata-xmlrpc 2018-10-11 07:19:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652


Note You need to log in before you can comment on or make changes to this bug.