Bug 1939547
| Summary: | Include container="POD" in resource queries | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Pawel Krupa <pkrupa> |
| Component: | Monitoring | Assignee: | Damien Grisonnet <dgrisonn> |
| Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.7 | CC: | alegrand, anowak, anpicker, dgrisonn, erooth, kakkoyun, lcosic, mdhanve, pkrupa, rsandu, skanniha, skrenger, spasquie |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | If this bug requires documentation, please select an appropriate Doc Type value. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 22:53:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Pawel Krupa
2021-03-16 15:29:59 UTC
Increasing severity/priority to medium as this bug also affects autoscaling.

Since the PR has been merged upstream, the fix will land in 4.8 with the bump of kube-prometheus downstream. Closing as UPSTREAM.

Tested with payload 4.8.0-0.nightly-2021-05-06-003426.
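One way to double-check that the cluster is actually running that payload before inspecting the adapter configuration is the standard cluster-version query (shown here as a hedged aside, not part of the recorded verification output):

# oc get clusterversion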
# oc get cm prometheus-adapter-prometheus-config -oyaml
...
"cpu":
"containerLabel": "container"
"containerQuery": "sum(irate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!=\"\",pod!=\"\"}[5m])) by (<<.GroupBy>>)"
"nodeQuery": "sum(1 - irate(node_cpu_seconds_total{mode=\"idle\"}[5m]) * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>}) by (<<.GroupBy>>) or sum (1- irate(windows_cpu_time_total{mode=\"idle\", job=\"windows-exporter\",<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)"
...
"memory":
"containerLabel": "container"
"containerQuery": "sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!=\"\",pod!=\"\"}) by (<<.GroupBy>>)"
"nodeQuery": "sum(node_memory_MemTotal_bytes{job=\"node-exporter\",<<.LabelMatchers>>} - node_memory_MemAvailable_bytes{job=\"node-exporter\",<<.LabelMatchers>>}) by (<<.GroupBy>>) or sum(windows_cs_physical_memory_bytes{job=\"windows-exporter\",<<.LabelMatchers>>} - windows_memory_available_bytes{job=\"windows-exporter\",<<.LabelMatchers>>}) by (<<.GroupBy>>)"
...
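Both containerQuery templates keep only the container!="" filter, so the pause container (exposed by cAdvisor with container="POD", as the verification below confirms) is now counted. At query time prometheus-adapter fills <<.LabelMatchers>> with matchers scoping the request to the selected namespace and pods, and <<.GroupBy>> with the labels it needs to break the result out per pod and per container (the containerLabel setting tells it which label identifies the container). As a rough sketch, with the exact matcher set assumed and modeled on the manual check below, the memory query for a single pod would expand to something like:

sum(container_memory_working_set_bytes{namespace="openshift-monitoring",pod="prometheus-operator-7695b86877-bd4tk",container!="",pod!=""}) by (pod, namespace, container)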
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=sum(container_memory_working_set_bytes{pod="prometheus-operator-7695b86877-bd4tk",namespace="openshift-monitoring"}) BY (container)/1024/1024'|jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 524 0 343 100 181 19055 10055 --:--:-- --:--:-- --:--:-- 29111
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"container": "POD"
},
"value": [
1620281212.51,
"0.171875"
]
},
{
"metric": {
"container": "kube-rbac-proxy"
},
"value": [
1620281212.51,
"19.5625"
]
},
{
"metric": {
"container": "prometheus-operator"
},
"value": [
1620281212.51,
"126.88671875"
]
},
{
"metric": {},
"value": [
1620281212.51,
"148.9921875"
]
}
]
}
}
# oc adm top pod prometheus-operator-7695b86877-bd4tk --containers
POD NAME CPU(cores) MEMORY(bytes)
prometheus-operator-7695b86877-bd4tk POD 0m 0Mi
prometheus-operator-7695b86877-bd4tk kube-rbac-proxy 0m 19Mi
prometheus-operator-7695b86877-bd4tk prometheus-operator 1m 126Mi
# oc adm top pod prometheus-operator-7695b86877-bd4tk
NAME CPU(cores) MEMORY(bytes)
prometheus-operator-7695b86877-bd4tk 1m 146Mi
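Cross-checking the numbers from the Prometheus query above: 0.171875 + 19.5625 + 126.88671875 ≈ 146.62 MiB, which lines up with the 146Mi that `oc adm top pod` reports for the whole pod now that the POD (pause) container is included in the sum. The fourth series with an empty label set (≈ 148.99 MiB) is most likely the pod-level cgroup sample, which carries no container label and is not filtered out by the manual query.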
# oc get PodMetrics prometheus-operator-7695b86877-bd4tk -oyaml
apiVersion: metrics.k8s.io/v1beta1
containers:
- name: kube-rbac-proxy
usage:
cpu: "0"
memory: 20036Ki
- name: prometheus-operator
usage:
cpu: 2m
memory: 133432Ki
- name: POD
usage:
cpu: "0"
memory: 176Ki
kind: PodMetrics
metadata:
creationTimestamp: "2021-05-06T06:11:49Z"
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/name: prometheus-operator
app.kubernetes.io/part-of: openshift-monitoring
app.kubernetes.io/version: 0.47.0
pod-template-hash: 7695b86877
name: prometheus-operator-7695b86877-bd4tk
namespace: openshift-monitoring
timestamp: "2021-05-06T06:11:49Z"
window: 5m0s
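The same object can also be read straight from the resource metrics API served by prometheus-adapter; a minimal sketch using the pod from the example above:

# oc get --raw /apis/metrics.k8s.io/v1beta1/namespaces/openshift-monitoring/pods/prometheus-operator-7695b86877-bd4tk | jq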
I don't think a backport is meaningful here since the bug has fairly low impact on the product. To clarify, not accounting for the pause container's resource usage has no impact on the autoscaling pipeline, so the only benefit of this fix would be to make `oc adm top pods` more accurate. That said, the pause container's resource usage is so low compared to actual application resource usage that it is negligible. But maybe your customer has a use case that makes it non-negligible?

Yes, the HPA is also affected by this change, but the impact that the resource usage of the pause container has on autoscaling is negligible, which is why I don't think this bug is worth backporting.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

*** Bug 2036003 has been marked as a duplicate of this bug. ***
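For reference on the autoscaling remark above, a resource-type HPA is the consumer of these pod metrics via the metrics.k8s.io API. A minimal, hypothetical manifest (names, namespace, and target values are illustrative only, not taken from this bug):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app        # hypothetical workload, for illustration only
  namespace: example-ns
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        # compared against usage reported by the metrics API, which after this
        # fix also counts the (tiny) pause container
        averageUtilization: 80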