On the masters, excluding node-exporter, the monitoring stack should request ~50m CPU at most (we have a lot of other operators). If some components can drop to 5m, that would be good. Node-exporter looks like it uses about 100m during the run, so that number is not completely out of line with its request, but it generally uses less than the SDN container. We may want to consider tuning it down to 75m on all nodes.

+++ This bug was initially created as a clone of Bug #1812583 +++

Our default install is 3x 4-core masters and 3x 2-core workers. In an e2e run, which is a representative workload for customers of medium to large scale clusters, we use:

6.6 cores on average across masters (out of 12)
4.06 cores on average across workers (out of 6)

However, our default requests and limits for the pods on this cluster are:

11.7 cores for the masters (out of 12)
8.04 cores for the workers (out of 6)

As a result, our default cluster cannot correctly install (it runs out of requested CPU) even though it runs fine, which is a 4.4 release blocker.

---

CPU is compressible, and in general we don't set requests based on how much CPU you use; we establish ratios between components in the same role. For instance, etcd and kube-apiserver should receive the most CPU on the masters, because if they get starved, all other components suffer. The other master components should request CPU based on a formula like:

etcd request * (component usage / etcd usage)

The worker components that run on all nodes should follow a similar rule, based around either kubelet or openshift-sdn. However, large components on the nodes may need a tweak factor:

sdn request * (component usage / sdn usage)

(A worked example of this ratio approach follows the data below.)

---

Request by namespace and role:

{namespace="openshift-monitoring",role="worker"} 4.009765625
{namespace="openshift-etcd",role="master"} 2.5810546875
{namespace="openshift-sdn",role="master"} 1.869140625
{namespace="openshift-sdn",role="worker"} 1.8046875
{namespace="openshift-kube-controller-manager",role="master"} 1.3212890625
{namespace="openshift-kube-apiserver",role="master"} 1.0810546875
{namespace="openshift-apiserver",role="master"} 0.90234375
{namespace="openshift-monitoring",role="master"} 0.736328125
{namespace="openshift-dns",role="worker"} 0.662109375
{namespace="openshift-dns",role="master"} 0.662109375
{namespace="openshift-controller-manager",role="master"} 0.603515625
{namespace="openshift-machine-config-operator",role="master"} 0.509765625
{namespace="openshift-image-registry",role="worker"} 0.466796875
{namespace="openshift-ingress",role="worker"} 0.40234375
{namespace="openshift-machine-config-operator",role="worker"} 0.240234375
{namespace="openshift-kube-storage-version-migrator",role="worker"} 0.201171875
{namespace="openshift-machine-api",role="master"} 0.181640625
{namespace="openshift-multus",role="master"} 0.12890625
{namespace="kube-system",role="master"} 0.123046875
{namespace="openshift-image-registry",role="master"} 0.10546875
{namespace="openshift-marketplace",role="worker"} 0.0859375
{namespace="openshift-console",role="master"} 0.0859375
{namespace="openshift-operator-lifecycle-manager",role="master"} 0.0859375
{namespace="openshift-cluster-node-tuning-operator",role="master"} 0.0859375
{namespace="openshift-kube-scheduler",role="master"} 0.0703125
{namespace="openshift-cluster-node-tuning-operator",role="worker"} 0.064453125
{namespace="openshift-multus",role="worker"} 0.064453125
{namespace="openshift-authentication",role="master"} 0.04296875
{namespace="openshift-dns-operator",role="master"} 0.041015625
{namespace="openshift-cluster-machine-approver",role="master"} 0.041015625
{namespace="openshift-cluster-version",role="master"} 0.041015625
{namespace="openshift-ingress-operator",role="master"} 0.041015625
{namespace="openshift-cluster-samples-operator",role="master"} 0.041015625
{namespace="openshift-network-operator",role="master"} 0.021484375
{namespace="openshift-kube-storage-version-migrator-operator",role="master"} 0.021484375
{namespace="openshift-service-ca-operator",role="master"} 0.021484375
{namespace="openshift-insights",role="master"} 0.021484375
{namespace="openshift-csi-snapshot-controller-operator",role="worker"} 0.021484375
{namespace="openshift-kube-controller-manager-operator",role="master"} 0.021484375
{namespace="openshift-authentication-operator",role="master"} 0.021484375
{namespace="openshift-cloud-credential-operator",role="master"} 0.021484375
{namespace="openshift-etcd-operator",role="master"} 0.021484375
{namespace="openshift-console-operator",role="master"} 0.021484375
{namespace="openshift-kube-apiserver-operator",role="master"} 0.021484375
{namespace="openshift-cluster-storage-operator",role="master"} 0.021484375
{namespace="openshift-controller-manager-operator",role="master"} 0.021484375
{namespace="openshift-service-catalog-controller-manager-operator",role="master"} 0.021484375
{namespace="openshift-service-ca",role="master"} 0.021484375
{namespace="openshift-marketplace",role="master"} 0.021484375
{namespace="openshift-csi-snapshot-controller",role="worker"} 0.021484375
{namespace="openshift-apiserver-operator",role="master"} 0.021484375
{namespace="openshift-kube-scheduler-operator",role="master"} 0.005859375
{namespace="openshift-service-catalog-apiserver-operator",role="master"} 0.005859375

Usage by namespace and role:

{namespace="openshift-kube-apiserver",role="master"} 2.1619876333846624
{namespace="openshift-etcd",role="master"} 1.6578208562491923
{namespace="openshift-sdn",role="worker"} 0.7902127474534021
{namespace="openshift-apiserver",role="master"} 0.6654059737563104
{namespace="openshift-sdn",role="master"} 0.45042689393591295
{namespace="openshift-monitoring",role="worker"} 0.3504143483072635
{namespace="openshift-kube-controller-manager",role="master"} 0.20461836474549036
{namespace="openshift-etcd-operator",role="master"} 0.13743432789929938
{namespace="openshift-operator-lifecycle-manager",role="master"} 0.0884434181260195
{namespace="openshift-must-gather-kkjhl",role="master"} 0.08221540997755557
{namespace="openshift-monitoring",role="master"} 0.06295023150044035
{namespace="openshift-machine-config-operator",role="master"} 0.03892652122963432
{namespace="openshift-controller-manager",role="master"} 0.03838177104774991
{namespace="openshift-kube-apiserver-operator",role="master"} 0.03564176578869584
{namespace="openshift-ingress",role="worker"} 0.034161760609146746
{namespace="openshift-kube-scheduler",role="master"} 0.028997043612977058
{namespace="openshift-multus",role="master"} 0.028357733575176187
{namespace="openshift-cloud-credential-operator",role="master"} 0.02382491640890165
{namespace="openshift-kube-scheduler-operator",role="master"} 0.019677113395487122
{namespace="openshift-service-ca",role="master"} 0.019193497863481977
{namespace="openshift-kube-controller-manager-operator",role="master"} 0.017732005265559132
{namespace="openshift-marketplace",role="master"} 0.01595692508887505
{namespace="openshift-dns",role="worker"} 0.014646411228389108
{namespace="openshift-apiserver-operator",role="master"} 0.014566308284011975
{namespace="openshift-dns",role="master"} 0.013228066245094485
{namespace="openshift-image-registry",role="worker"} 0.013138524957335649
{namespace="openshift-marketplace",role="worker"} 0.012014092321005393
{namespace="openshift-console",role="master"} 0.007421924226788351
{namespace="openshift-image-registry",role="master"} 0.0071860119119124865
{namespace="openshift-authentication-operator",role="master"} 0.007069369592108443
{namespace="openshift-cluster-version",role="master"} 0.006795354406059571
{namespace="openshift-machine-config-operator",role="worker"} 0.006361723323325576
{namespace="openshift-authentication",role="master"} 0.006188662943334761
{namespace="openshift-machine-api",role="master"} 0.0052930512518087145
{namespace="openshift-network-operator",role="master"} 0.005136690421827466
{namespace="openshift-console-operator",role="master"} 0.004844998002650943
{namespace="openshift-controller-manager-operator",role="master"} 0.0045221224014901865
{namespace="openshift-multus",role="worker"} 0.0042029771037126965
{namespace="openshift-cluster-storage-operator",role="master"} 0.0038974091762590986
{namespace="openshift-service-ca-operator",role="master"} 0.0032243799219438558
{namespace="openshift-service-catalog-apiserver-operator",role="master"} 0.003034163254101611
{namespace="openshift-csi-snapshot-controller-operator",role="worker"} 0.002924888169013756
{namespace="openshift-service-catalog-controller-manager-operator",role="master"} 0.002260159790181888
{namespace="openshift-insights",role="master"} 0.002075358385262947
{namespace="openshift-cluster-samples-operator",role="master"} 0.002067038536157853
{namespace="openshift-cluster-node-tuning-operator",role="master"} 0.0019490981770514527
{namespace="openshift-kube-storage-version-migrator-operator",role="master"} 0.0018690529355203974
{namespace="kube-system",role="master"} 0.0013121596008085876
{namespace="openshift-ingress-operator",role="master"} 0.0012782041893306024
{namespace="openshift-csi-snapshot-controller",role="worker"} 0.0012313933297429486
{namespace="openshift-cluster-node-tuning-operator",role="worker"} 0.0009219469658359441
{namespace="openshift-cluster-machine-approver",role="master"} 0.0006950561574913539
{namespace="openshift-dns-operator",role="master"} 0.0006673132853684085
{namespace="openshift-kube-storage-version-migrator",role="worker"} 0.00018306567985376928

Request - usage:

{namespace="openshift-monitoring",role="worker"} 3.6593512766927363
{namespace="openshift-sdn",role="master"} 1.418713731064087
{namespace="openshift-kube-controller-manager",role="master"} 1.1166706977545096
{namespace="openshift-sdn",role="worker"} 1.0144747525465978
{namespace="openshift-etcd",role="master"} 0.9232338312508075
{namespace="openshift-monitoring",role="master"} 0.6733778934995597
{namespace="openshift-dns",role="master"} 0.6488813087549055
{namespace="openshift-dns",role="worker"} 0.6474629637716109
{namespace="openshift-controller-manager",role="master"} 0.5651338539522501
{namespace="openshift-machine-config-operator",role="master"} 0.4708391037703657
{namespace="openshift-image-registry",role="worker"} 0.45365835004266436
{namespace="openshift-ingress",role="worker"} 0.3681819893908532
{namespace="openshift-apiserver",role="master"} 0.2369377762436895
{namespace="openshift-machine-config-operator",role="worker"} 0.23387265167667443
{namespace="openshift-kube-storage-version-migrator",role="worker"} 0.20098880932014623
{namespace="openshift-machine-api",role="master"} 0.1763475737481913
{namespace="kube-system",role="master"} 0.12173471539919141
{namespace="openshift-multus",role="master"} 0.10054851642482382
{namespace="openshift-image-registry",role="master"} 0.09828273808808752
{namespace="openshift-cluster-node-tuning-operator",role="master"} 0.08398840182294855
{namespace="openshift-console",role="master"} 0.07851557577321165
{namespace="openshift-marketplace",role="worker"} 0.07392340767899461
{namespace="openshift-cluster-node-tuning-operator",role="worker"} 0.06353117803416405
{namespace="openshift-multus",role="worker"} 0.060250147896287305
{namespace="openshift-kube-scheduler",role="master"} 0.04131545638702294
{namespace="openshift-dns-operator",role="master"} 0.040348311714631595
{namespace="openshift-cluster-machine-approver",role="master"} 0.04032056884250865
{namespace="openshift-ingress-operator",role="master"} 0.039737420810669395
{namespace="openshift-cluster-samples-operator",role="master"} 0.038948586463842146
{namespace="openshift-authentication",role="master"} 0.03678008705666524
{namespace="openshift-cluster-version",role="master"} 0.03422027059394043
{namespace="openshift-csi-snapshot-controller",role="worker"} 0.02025298167025705
{namespace="openshift-kube-storage-version-migrator-operator",role="master"} 0.019615322064479603
{namespace="openshift-insights",role="master"} 0.019409016614737054
{namespace="openshift-service-catalog-controller-manager-operator",role="master"} 0.01922421520981811
{namespace="openshift-csi-snapshot-controller-operator",role="worker"} 0.018559486830986245
{namespace="openshift-service-ca-operator",role="master"} 0.018259995078056146
{namespace="openshift-cluster-storage-operator",role="master"} 0.0175869658237409
{namespace="openshift-controller-manager-operator",role="master"} 0.016962252598509815
{namespace="openshift-console-operator",role="master"} 0.016639376997349055
{namespace="openshift-network-operator",role="master"} 0.016347684578172532
{namespace="openshift-authentication-operator",role="master"} 0.014415005407891557
{namespace="openshift-apiserver-operator",role="master"} 0.006918066715988025
{namespace="openshift-marketplace",role="master"} 0.00552744991112495
{namespace="openshift-kube-controller-manager-operator",role="master"} 0.0037523697344408677
{namespace="openshift-service-catalog-apiserver-operator",role="master"} 0.002825211745898389
{namespace="openshift-service-ca",role="master"} 0.0022908771365180228
{namespace="openshift-cloud-credential-operator",role="master"} -0.00234054140890165
{namespace="openshift-operator-lifecycle-manager",role="master"} -0.0025059181260194963
{namespace="openshift-kube-scheduler-operator",role="master"} -0.013817738395487122
{namespace="openshift-kube-apiserver-operator",role="master"} -0.014157390788695837
{namespace="openshift-etcd-operator",role="master"} -0.11594995289929938
{namespace="openshift-kube-apiserver",role="master"} -1.0809329458846624

We will update this shortly with the ratios that everyone should use.
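[Editor's note] As a rough illustration of the ratio formula above, here is a small sketch. It is not part of the original bug; the function name is ours, and the numbers are simply pulled from the tables above.

# Illustrative only: derive a master component's CPU request from the
# ratio formula "etcd request * (component usage / etcd usage)".

def ratio_request(anchor_request, anchor_usage, component_usage):
    """Scale a component's request off an anchor component's request and usage."""
    return anchor_request * (component_usage / anchor_usage)

etcd_request = 2.581   # cores requested, openshift-etcd (master)
etcd_usage = 1.658     # cores used, openshift-etcd (master)
kcm_usage = 0.205      # cores used, openshift-kube-controller-manager (master)

# kube-controller-manager would land around 0.32 cores cluster-wide
# instead of the 1.32 cores it currently requests.
print(round(ratio_request(etcd_request, etcd_usage, kcm_usage), 2))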
--- Additional comment from Clayton Coleman on 2020-03-11 12:29:49 EDT ---

To gather these numbers, grab the Prometheus data from an e2e run, find the time at which the e2e tests mostly stopped, set that as the current time in the graph selector of your Promecieus or local Prometheus instance, and run the following queries (note: the role mapping only works on GCP and Azure because of AWS node naming).

Requests per namespace:

sort_desc(sum by (namespace,role) ((max without (id,endpoint,image,job,metrics_path,instance,name,service) (label_replace(label_replace(container_spec_cpu_shares{pod!="",namespace!~"e2e.*"}, "role", "master", "node", ".*-m-.*"), "role", "worker", "node", ".*-w-.*")) / 1024)) > 0)

Usage per namespace (measured over the last 15m):

sort_desc(sum by (namespace,role) ((max without (id,endpoint,image,job,metrics_path,instance,name,service) (label_replace(label_replace(rate(container_cpu_usage_seconds_total{pod!="",container="",namespace!~"e2e.*"}[15m]), "role", "master", "node", ".*-m-.*"), "role", "worker", "node", ".*-w-.*")))) > 0)

Difference between them:

sort_desc(sort_desc(sum by (namespace,role) ((max without (id,endpoint,image,job,metrics_path,instance,name,service) (label_replace(label_replace(container_spec_cpu_shares{pod!="",namespace!~"e2e.*"}, "role", "master", "node", ".*-m-.*"), "role", "worker", "node", ".*-w-.*")) / 1024))) - sort_desc(sum by (namespace,role) ((max without (id,endpoint,image,job,metrics_path,instance,name,service) (label_replace(label_replace(rate(container_cpu_usage_seconds_total{pod!="",container="",namespace!~"e2e.*"}[15m]), "role", "master", "node", ".*-m-.*"), "role", "worker", "node", ".*-w-.*"))))))

Actual usage:

sum by (role) (label_replace(label_replace(rate(container_cpu_usage_seconds_total{id="/"}[15m]), "role", "master", "node", ".*-m-.*"), "role", "worker", "node", ".*-w-.*"))

--- Additional comment from Clayton Coleman on 2020-03-11 13:53:29 EDT ---

After some basic data analysis, it's reasonable to say that out of the 6.6 cores in use on masters, the fractions used by the key components are:

25% etcd
33% kube-apiserver
10% openshift-apiserver
5% kcm

That's 73% of total usage. I would expect the requests to be roughly proportional to these percentages on top of our arbitrary floor. I think 3 cores requested is a reasonable base master spec for idle, in which case the requests would be:

330m etcd
250m kube-apiserver
10m openshift-apiserver
5m kcm

The remaining components should then have requests that consume no more than 270m, divvied up fairly based on their average use. A good default is going to be 5m for an operator that doesn't serve traffic or answer lots of queries.

This would end up with us having roughly 1 core set aside on masters for core workload, and we would only schedule flex workloads on masters down to that single core. On very large masters these numbers might have to flex upwards, but we can't solve that component by component.

--- Additional comment from Clayton Coleman on 2020-03-11 14:40:20 EDT ---

Working on draft recommendations.

--- Additional comment from Clayton Coleman on 2020-03-11 18:31:34 EDT ---

Here is the draft recommendation that child bugs should follow.

Rules for component teams:

1. Determine your average CPU usage from the list above (breaking down any components that are split across namespaces).

2. Pods that run on a master and are not on all nodes (i.e. excluding dns, openshift-sdn, mcd) should have a request that is proportional to their CPU usage relative to kube-apiserver. kube-apiserver is allocated 33% of all resources (2.16 cores out of 6.6 cores). Calculate your CPU usage relative to kube-apiserver, and then multiply 330m by your fraction of kube-apiserver use (i.e. kube-scheduler uses 0.028 cores, so 0.028 * 330m = 9.24m). (See the sketch after these comments.)

Special cases:
* Certain infra components will be assigned slightly higher requests (kcm, scheduler, ocm, given known problems if they fall behind).
* Leader-elected components should set their request to their expected usage across all pods, even though they will be over-provisioned (kcm should request Xm on all nodes, not Xm/3).
* No component should request less than 5m per pod; if the calculation comes out lower, it must be set to 5m.

3. Pods that run on a worker should be proportional to actual CPU use in our minimum e2e run. openshift-sdn uses 790m on workers, so per node there should be 790m/3 ~ 250m of CPU allocation split between the ovs and sdn pods.

4. Large infra components like monitoring should set their request proportional to the openshift-sdn namespace. openshift-sdn is allocated 750m and uses 790m. Prometheus uses 350m and requests 4 cores on workers. Because Prometheus has a significant vertical scaling component, it should probably be close to openshift-sdn in terms of requests, and if it needs more active resource growth the operator should manage that. Node-exporter should be set relative to openshift-sdn.

--- Additional comment from Clayton Coleman on 2020-03-11 18:41:27 EDT ---

Commit message recommendation:

Normalize CPU requests on masters

The {x} uses approximately {percent}% of master CPU in a reasonable medium sized workload. Given a 1 core per master baseline (since CPU is compressible and shared), assign the kube-apiserver roughly {desired_percent}% of that core on each master.

PRs will be opened tracking each component in the list above (core control plane), and then the worst offenders in the list will be updated.

--- Additional comment from Clayton Coleman on 2020-03-11 18:56:24 EDT ---

Allocating 10% for the kube-controller-manager (1/3 of kube-apiserver).

--- Additional comment from Clayton Coleman on 2020-03-11 19:05:56 EDT ---

openshift-apiserver is 10% (1/3 of kube-apiserver)
etcd is 25% (~70-80% of kube-apiserver)
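[Editor's note] To make the arithmetic in rule 2 concrete, here is a small sketch. It is not part of the original bug; the function name and rounding are ours, and the usage numbers are illustrative values taken from the tables earlier in the bug.

# Illustrative sketch of rule 2: scale each master component's request off
# kube-apiserver (330m request for 2.16 cores of measured usage), with a
# 5m-per-pod floor.

KUBE_APISERVER_USAGE = 2.16     # cores used by kube-apiserver in the e2e run
KUBE_APISERVER_REQUEST_M = 330  # millicores allocated to kube-apiserver
FLOOR_M = 5                     # minimum request per pod

def recommended_request_m(component_usage_cores: float) -> int:
    fraction = component_usage_cores / KUBE_APISERVER_USAGE
    return max(FLOOR_M, round(KUBE_APISERVER_REQUEST_M * fraction))

# openshift-apiserver uses ~0.665 cores -> ~102m (close to the 10% figure above)
print(recommended_request_m(0.665))
# a small operator using ~0.02 cores falls below the floor -> 5m
print(recommended_request_m(0.02))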
@Sergiusz, it seems we should increase the CPU requests for the cluster-monitoring-operator/prometheus-k8s pods.

sum by(container) (rate(container_cpu_usage_seconds_total{pod=~"cluster-monitoring-operator.*", container!="",container!="POD"}[15m])*1024)

Element / Value
{container="cluster-monitoring-operator"} 52.97751672012032
{container="kube-rbac-proxy"} 0.09442782417719313

sum by(container) (rate(container_cpu_usage_seconds_total{pod=~"prometheus-k8s-0", container!="",container!="POD"}[15m])*1024)

Element / Value
{container="prometheus-proxy"} 0.7842356586338781
{container="rules-configmap-reloader"} 0.0019742485983387653
{container="thanos-sidecar"} 1.0615358213087878
{container="kube-rbac-proxy"} 0.028092780408990707
{container="prom-label-proxy"} 0.022147447991531597
{container="prometheus"} 82.99479507055307
{container="prometheus-config-reloader"} 0.02606141733379576

# for i in $(kubectl -n openshift-monitoring get po --no-headers | awk '{print $1}'); do echo $i; kubectl -n openshift-monitoring get pod $i -o go-template='{{range.spec.containers}}{{"Container Name: "}}{{.name}}{{"\r\nresources: "}}{{.resources}}{{"\n"}}{{end}}'; echo -e "\n"; done
....
cluster-monitoring-operator-c4dbd665f-jfd7v
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: cluster-monitoring-operator resources: map[requests:map[cpu:10m memory:50Mi]]
prometheus-k8s-0
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[]
  Container Name: rules-configmap-reloader resources: map[]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-k8s-1
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[]
  Container Name: rules-configmap-reloader resources: map[]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
...
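[Editor's note] The values from the query above are in CPU shares (the query multiplies core usage by 1024, matching the standard cgroup v1 mapping Kubernetes uses of 1 CPU = 1024 shares). A small sketch, ours and purely illustrative, converting those shares back to millicores and comparing them against the configured requests:

# Illustrative conversion from the "shares" values above (cores * 1024)
# back to millicores, compared against the configured CPU requests.

def shares_to_millicores(shares: float) -> float:
    # 1 CPU == 1000m == 1024 cgroup shares
    return shares / 1024 * 1000

measured_shares = {
    "cluster-monitoring-operator": 52.98,  # requests 10m
    "prometheus": 82.99,                   # requests 70m
}

for container, shares in measured_shares.items():
    print(f"{container}: ~{shares_to_millicores(shares):.0f}m used")
# cluster-monitoring-operator: ~52m used, well above its 10m request
# prometheus: ~81m used, somewhat above its 70m request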
Tested with 4.5.0-0.nightly-2020-03-16-004817: the CPU requests for the monitoring stack are now lower, and the stack is healthy.

# for i in $(kubectl -n openshift-monitoring get po --no-headers | awk '{print $1}'); do echo $i; kubectl -n openshift-monitoring get pod $i -o go-template='{{range.spec.containers}}{{"Container Name: "}}{{.name}}{{"\r\nresources: "}}{{.resources}}{{"\n"}}{{end}}'; echo -e "\n"; done
alertmanager-main-0
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-1
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-2
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
cluster-monitoring-operator-c4dbd665f-jfd7v
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: cluster-monitoring-operator resources: map[requests:map[cpu:10m memory:50Mi]]
grafana-69cff65d46-8pj8z
  Container Name: grafana resources: map[requests:map[cpu:4m memory:100Mi]]
  Container Name: grafana-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
kube-state-metrics-77d6884646-jfjjq
  Container Name: kube-state-metrics resources: map[requests:map[cpu:2m memory:40Mi]]
  Container Name: kube-rbac-proxy-main resources: map[requests:map[cpu:1m memory:40Mi]]
  Container Name: kube-rbac-proxy-self resources: map[requests:map[cpu:1m memory:40Mi]]
node-exporter-859h2
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-f8j74
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-nwwjx
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-q65tf
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-qv5gp
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-tqf9q
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
openshift-state-metrics-f78c86754-2kgxw
  Container Name: kube-rbac-proxy-main resources: map[requests:map[cpu:10m memory:20Mi]]
  Container Name: kube-rbac-proxy-self resources: map[requests:map[cpu:10m memory:20Mi]]
  Container Name: openshift-state-metrics resources: map[requests:map[cpu:100m memory:150Mi]]
prometheus-adapter-74cb89957d-9wln4
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-adapter-74cb89957d-rz8dm
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-k8s-0
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[]
  Container Name: rules-configmap-reloader resources: map[]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-k8s-1
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[]
  Container Name: rules-configmap-reloader resources: map[]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-operator-6b756458d6-x2pr2
  Container Name: prometheus-operator resources: map[requests:map[cpu:5m memory:60Mi]]
telemeter-client-7f84cf7dcb-gf6k6
  Container Name: telemeter-client resources: map[]
  Container Name: reload resources: map[]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:10m memory:20Mi]]
thanos-querier-5948df967c-2g5mk
  Container Name: thanos-querier resources: map[requests:map[cpu:5m memory:12Mi]]
  Container Name: oauth-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
thanos-querier-5948df967c-h4vwk
  Container Name: thanos-querier resources: map[requests:map[cpu:5m memory:12Mi]]
  Container Name: oauth-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
Tested with 4.5.0-0.nightly-2020-03-22-175100 (which includes PR 711). Since there is no need to set a CPU request for the config-reloader container of the alertmanager-main pods, and all other containers are now set to reasonable CPU requests, closing this bug.

# for i in $(kubectl -n openshift-monitoring get po --no-headers | awk '{print $1}'); do echo $i; kubectl -n openshift-monitoring get pod $i -o go-template='{{range.spec.containers}}{{"Container Name: "}}{{.name}}{{"\r\nresources: "}}{{.resources}}{{"\n"}}{{end}}'; echo -e "\n"; done
alertmanager-main-0
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-1
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-2
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
cluster-monitoring-operator-8448ddffd4-v5zx5
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: cluster-monitoring-operator resources: map[requests:map[cpu:10m memory:50Mi]]
grafana-69cff65d46-zm6sp
  Container Name: grafana resources: map[requests:map[cpu:4m memory:100Mi]]
  Container Name: grafana-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
kube-state-metrics-77d6884646-stl5r
  Container Name: kube-state-metrics resources: map[requests:map[cpu:2m memory:40Mi]]
  Container Name: kube-rbac-proxy-main resources: map[requests:map[cpu:1m memory:40Mi]]
  Container Name: kube-rbac-proxy-self resources: map[requests:map[cpu:1m memory:40Mi]]
node-exporter-97kqq
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-hpslb
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-msdwk
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-ndcqk
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-r2xrj
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-tvbdp
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
openshift-state-metrics-f78c86754-ct9nc
  Container Name: kube-rbac-proxy-main resources: map[requests:map[cpu:10m memory:20Mi]]
  Container Name: kube-rbac-proxy-self resources: map[requests:map[cpu:10m memory:20Mi]]
  Container Name: openshift-state-metrics resources: map[requests:map[cpu:100m memory:150Mi]]
prometheus-adapter-74cb89957d-9wln4
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-adapter-8f6856657-bjx9r
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-adapter-8f6856657-r548t
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-k8s-0
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: rules-configmap-reloader resources: map[requests:map[cpu:1m]]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-k8s-1
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: rules-configmap-reloader resources: map[requests:map[cpu:1m]]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-operator-6b756458d6-kd25r
  Container Name: prometheus-operator resources: map[requests:map[cpu:5m memory:60Mi]]
telemeter-client-6bcb786cc4-m6nbv
  Container Name: telemeter-client resources: map[requests:map[cpu:1m]]
  Container Name: reload resources: map[requests:map[cpu:1m]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
thanos-querier-f7945ccc4-bqkr4
  Container Name: thanos-querier resources: map[requests:map[cpu:5m memory:12Mi]]
  Container Name: oauth-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
thanos-querier-f7945ccc4-h4vwk
  Container Name: thanos-querier resources: map[requests:map[cpu:5m memory:12Mi]]
  Container Name: oauth-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
Tested with 4.5.0-0.nightly-2020-04-09-192237: there is no CPU request for the config-reloader container of the alertmanager-main pods; all other containers are set to reasonable CPU requests.

# for i in $(kubectl -n openshift-monitoring get po --no-headers | awk '{print $1}'); do echo $i; kubectl -n openshift-monitoring get pod $i -o go-template='{{range.spec.containers}}{{"Container Name: "}}{{.name}}{{"\r\nresources: "}}{{.resources}}{{"\n"}}{{end}}'; echo -e "\n"; done
alertmanager-main-0
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-1
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-2
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
cluster-monitoring-operator-769df6cc9c-4nv4j
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: cluster-monitoring-operator resources: map[requests:map[cpu:10m memory:50Mi]]
grafana-85d5f88875-jpvcd
  Container Name: grafana resources: map[requests:map[cpu:4m memory:100Mi]]
  Container Name: grafana-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
kube-state-metrics-6454674cd6-jmq5s
  Container Name: kube-state-metrics resources: map[requests:map[cpu:2m memory:40Mi]]
  Container Name: kube-rbac-proxy-main resources: map[requests:map[cpu:1m memory:40Mi]]
  Container Name: kube-rbac-proxy-self resources: map[requests:map[cpu:1m memory:40Mi]]
node-exporter-7sj6w
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-9vdp6
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-jn8fp
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-jnc9v
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-w56bb
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-xpxzq
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
openshift-state-metrics-566d7bb966-7tl9k
  Container Name: kube-rbac-proxy-main resources: map[requests:map[cpu:10m memory:20Mi]]
  Container Name: kube-rbac-proxy-self resources: map[requests:map[cpu:10m memory:20Mi]]
  Container Name: openshift-state-metrics resources: map[requests:map[cpu:100m memory:150Mi]]
prometheus-adapter-5464d75d77-5fp96
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-adapter-5464d75d77-jtlc5
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-k8s-0
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: rules-configmap-reloader resources: map[requests:map[cpu:1m]]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-k8s-1
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: rules-configmap-reloader resources: map[requests:map[cpu:1m]]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-operator-cc6d58bc7-7w98h
  Container Name: prometheus-operator resources: map[requests:map[cpu:5m memory:60Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:40Mi]]
telemeter-client-79c49c874f-gwb2h
  Container Name: telemeter-client resources: map[requests:map[cpu:1m]]
  Container Name: reload resources: map[requests:map[cpu:1m]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
thanos-querier-64948bff88-dld9p
  Container Name: thanos-query resources: map[requests:map[cpu:5m memory:12Mi]]
  Container Name: oauth-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
thanos-querier-64948bff88-wlhxj
  Container Name: thanos-query resources: map[requests:map[cpu:5m memory:12Mi]]
  Container Name: oauth-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
Tested with 4.5.0-0.nightly-2020-04-15-211427: the CPU requests for all containers are reasonable now.

$ for i in $(kubectl -n openshift-monitoring get po --no-headers | awk '{print $1}'); do echo $i; kubectl -n openshift-monitoring get pod $i -o go-template='{{range.spec.containers}}{{"Container Name: "}}{{.name}}{{"\r\nresources: "}}{{.resources}}{{"\n"}}{{end}}'; echo -e "\n"; done
alertmanager-main-0
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-1
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-2
  Container Name: alertmanager resources: map[requests:map[cpu:4m memory:200Mi]]
  Container Name: config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: alertmanager-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
cluster-monitoring-operator-85577f6786-hmnq6
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: cluster-monitoring-operator resources: map[requests:map[cpu:10m memory:50Mi]]
grafana-588f74f599-bhv77
  Container Name: grafana resources: map[requests:map[cpu:4m memory:100Mi]]
  Container Name: grafana-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
kube-state-metrics-76d9c575cb-vrzl7
  Container Name: kube-state-metrics resources: map[requests:map[cpu:2m memory:40Mi]]
  Container Name: kube-rbac-proxy-main resources: map[requests:map[cpu:1m memory:40Mi]]
  Container Name: kube-rbac-proxy-self resources: map[requests:map[cpu:1m memory:40Mi]]
node-exporter-fmx58
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-ml7bl
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-rrpgw
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-svzzs
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-tmvbt
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
node-exporter-zznw9
  Container Name: node-exporter resources: map[requests:map[cpu:8m memory:180Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:30Mi]]
openshift-state-metrics-5d7b475b9d-jbkkd
  Container Name: kube-rbac-proxy-main resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy-self resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: openshift-state-metrics resources: map[requests:map[cpu:1m memory:150Mi]]
prometheus-adapter-579648b54-4dkp9
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-adapter-579648b54-tddxf
  Container Name: prometheus-adapter resources: map[requests:map[cpu:1m memory:25Mi]]
prometheus-k8s-0
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: rules-configmap-reloader resources: map[requests:map[cpu:1m]]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-k8s-1
  Container Name: prometheus resources: map[requests:map[cpu:70m memory:1Gi]]
  Container Name: prometheus-config-reloader resources: map[requests:map[cpu:1m]]
  Container Name: rules-configmap-reloader resources: map[requests:map[cpu:1m]]
  Container Name: thanos-sidecar resources: map[requests:map[cpu:1m memory:100Mi]]
  Container Name: prometheus-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
prometheus-operator-7cc5cdd4b5-5p566
  Container Name: prometheus-operator resources: map[requests:map[cpu:5m memory:60Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:40Mi]]
telemeter-client-598f5c89-qw587
  Container Name: telemeter-client resources: map[requests:map[cpu:1m]]
  Container Name: reload resources: map[requests:map[cpu:1m]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
thanos-querier-ccf54c68-6tdnl
  Container Name: thanos-query resources: map[requests:map[cpu:5m memory:12Mi]]
  Container Name: oauth-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
thanos-querier-ccf54c68-rft8p
  Container Name: thanos-query resources: map[requests:map[cpu:5m memory:12Mi]]
  Container Name: oauth-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: kube-rbac-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
  Container Name: prom-label-proxy resources: map[requests:map[cpu:1m memory:20Mi]]
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409