Bug 1861631
Summary: Result for kubelet_running_pod_count is wrong

Product: OpenShift Container Platform
Component: Node
Version: 4.6
Target Release: 4.6.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Junqi Zhao <juzhao>
Assignee: Seth Jennings <sjenning>
QA Contact: Sunil Choudhary <schoudha>
CC: alegrand, anpicker, aos-bugs, erooth, jokerman, kakkoyun, lcosic, mloibl, pkrupa, sjenning, surbania
Clones: 1862194 1862195 1862197
Bug Blocks: 1862194
Last Closed: 2020-10-27 16:21:16 UTC
Description
Junqi Zhao
2020-07-29 06:18:29 UTC
Interesting! It seems that kube_pod_info has the correct information, but kubelet_running_pod_count does not. There is already an upstream issue for this, https://github.com/kubernetes/kubernetes/issues/81412, and a fix, https://github.com/kubernetes/kubernetes/pull/92187, that Pawel wrote, so reassigning to him.

Alerting based on this metric is fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1846805. I'll leave this bug open because the upstream fix in https://github.com/kubernetes/kubernetes/pull/92187 is not yet finished.

Reducing severity, as the alerts have a workaround.

The upstream fix in https://github.com/kubernetes/kubernetes/pull/85983 seems to be progressing and has already been LGTM'd. Reassigning to the Node team to shepherd porting the fix into OpenShift.

This regressed upstream in 1.16:
https://github.com/kubernetes/kubernetes/pull/85983/files#r424681852
https://github.com/kubernetes/kubernetes/commit/c02d49d775b4dc960f52af1f5295642c07947ca7

Verified on 4.6.0-0.nightly-2020-08-18-165040:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-165040   True        False         3h26m   Cluster version is 4.6.0-0.nightly-2020-08-18-165040

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-146-213.us-east-2.compute.internal   Ready    worker   3h40m   v1.19.0-rc.2+99cb93a-dirty
ip-10-0-149-17.us-east-2.compute.internal    Ready    master   3h50m   v1.19.0-rc.2+99cb93a-dirty
ip-10-0-183-120.us-east-2.compute.internal   Ready    worker   3h40m   v1.19.0-rc.2+99cb93a-dirty
ip-10-0-185-235.us-east-2.compute.internal   Ready    master   3h50m   v1.19.0-rc.2+99cb93a-dirty
ip-10-0-212-255.us-east-2.compute.internal   Ready    worker   3h40m   v1.19.0-rc.2+99cb93a-dirty
ip-10-0-218-126.us-east-2.compute.internal   Ready    master   3h50m   v1.19.0-rc.2+99cb93a-dirty

$ oc get pod --all-namespaces -o wide | grep "ip-10-0-149-17.us-east-2.compute.internal" | grep Running | wc -l
21
$ oc get pod --all-namespaces -o wide | grep "ip-10-0-149-17.us-east-2.compute.internal" | grep Completed | wc -l
4

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -g -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=kubelet_running_pods{node="ip-10-0-149-17.us-east-2.compute.internal"}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   336  100   336    0     0   7205      0 --:--:-- --:--:-- --:--:--  7304
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "kubelet_running_pods",
          "endpoint": "https-metrics",
          "instance": "10.0.149.17:10250",
          "job": "kubelet",
          "metrics_path": "/metrics",
          "namespace": "kube-system",
          "node": "ip-10-0-149-17.us-east-2.compute.internal",
          "service": "kubelet"
        },
        "value": [
          1597827031.686,
          "21"
        ]
      }
    ]
  }
}

The metric reports 21, matching the 21 Running pods on the node; the 4 Completed pods are not counted.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
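The verification in this bug compares the API's pod list with the kubelet metric by hand, using grep and wc. Below is a minimal Bash sketch of the same cross-check, assuming the default column layout of `oc get pod --all-namespaces -o wide` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE, IP, NODE, NOMINATED NODE, READINESS GATES). The function names `count_running_on_node` and `compare_counts` are hypothetical helpers, not part of `oc` or of the original report.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for cross-checking the kubelet's
# running-pod metric against the pod list reported by the API server.

# count_running_on_node NODE
#   Reads the text output of `oc get pod --all-namespaces -o wide` on stdin
#   and prints how many pods have STATUS exactly "Running" on exactly NODE.
#   STATUS is field 4. NODE is taken as the third-from-last field, assuming
#   the last two columns (NOMINATED NODE, READINESS GATES) are single tokens
#   such as "<none>". Counting from the end also tolerates a RESTARTS value
#   that contains spaces, e.g. "2 (5m ago)".
count_running_on_node() {
  local node="$1"
  awk -v node="$node" '
    NR > 1 && $4 == "Running" && $(NF - 2) == node { n++ }
    END { print n + 0 }
  '
}

# compare_counts API_COUNT METRIC_VALUE
#   Prints whether the two counts agree; returns non-zero on a mismatch.
compare_counts() {
  local api_count="$1" metric_value="$2"
  if [ "$api_count" -eq "$metric_value" ]; then
    echo "match: $api_count running pods"
  else
    echo "mismatch: API reports $api_count running pods, metric reports $metric_value"
    return 1
  fi
}
```

For the node in the verification, the first function would be fed the output of `oc get pod --all-namespaces -o wide` with the argument `ip-10-0-149-17.us-east-2.compute.internal`, and the metric value can be pulled from the Prometheus query response with `jq -r '.data.result[0].value[1]'`, which yields 21 for the JSON shown. Matching exact columns in awk avoids two quirks of the grep pipeline: the dots in the grep pattern are regex wildcards and the pattern also matches longer node names that contain it, and `grep Running` matches the word anywhere in the line, including pod names.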