Bug 1872337
Summary: | KubeletHealthState alert keeps firing [4.5.z] | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> | ||||
Component: | Node | Assignee: | Seth Jennings <sjenning> | ||||
Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 4.3.z | CC: | alegrand, anpicker, aos-bugs, apjagtap, erooth, ggore, jokerman, kakkoyun, kgarriso, lcosic, mloibl, nnosenzo, palonsor, pkrupa, rrackow, sjenning, surbania, wking | ||||
Target Milestone: | --- | Flags: | palonsor: needinfo- | ||||
Target Release: | 4.5.z | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Fixes an issue where, once the KubeletHealthState alert fired for a particular kubelet, it stayed active even after that kubelet became healthy again. Previously, only a kubelet restart could clear the alert. | Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2020-09-21 17:42:05 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1854009 | ||||||
Bug Blocks: | 1878798 | ||||||
Attachments: |
|
Comment 1
Seth Jennings
2020-08-25 17:06:23 UTC
Verified on 4.5.0-0.nightly-2020-09-12-063044. Stopped the kubelet service on one node and saw the alerts firing; they cleared once the kubelet was started again.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-09-12-063044   True        False         144m    Cluster version is 4.5.0-0.nightly-2020-09-12-063044

$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-130-237.us-east-2.compute.internal   Ready    worker   154m   v1.18.3+b0068a8
ip-10-0-152-249.us-east-2.compute.internal   Ready    master   165m   v1.18.3+b0068a8
ip-10-0-167-191.us-east-2.compute.internal   Ready    master   166m   v1.18.3+b0068a8
ip-10-0-190-64.us-east-2.compute.internal    Ready    worker   154m   v1.18.3+b0068a8
ip-10-0-200-110.us-east-2.compute.internal   Ready    master   166m   v1.18.3+b0068a8
ip-10-0-220-35.us-east-2.compute.internal    Ready    worker   154m   v1.18.3+b0068a8

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -g -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=mcd_kubelet_state>2' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   334  100   334    0     0   4517      0 --:--:-- --:--:-- --:--:--  4575
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "mcd_kubelet_state",
          "endpoint": "metrics",
          "instance": "10.0.130.237:9001",
          "job": "machine-config-daemon",
          "namespace": "openshift-machine-config-operator",
          "pod": "machine-config-daemon-xqddl",
          "service": "machine-config-daemon"
        },
        "value": [
          1600077068.81,
          "3"
        ]
      }
    ]
  }
}

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -g -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=mcd_kubelet_state>2' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    63  100    63    0     0    768      0 --:--:-- --:--:-- --:--:--   777
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": []
  }
}

Created attachment 1714740: metrics-fix-confirmation
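
The verification above relies on reading two Prometheus query responses by eye: one with a sample for the stopped kubelet, one with an empty result after the restart. As a rough sketch of how that check could be scripted, the Python below parses such a response and lists the instances whose mcd_kubelet_state value exceeds a threshold. The helper name unhealthy_kubelets and its threshold parameter are hypothetical, not part of oc, Prometheus, or the fix itself; the two sample responses are abbreviated copies of the ones quoted in this comment. Because the query already filters with >2 on the server side, the local threshold check is redundant here and is kept only so the helper also works on an unfiltered query.

```python
import json


def unhealthy_kubelets(response_text, threshold=2):
    """Return (instance, value) pairs whose sample value exceeds threshold.

    Hypothetical helper: expects the JSON body of a Prometheus
    /api/v1/query (instant query) response with resultType "vector".
    """
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query did not succeed: %r" % body.get("status"))

    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError("unexpected resultType: %r" % data["resultType"])

    hits = []
    for sample in data["result"]:
        # Each vector sample carries "value": [timestamp, "<number as string>"].
        value = float(sample["value"][1])
        if value > threshold:
            instance = sample["metric"].get("instance", "<unknown>")
            hits.append((instance, value))
    return hits


# Abbreviated copies of the two responses from the verification run.
FIRING = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "mcd_kubelet_state",
                                 "instance": "10.0.130.237:9001",
                                 "pod": "machine-config-daemon-xqddl"},
                      "value": [1600077068.81, "3"]}]}}
"""
CLEARED = '{"status": "success", "data": {"resultType": "vector", "result": []}}'

if __name__ == "__main__":
    print(unhealthy_kubelets(FIRING))   # kubelet stopped: one instance reported
    print(unhealthy_kubelets(CLEARED))  # kubelet restarted: nothing reported
```

An empty list after the kubelet has been started again corresponds to the cleared state shown in the second query.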
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.11 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:3719