Verified on 4.4.0-0.nightly-2020-09-20-175714. Stopped the kubelet service on one node and saw the alert fire; it cleared once the kubelet was started again.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-09-20-175714   True        False         57m     Cluster version is 4.4.0-0.nightly-2020-09-20-175714

With the kubelet stopped on ip-10-0-150-228, the node goes NotReady:

$ oc get nodes
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-134-125.us-east-2.compute.internal   Ready      master   90m   v1.17.1+809da40
ip-10-0-150-228.us-east-2.compute.internal   NotReady   worker   78m   v1.17.1+809da40
ip-10-0-182-191.us-east-2.compute.internal   Ready      worker   78m   v1.17.1+809da40
ip-10-0-187-120.us-east-2.compute.internal   Ready      master   90m   v1.17.1+809da40
ip-10-0-200-13.us-east-2.compute.internal    Ready      master   90m   v1.17.1+809da40
ip-10-0-205-216.us-east-2.compute.internal   Ready      worker   78m   v1.17.1+809da40

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -g -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=mcd_kubelet_state>2' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   334  100   334    0     0   6769      0 --:--:-- --:--:-- --:--:--  6816
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "mcd_kubelet_state",
          "endpoint": "metrics",
          "instance": "10.0.150.228:9001",
          "job": "machine-config-daemon",
          "namespace": "openshift-machine-config-operator",
          "pod": "machine-config-daemon-th47l",
          "service": "machine-config-daemon"
        },
        "value": [
          1600757954.48,
          "3"
        ]
      }
    ]
  }
}

After the kubelet is started again, the node returns to Ready and the query result is empty:

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-125.us-east-2.compute.internal   Ready    master   91m   v1.17.1+809da40
ip-10-0-150-228.us-east-2.compute.internal   Ready    worker   79m   v1.17.1+809da40
ip-10-0-182-191.us-east-2.compute.internal   Ready    worker   79m   v1.17.1+809da40
ip-10-0-187-120.us-east-2.compute.internal   Ready    master   91m   v1.17.1+809da40
ip-10-0-200-13.us-east-2.compute.internal    Ready    master   91m   v1.17.1+809da40
ip-10-0-205-216.us-east-2.compute.internal   Ready    worker   79m   v1.17.1+809da40

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -g -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=mcd_kubelet_state>2' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    63  100    63    0     0   1164      0 --:--:-- --:--:-- --:--:--  1188
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": []
  }
}
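For repeated verification, the comparison between the two query responses above could be scripted rather than read by eye. The following is only a minimal sketch, assuming the curl output has already been captured as a JSON string; the function name `failing_instances` and the trimmed sample payloads are hypothetical and chosen for illustration, while the response shape (a `status` field and `data.result` samples of the form `[timestamp, "value"]`) follows the output shown in this comment.

```python
import json


def failing_instances(response_text, threshold=2.0):
    """Return (instance, value) pairs whose sample value exceeds threshold.

    Hypothetical helper: it only parses a captured Prometheus instant-query
    response and does not contact the cluster.
    """
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('status')!r}")

    hits = []
    for sample in body["data"]["result"]:
        # Each instant-vector sample is [timestamp, "value-as-string"].
        _, raw_value = sample["value"]
        value = float(raw_value)
        if value > threshold:
            instance = sample["metric"].get("instance", "<unknown>")
            hits.append((instance, value))
    return hits


# Trimmed copies of the two responses from this comment (assumed payloads).
firing = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"mcd_kubelet_state","instance":"10.0.150.228:9001"},'
    '"value":[1600757954.48,"3"]}]}}'
)
cleared = '{"status":"success","data":{"resultType":"vector","result":[]}}'

print(failing_instances(firing))   # [('10.0.150.228:9001', 3.0)]
print(failing_instances(cleared))  # []
```

An empty list corresponds to the cleared state seen after the kubelet was restarted; a non-empty list names the machine-config-daemon endpoints still reporting a state above 2.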
Created attachment 1715639: metrics-fix-confirmation
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.4.26 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3764