Bug 1986375

Summary: Avoid CMO being degraded when some nodes aren't available
Product: OpenShift Container Platform
Reporter: Prashant Balachandran <pnair>
Component: Monitoring
Assignee: Prashant Balachandran <pnair>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.9
CC: amuller, anpicker, aos-bugs, arajkuma, erooth, jgato
Target Milestone: ---
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-18 17:41:54 UTC
Type: Bug

Description Prashant Balachandran 2021-07-27 12:04:02 UTC
Description of problem:
node_exporter pods that cannot run on offline/unavailable nodes are one of the top reasons why CMO goes Degraded. It would make sense for CMO to correlate the number of running node_exporter pods with the status of the nodes and not go Degraded as long as node_exporter pods are running on all nodes that are Ready. For example, if the cluster has N nodes, one of which is NotReady, and (N-1) node_exporter pods are running, then CMO should report Available rather than Degraded.
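
The requested behavior amounts to comparing the node-exporter DaemonSet status against the set of Ready nodes rather than against the DaemonSet's desired pod count. A minimal sketch of that check is below (an illustration only, assuming the k8s.io/api types; the function names are made up and this is not the actual cluster-monitoring-operator code):
**************
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// readyNodeCount counts the nodes whose Ready condition is True.
func readyNodeCount(nodes []corev1.Node) int32 {
	var ready int32
	for _, node := range nodes {
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
				ready++
				break
			}
		}
	}
	return ready
}

// shouldDegrade (hypothetical helper) reports whether the node-exporter
// DaemonSet should be treated as degraded: only when fewer pods are ready
// than there are Ready nodes.
func shouldDegrade(ds *appsv1.DaemonSet, nodes []corev1.Node) bool {
	return ds.Status.NumberReady < readyNodeCount(nodes)
}

func main() {
	// Two nodes, one Ready and one NotReady; one node-exporter pod is ready.
	nodes := []corev1.Node{
		{Status: corev1.NodeStatus{Conditions: []corev1.NodeCondition{
			{Type: corev1.NodeReady, Status: corev1.ConditionTrue},
		}}},
		{Status: corev1.NodeStatus{Conditions: []corev1.NodeCondition{
			{Type: corev1.NodeReady, Status: corev1.ConditionFalse},
		}}},
	}
	ds := &appsv1.DaemonSet{Status: appsv1.DaemonSetStatus{
		DesiredNumberScheduled: 2,
		NumberReady:            1,
	}}
	fmt.Println("degraded:", shouldDegrade(ds, nodes)) // prints "degraded: false"
}
**************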

Version-Release number of selected component (if applicable):


How reproducible:
Always when nodes are offline.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Junqi Zhao 2021-07-29 08:19:43 UTC
Tested with 4.9.0-0.nightly-2021-07-28-181504; monitoring is no longer reported as DEGRADED due to offline/unavailable nodes. Steps below.

Set one node to SchedulingDisabled so that other pods are not affected:
# oc adm cordon ip-10-0-217-156.us-east-2.compute.internal
# oc get node ip-10-0-217-156.us-east-2.compute.internal
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-217-156.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   4h45m   v1.21.1+8268f88

Scale down cluster-version-operator/cluster-monitoring-operator and delete the node-exporter daemonset:
# oc -n openshift-cluster-version scale deploy cluster-version-operator --replicas=0
# oc -n openshift-monitoring scale deploy cluster-monitoring-operator --replicas=0
# oc -n openshift-monitoring delete daemonset node-exporter

Make sure the other pods are running normally:
# oc -n openshift-monitoring get pod -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
alertmanager-main-0                        5/5     Running   0          5h8m    10.129.2.8     ip-10-0-134-137.us-east-2.compute.internal   <none>           <none>
alertmanager-main-1                        5/5     Running   0          25m     10.129.2.16    ip-10-0-134-137.us-east-2.compute.internal   <none>           <none>
alertmanager-main-2                        5/5     Running   0          5h8m    10.131.0.17    ip-10-0-176-74.us-east-2.compute.internal    <none>           <none>
grafana-6c679c5748-vct2g                   2/2     Running   0          5h8m    10.129.2.9     ip-10-0-134-137.us-east-2.compute.internal   <none>           <none>
kube-state-metrics-59f44f65fb-qgghv        3/3     Running   0          5h12m   10.131.0.15    ip-10-0-176-74.us-east-2.compute.internal    <none>           <none>
openshift-state-metrics-78c5465bcd-bkndb   3/3     Running   0          5h12m   10.131.0.7     ip-10-0-176-74.us-east-2.compute.internal    <none>           <none>
prometheus-adapter-7d6b95dd6-cbv7h         1/1     Running   0          58m     10.131.0.122   ip-10-0-176-74.us-east-2.compute.internal    <none>           <none>
prometheus-adapter-7d6b95dd6-zgflb         1/1     Running   0          5h8m    10.129.2.7     ip-10-0-134-137.us-east-2.compute.internal   <none>           <none>
prometheus-k8s-0                           7/7     Running   0          26m     10.131.0.137   ip-10-0-176-74.us-east-2.compute.internal    <none>           <none>
prometheus-k8s-1                           7/7     Running   0          5h7m    10.129.2.11    ip-10-0-134-137.us-east-2.compute.internal   <none>           <none>
prometheus-operator-cd5899dbc-trcpx        2/2     Running   1          5h14m   10.128.0.40    ip-10-0-176-171.us-east-2.compute.internal   <none>           <none>
telemeter-client-567dc564fd-pvpcp          3/3     Running   0          5h12m   10.131.0.16    ip-10-0-176-74.us-east-2.compute.internal    <none>           <none>
thanos-querier-865d44b845-58cnf            5/5     Running   0          4h11m   10.129.2.13    ip-10-0-134-137.us-east-2.compute.internal   <none>           <none>
thanos-querier-865d44b845-hrxvb            5/5     Running   0          4h11m   10.131.0.40    ip-10-0-176-74.us-east-2.compute.internal    <none>           <none>

Use the following script to stop the kubelet, sleep for 20 minutes, then start it again:
# oc debug node/ip-10-0-217-156.us-east-2.compute.internal
sh-4.4# chroot /host
sh-4.4# chmod +x /tmp/run.sh
sh-4.4# /tmp/run.sh &

cat /tmp/run.sh
**************
systemctl stop kubelet
sleep 20m
systemctl start kubelet
**************

Scale cluster-version-operator/cluster-monitoring-operator back up:
# oc -n openshift-cluster-version scale deploy cluster-version-operator --replicas=1
# oc -n openshift-monitoring scale deploy cluster-monitoring-operator --replicas=1

# oc get node ip-10-0-217-156.us-east-2.compute.internal
NAME                                         STATUS                        ROLES    AGE     VERSION
ip-10-0-217-156.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker   5h12m   v1.21.1+8268f88

Make sure the only abnormal pod is the node-exporter pod scheduled on the NotReady node:
# oc -n openshift-monitoring get pod -o wide | grep -Ev "Running|Completed"
NAME                                           READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
node-exporter-l884h                            0/2     Pending   0          2m58s   <none>         ip-10-0-217-156.us-east-2.compute.internal   <none>           <none>

Watch for a while; monitoring is not reported as DEGRADED:
# node="ip-10-0-217-156.us-east-2.compute.internal"; while true; do oc get node ${node}; oc -n openshift-monitoring get pod -o wide | grep node-exporter | grep ${node}; oc get co monitoring; oc -n openshift-monitoring get ds;sleep 20s; done
...
NAME                                         STATUS                        ROLES    AGE     VERSION
ip-10-0-217-156.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker   5h19m   v1.21.1+8268f88
node-exporter-l884h                            0/2     Pending   0          7m32s   <none>         ip-10-0-217-156.us-east-2.compute.internal   <none>           <none>
NAME         VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.9.0-0.nightly-2021-07-28-181504   True        False         False      35m     
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
node-exporter   6         6         5       6            5           kubernetes.io/os=linux   7m42s
...

Comment 9 errata-xmlrpc 2021-10-18 17:41:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 10 Jose Gato 2022-11-04 09:07:04 UTC
We have found the same issue on 4.8.44. Was this fix backported to 4.8?
Thanks,