Bug 2004051
Summary: | CMO can report as being Degraded while node-exporter is deployed on all nodes | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Simon Pasquier <spasquie> |
Component: | Monitoring | Assignee: | Prashant Balachandran <pnair> |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
Severity: | medium | Docs Contact: | Brian Burt <bburt> |
Priority: | unspecified | ||
Version: | 4.9 | CC: | amuller, anpicker, aos-bugs, bburt, erooth |
Target Milestone: | --- | ||
Target Release: | 4.10.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Previously, if the number of daemon set pods for the `node-exporter` agent was not equal to the number of nodes in the cluster, the Cluster Monitoring Operator (CMO) would report a condition of `degraded`. This issue would occur when one of the nodes was not in the `ready` condition.
This release now verifies that the number of daemon set pods for the `node-exporter` agent is not less than the number of ready nodes in the cluster. This process ensures that a `node-exporter` pod is running on every active node.
As a result, the CMO will not report a degraded condition if one of the nodes is not in a ready state.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2022-03-10 16:10:15 UTC | Type: | Bug |
Description
Simon Pasquier
2021-09-14 12:31:37 UTC
Checked with 4.10.0-0.nightly-2021-09-26-233013: stopped the kubelet service on one node and watched for 20 minutes. With the fix, CMO does not report Degraded=True.

# oc get node | grep NotReady
ip-10-0-140-4.us-east-2.compute.internal   NotReady   worker   6h25m   v1.22.0-rc.0+af080cb

# oc -n openshift-monitoring get ds node-exporter
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
node-exporter   6         6         5       6            5           kubernetes.io/os=linux   115m

# oc -n openshift-monitoring get ds node-exporter -oyaml
...
status:
  currentNumberScheduled: 6
  desiredNumberScheduled: 6
  numberAvailable: 5
  numberMisscheduled: 0
  numberReady: 5
  numberUnavailable: 1
  observedGeneration: 1
  updatedNumberScheduled: 6

# oc get co monitoring -oyaml
...
  - lastTransitionTime: "2021-09-27T07:50:14Z"
    message: 'Prometheus is running without persistent storage which can lead to data loss during upgrades and cluster disruptions. Please refer to the official documentation to see how to configure storage for Prometheus: https://docs.openshift.com/container-platform/4.8/monitoring/configuring-the-monitoring-stack.html'
    reason: PrometheusDataPersistenceNotConfigured
    status: "False"
    type: Degraded

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
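
For illustration, the change described in the Doc Text can be sketched as a comparison between the daemon set's pod count and the number of ready nodes. This is only a minimal sketch, not the CMO implementation: the `Node` type, the function names, and the choice of `numberAvailable` as the pod count are assumptions made for the example. The numbers mirror the verification above (6 nodes, 1 NotReady, `numberAvailable: 5`).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; only readiness matters here."""
    name: str
    ready: bool


def count_ready_nodes(nodes):
    """Return how many nodes are in the Ready condition."""
    return sum(1 for node in nodes if node.ready)


def degraded_before_fix(number_available, nodes):
    """Old behavior per the Doc Text: degraded unless the pod count equals the node count.

    Assumption: the pod count is the daemon set's numberAvailable.
    """
    return number_available != len(nodes)


def degraded_after_fix(number_available, nodes):
    """New behavior per the Doc Text: degraded only if pods < ready nodes."""
    return number_available < count_ready_nodes(nodes)


# Six nodes, one of them NotReady, as in the verification run.
# Node names other than the NotReady one are placeholders.
nodes = [Node(f"node-{i}", ready=True) for i in range(5)]
nodes.append(Node("ip-10-0-140-4.us-east-2.compute.internal", ready=False))

print(degraded_before_fix(5, nodes))  # True: 5 != 6
print(degraded_after_fix(5, nodes))   # False: 5 is not less than 5
```

Under these assumptions, a NotReady node no longer counts against the operator, while a missing pod on a ready node (for example, 4 available pods with 5 ready nodes) would still be reported as degraded.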