Bug 1978662
Summary: | monitoring operator needs to indicate non-durable data | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | David Eads <deads> |
Component: | Monitoring | Assignee: | Filip Petkovski <fpetkovs> |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.9 | CC: | amuller, anpicker, aos-bugs, erooth, spasquie |
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | The Cluster Monitoring Operator will now set a message for the Degraded condition when persistent storage is not configured for Prometheus. |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-10-18 17:38:00 UTC | Type: | Bug |
Description
David Eads
2021-07-02 12:41:05 UTC
I would vote for option 1. I know a lot of customers who simply ignore "info" alerts anyway, so I would question how useful such an alert is.

To add to Christian's comment, if the alert is to understand how many clusters use persistent storage for prometheus/alertmanager, we can create a telemetry metric to record this information.

> To add to Christian's comment, if the alert is to understand how many clusters use persistent storage for prometheus/alertmanager, we can create a telemetry metric to record this information.
That is the first goal of that alert. Depending on how many clusters are in this situation, we can decide what to do next. Losing historical metrics data is a problem.
In the PR linked with this BZ we set a `PrometheusDataPersistenceNotConfigured` reason for the degraded condition when there is no metrics storage. All operator conditions are already exported to telemetry, so we will be able to see how many clusters are in this state.

Tested with 4.9.0-0.nightly-2021-07-20-221331, with no persistent volumes for monitoring:

# oc -n openshift-monitoring get pvc
No resources found in openshift-monitoring namespace.
# oc get co monitoring -oyaml
...
  - lastTransitionTime: "2021-07-21T02:06:17Z"
    message: 'Prometheus is running without persistent storage which can lead to data loss during upgrades and cluster disruptions. Please refer to the official documentation to see how to configure storage for Prometheus: https://docs.openshift.com/container-platform/4.8/monitoring/configuring-the-monitoring-stack.html'
    reason: PrometheusDataPersistenceNotConfigured
    status: "False"
    type: Degraded

The message links to the 4.8 documentation. Since the 4.9 documentation is completed very close to the GA date, linking to the previous version is fine.

Also tested with PVCs bound for monitoring; there is no warning message:

# oc get co monitoring -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2021-07-21T01:57:30Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2021-07-21T02:06:17Z"
    message: Successfully rolled out the stack.
    reason: RollOutDone
    status: "True"
    type: Available
  - lastTransitionTime: "2021-07-21T02:06:17Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2021-07-21T02:59:58Z"
    status: "False"
    type: Degraded

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
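
The manual verification above (reading the Degraded condition from `oc get co monitoring`) could be scripted roughly as follows. This is only a sketch, not part of the fix: the helper names `find_condition`, `storage_warning` and `fetch_monitoring_operator` are made up for illustration, the reason string is copied from the output in this report, and it assumes `oc` is on the PATH and logged in to the cluster.

```python
import json
import subprocess

# Reason string as it appears in the `oc get co monitoring -oyaml` output above.
NO_STORAGE_REASON = "PrometheusDataPersistenceNotConfigured"


def find_condition(cluster_operator, cond_type):
    """Return the condition dict of the given type, or None if it is absent."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def storage_warning(cluster_operator):
    """Return the Degraded message if it reports missing storage, else None."""
    degraded = find_condition(cluster_operator, "Degraded")
    if degraded is not None and degraded.get("reason") == NO_STORAGE_REASON:
        return degraded.get("message", "")
    return None


def fetch_monitoring_operator():
    """Fetch the monitoring ClusterOperator as a dict (requires cluster access)."""
    result = subprocess.run(
        ["oc", "get", "co", "monitoring", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

On a live cluster, `storage_warning(fetch_monitoring_operator())` would be expected to return the message text when no PVCs are configured for Prometheus, and `None` once storage is bound, matching the two test runs above.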