Bug 1916624
| Summary: | CSV alerts inaccurate | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Rick Rackow <rrackow> |
| Component: | OLM | Assignee: | Kevin Rizza <krizza> |
| OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | ankithom, bandrade, cblecker, krizza |
| Version: | 4.6.z | Keywords: | Reopened, ServiceDeliveryImpact |
| Target Milestone: | --- | Flags: | ankithom: needinfo- |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-04-30 18:04:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Rick Rackow
2021-01-15 09:57:42 UTC
This bug report is quite vague and doesn't really have any actionable information in it. I'm closing it as INSUFFICIENT_DATA. Feel free to reopen with a more explicit description or explanation of what the specific defect is or what is being asked for.
CsvAbnormalReplacingOver30Min and CsvAbnormalReplacingOver4Hr should be added in order to get better insight into potentially bad behavior during CSV replacement. Those alerts should additionally carry the namespace they originate from, so that they can be routed easily via Alertmanager.
This has been fixed on master, but is still an issue on 4.6 and 4.7.
LGTM, marking as VERIFIED.
OCP Version: 4.10.0-0.nightly-2021-10-05-121338
OLM version: 0.18.3
git commit: a768ef8e86e00e25fa8612dbf9f6984721449255
oc get prometheusrules.monitoring.coreos.com olm-alert-rules -n openshift-operator-lifecycle-manager -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2021-10-05T15:56:47Z"
  generation: 1
  labels:
    prometheus: alert-rules
    role: alert-rules
  name: olm-alert-rules
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 28bbc3d2-a454-4187-a57d-c0a07d220a76
  resourceVersion: "1757"
  uid: eabba1ca-69a0-4a8f-8999-1640d5fe72e3
spec:
  groups:
  - name: olm.csv_abnormal.rules
    rules:
    - alert: CsvAbnormalFailedOver2Min
      annotations:
        message: Failed to install Operator {{ $labels.name }} version {{ $labels.version
          }}. Reason-{{ $labels.reason }}
      expr: csv_abnormal{phase=~"^Failed$"}
      for: 2m
      labels:
        namespace: '{{ $labels.namespace }}'
        severity: warning
    - alert: CsvAbnormalOver30Min
      annotations:
        message: Failed to install Operator {{ $labels.name }} version {{ $labels.version
          }}. Phase-{{ $labels.phase }} Reason-{{ $labels.reason }}
      expr: csv_abnormal{phase=~"(^Replacing$|^Pending$|^Deleting$|^Unknown$)"}
      for: 30m
      labels:
        namespace: '{{ $labels.namespace }}'
        severity: warning
  - name: olm.installplan.rules
    rules:
    - alert: InstallPlanStepAppliedWithWarnings
      annotations:
        message: The API server returned a warning during installation or upgrade
          of an operator. An Event with reason "AppliedWithWarnings" has been created
          with complete details, including a reference to the InstallPlan step that
          generated the warning.
      expr: sum(increase(installplan_warnings_total[5m])) > 0
      labels:
        severity: warning
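The `namespace` label carried by the two CSV alerts in this rule set is what makes the namespace-based routing requested in the description possible. As a minimal sketch only, an Alertmanager routing fragment could look like the following; the receiver names `default` and `olm-team` and the namespace `my-operators` are hypothetical placeholders, not values taken from this bug.

```yaml
# Hypothetical Alertmanager configuration fragment (illustration only).
# Receiver names and the namespace value are assumptions.
route:
  receiver: default            # fallback for everything not matched below
  routes:
  # Route abnormal-CSV alerts from one namespace to a dedicated receiver.
  - receiver: olm-team
    matchers:
    - alertname =~ "CsvAbnormal.*"
    - namespace = "my-operators"
receivers:
- name: default
- name: olm-team
```

A route like this would not match `InstallPlanStepAppliedWithWarnings`, because that rule sets only a `severity` label and no `namespace` label.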
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary |