Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1916624

Summary: CSV alerts inaccurate
Product: OpenShift Container Platform Reporter: Rick Rackow <rrackow>
Component: OLM Assignee: Kevin Rizza <krizza>
OLM sub component: OLM QA Contact: Jian Zhang <jiazha>
Status: CLOSED WONTFIX Docs Contact:
Severity: low    
Priority: low CC: ankithom, bandrade, cblecker, krizza
Version: 4.6.z Keywords: Reopened, ServiceDeliveryImpact
Target Milestone: --- Flags: ankithom: needinfo-
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-04-30 18:04:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Rick Rackow 2021-01-15 09:57:42 UTC
Description of problem:

Alerts regarding CSV failures are not very accurate and are not targeted to core namespaces.

Comment 1 Kevin Rizza 2021-02-03 22:47:22 UTC
This bug report is quite vague and doesn't really have any actionable information in it. I'm closing it as INSUFFICIENT_DATA. Feel free to reopen with a more explicit description or explanation of what the specific defect is or what is being asked for.

Comment 2 Rick Rackow 2021-02-08 16:11:42 UTC
CsvAbnormalReplacingOver30Min and CsvAbnormalReplacingOver4Hr should be added in order to get better insights into potentially bad behavior during CSV replacement.

Those alerts should additionally carry the namespace they originate from as a label, so that they can be routed easily via Alertmanager.
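
To illustrate the routing use case, here is a minimal sketch of an Alertmanager configuration fragment that routes on the namespace label. It is an assumption for illustration only: the receiver names (default, platform-team) and the openshift-.* namespace pattern are hypothetical and do not come from this bug. It uses match_re, which older Alertmanager releases accept; newer releases prefer the matchers syntax.

```yaml
# Hypothetical Alertmanager routing fragment (receiver names are placeholders).
route:
  receiver: default            # fallback receiver for everything else
  routes:
  - match_re:
      alertname: CsvAbnormal.*   # the CSV alerts discussed in this bug
      namespace: openshift-.*    # assumed pattern for core namespaces
    receiver: platform-team
receivers:
- name: default
- name: platform-team
```

Without a namespace label on the alert itself, a route like this has nothing to match on, which is the gap this comment describes.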

Comment 5 Rick Rackow 2021-07-13 16:00:32 UTC
This has been fixed on master, but it is still an issue on 4.6 and 4.7.

Comment 9 Bruno Andrade 2021-10-05 16:35:34 UTC
LGTM, marking as VERIFIED.

OCP Version: 4.10.0-0.nightly-2021-10-05-121338

OLM version: 0.18.3
git commit: a768ef8e86e00e25fa8612dbf9f6984721449255



oc get prometheusrules.monitoring.coreos.com olm-alert-rules -n openshift-operator-lifecycle-manager -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2021-10-05T15:56:47Z"
  generation: 1
  labels:
    prometheus: alert-rules
    role: alert-rules
  name: olm-alert-rules
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 28bbc3d2-a454-4187-a57d-c0a07d220a76
  resourceVersion: "1757"
  uid: eabba1ca-69a0-4a8f-8999-1640d5fe72e3
spec:
  groups:
  - name: olm.csv_abnormal.rules
    rules:
    - alert: CsvAbnormalFailedOver2Min
      annotations:
        message: Failed to install Operator {{ $labels.name }} version {{ $labels.version
          }}. Reason-{{ $labels.reason }}
      expr: csv_abnormal{phase=~"^Failed$"}
      for: 2m
      labels:
        namespace: '{{ $labels.namespace }}'
        severity: warning
    - alert: CsvAbnormalOver30Min
      annotations:
        message: Failed to install Operator {{ $labels.name }} version {{ $labels.version
          }}. Phase-{{ $labels.phase }} Reason-{{ $labels.reason }}
      expr: csv_abnormal{phase=~"(^Replacing$|^Pending$|^Deleting$|^Unknown$)"}
      for: 30m
      labels:
        namespace: '{{ $labels.namespace }}'
        severity: warning
  - name: olm.installplan.rules
    rules:
    - alert: InstallPlanStepAppliedWithWarnings
      annotations:
        message: The API server returned a warning during installation or upgrade
          of an operator. An Event with reason "AppliedWithWarnings" has been created
          with complete details, including a reference to the InstallPlan step that
          generated the warning.
      expr: sum(increase(installplan_warnings_total[5m])) > 0
      labels:
        severity: warning
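
For comparison with the request in Comment 2, the verified output above covers the Replacing phase only as part of the combined CsvAbnormalOver30Min rule. A rule scoped to Replacing alone could be sketched as follows. This is a hypothetical fragment, not part of the verified olm-alert-rules: the alert name is taken from Comment 2, while the message wording, the 4h duration and the severity are assumptions modeled on the existing rules.

```yaml
# Hypothetical rule sketch; not present in the verified PrometheusRule.
- alert: CsvAbnormalReplacingOver4Hr      # name proposed in Comment 2
  annotations:
    message: Operator {{ $labels.name }} version {{ $labels.version }} has been in phase Replacing for over 4 hours. Reason-{{ $labels.reason }}
  expr: csv_abnormal{phase="Replacing"}   # same metric as the rules above
  for: 4h
  labels:
    namespace: '{{ $labels.namespace }}'  # keeps the alert routable by namespace
    severity: warning                     # assumed severity
```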

Comment 11 Rory Thrasher 2024-04-30 18:04:53 UTC
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary