Bug 2010341 - OpenShift Alerting Rules Style-Guide Compliance
Summary: OpenShift Alerting Rules Style-Guide Compliance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Credential Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: Nobody
QA Contact: Jianping SHu
URL:
Whiteboard:
Depends On:
Blocks: 1992563
Reported: 2021-10-04 13:39 UTC by Brad Ison
Modified: 2022-03-10 16:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:16:28 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cloud-credential-operator pull 395 0 None open Bug 2010341: update alerts with summary and descriptions 2021-10-05 20:02:25 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:16:59 UTC

Description Brad Ison 2021-10-04 13:39:23 UTC
Hello,

The OpenShift Monitoring Team has published a set of guidelines for
writing alerting rules in OpenShift, including a basic style guide.
You can find these here:

  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

A subset of these are now being enforced in OpenShift End-to-End
tests [1], with temporary exceptions for existing non-compliant rules.

This component was found to have the following issues:

* Alerts without summary and/or description annotations:

  - CloudCredentialOperatorDeprovisioningFailed
  - CredentialOperatorDeprovisioningFailed
  - CloudCredentialOperatorInsufficientCloudCreds
  - CloudCredentialOperatorProvisioningFailed
  - CloudCredentialOperatorTargetNamespaceMissing

Alerts MUST include summary and description annotations.

Think of summary as the first line of a commit message, or an email
subject line. It should be brief but informative. The description is
the longer, more detailed explanation of the alert.

The enhancement document linked above has examples of alerts with
these annotations.

Thank you!

Repo: openshift/cloud-credential-operator

[1]: https://github.com/openshift/origin/commit/097e7a6
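
The annotation requirement described above can be sketched as a small check. This is only an illustration, not the actual End-to-End test from openshift/origin: the function name `missing_annotations` and the sample rule data are hypothetical, and the input is assumed to be the `groups` list of a rule file already parsed into plain Python objects.

```python
from typing import Dict, List

# Annotations the style guide says every alert MUST carry.
REQUIRED_ANNOTATIONS = ("summary", "description")


def missing_annotations(rule_groups: List[dict]) -> Dict[str, List[str]]:
    """Map each alert name to the required annotations it lacks.

    Hypothetical helper: `rule_groups` is the `groups` list of a
    Prometheus rule file, already parsed into dicts and lists.
    """
    problems: Dict[str, List[str]] = {}
    for group in rule_groups:
        for rule in group.get("rules", []):
            name = rule.get("alert")
            if name is None:
                continue  # recording rules have no "alert" key
            annotations = rule.get("annotations") or {}
            # An absent or empty annotation counts as missing.
            missing = [key for key in REQUIRED_ANNOTATIONS if not annotations.get(key)]
            if missing:
                problems[name] = missing
    return problems


if __name__ == "__main__":
    # Illustrative data only: one rule without the annotations, one with them.
    sample_groups = [{
        "name": "CloudCredentialOperator",
        "rules": [
            {
                "alert": "CloudCredentialOperatorStaleCredentials",
                "expr": 'cco_credentials_requests_conditions{condition="StaleCredentials"} > 0',
                "annotations": {"message": "example message"},
            },
            {
                "alert": "ExampleCompliantAlert",
                "expr": "vector(1)",
                "annotations": {"summary": "Short subject line.", "description": "Longer explanation."},
            },
        ],
    }]
    print(missing_annotations(sample_groups))
```

An empty result would mean every alert in the parsed groups carries both annotations.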

Comment 1 Brad Ison 2021-10-05 13:39:31 UTC
Looks like a new alert that we missed before also needs the annotations:

  - CloudCredentialOperatorStaleCredentials

Comment 4 Jianping SHu 2021-10-09 08:03:15 UTC
Verified with 4.10.0-0.nightly-2021-10-09-022511

1. Log in to the Prometheus web page with an OpenShift account. All CCO alerts now have description and summary annotations.
https://prometheus-k8s-openshift-monitoring.apps.jshu-1009-test3.qe.devcluster.openshift.com/alerts

/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-cloud-credential-operator-cloud-credential-operator-alerts.yaml > CloudCredentialOperator
CloudCredentialOperatorTargetNamespaceMissing (0 active)
name: CloudCredentialOperatorTargetNamespaceMissing
expr: cco_credentials_requests_conditions{condition="MissingTargetNamespace"} > 0
for: 5m
labels:
  severity: warning
annotations:
  description: At least one CredentialsRequest custom resource has specified in its .spec.secretRef.namespace field a namespace which does not presently exist. This means the Cloud Credential Operator in the openshift-cloud-credential-operator namespace cannot process the CredentialsRequest resource. Check the conditions of all CredentialsRequests with 'oc get credentialsrequest -A' to find any CredentialsRequest(s) with a .status.condition showing a condition type of MissingTargetNamespace set to True.
  message: CredentialsRequest(s) pointing to non-existent namespace
  summary: One ore more CredentialsRequest CRs are asking to save credentials to a non-existent namespace.

CloudCredentialOperatorProvisioningFailed (0 active)
name: CloudCredentialOperatorProvisioningFailed
expr: cco_credentials_requests_conditions{condition="CredentialsProvisionFailure"} > 0
for: 5m
labels:
  severity: warning
annotations:
  description: While processing a CredentialsRequest, the Cloud Credential Operator encountered an issue. Check the conditions of all CredentialsRequets with 'oc get credentialsrequest -A' to find any CredentialsRequest(s) with a .stats.condition showing a condition type of CredentialsProvisionFailure set to True for more details on the issue.
  message: CredentialsRequest(s) unable to be fulfilled
  summary: One or more CredentialsRequest CRs are unable to be processed.

CloudCredentialOperatorDeprovisioningFailed (0 active)
name: CloudCredentialOperatorDeprovisioningFailed
expr: cco_credentials_requests_conditions{condition="CredentialsDeprovisionFailure"} > 0
for: 5m
labels:
  severity: warning
annotations:
  description: While processing a CredentialsRequest marked for deletion, the Cloud Credential Operator encountered an issue. Check the conditions of all CredentialsRequests with 'oc get credentialsrequest -A' to find any CredentialsRequest(s) with a .status.condition showing a condition type of CredentialsDeprovisionFailure set to True for more details on the issue.
  message: CredentialsRequest(s) unable to be cleaned up
  summary: One or more CredentialsRequest CRs are unable to be deleted.

CloudCredentialOperatorInsufficientCloudCreds (0 active)
name: CloudCredentialOperatorInsufficientCloudCreds
expr: cco_credentials_requests_conditions{condition="InsufficientCloudCreds"} > 0
for: 5m
labels:
  severity: warning
annotations:
  description: The Cloud Credential Operator has determined that there are insufficient permissions to process one or more CredentialsRequest CRs. Check the conditions of all CredentialsRequests with 'oc get credentialsrequest -A' to find any CredentialsRequest(s) with a .status.condition showing a condition type of InsufficientCloudCreds set to True for more details.
  message: Cluster's cloud credentials insufficient for minting or passthrough
  summary: Problem with the available platform credentials.

CloudCredentialOperatorStaleCredentials (0 active)
name: CloudCredentialOperatorStaleCredentials
expr: cco_credentials_requests_conditions{condition="StaleCredentials"} > 0
for: 5m
labels:
  severity: warning
annotations:
  description: The Cloud Credential Operator (CCO) has detected one or more stale CredentialsRequest CRs that need to be manually deleted. When the CCO is in Manual credentials mode, it will not automatially clean up stale CredentialsRequest CRs (that may no longer be necessary in the present version of OpenShift because it could involve needing to clean up manually created cloud resources. Check the conditions of all CredentialsRequests with 'oc get credentialsrequest -A' to find any CredentialsRequest(s) with a .status.condition showing a condition type of StaleCredentials set to True. Determine the appropriate steps to clean up/deprovision any previously provisioned cloud resources. Finally, delete the CredentialsRequest with an 'oc delete'.
  message: 1 or more credentials requests are stale and should be deleted. Check the status.conditions on CredentialsRequest CRs to identify the stale one(s).
  summary: One or more CredentialsRequest CRs are stale and should be deleted.


2. Create a CredentialsRequest whose target namespace does not exist. The alert CloudCredentialOperatorTargetNamespaceMissing is then generated.

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: my-cred-request
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: my-cred-request-secret
    namespace: namespace-does-not-exist
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - effect: Allow
      action:
      - s3:CreateBucket
      - s3:DeleteBucket
      resource: "*" 

3. Change the CCO to "Manual" mode and create a CredentialsRequest with the namespace/name openshift-cloud-credential-operator/cloud-credential-operator-s3. The alert CloudCredentialOperatorStaleCredentials is then generated.

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: cloud-credential-operator-s3
  namespace: openshift-cloud-credential-operator
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
spec:
  secretRef:
    name: cloud-credential-operator-s3-creds
    namespace: openshift-cloud-credential-operator
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - effect: Allow
      action:
      - s3:CreateBucket
      - s3:PutBucketTagging
      - s3:PutObject
      - s3:PutObjectAcl
      resource: "*"
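
The alert descriptions above ask the reader to look through all CredentialsRequests for a condition type set to True. A minimal sketch of that lookup follows, assuming the listing has been obtained as JSON (for example from 'oc get credentialsrequest -A -o json') and already parsed. The helper name `requests_with_condition` and the sample listing are hypothetical; only the `items`, `metadata`, and `status.conditions` fields with `type` and `status` are relied on.

```python
from typing import List


def requests_with_condition(listing: dict, condition_type: str) -> List[str]:
    """Return "namespace/name" for each item whose condition of the
    given type has status "True".

    Hypothetical helper: `listing` is the parsed JSON list object,
    i.e. a dict with an "items" array of CredentialsRequest objects.
    """
    matches: List[str] = []
    for item in listing.get("items", []):
        meta = item.get("metadata") or {}
        conditions = (item.get("status") or {}).get("conditions") or []
        for cond in conditions:
            # Condition status is the string "True", not a boolean.
            if cond.get("type") == condition_type and cond.get("status") == "True":
                matches.append(f'{meta.get("namespace")}/{meta.get("name")}')
                break
    return matches


if __name__ == "__main__":
    # Illustrative listing shaped like the two requests from steps 2 and 3.
    sample_listing = {
        "items": [
            {
                "metadata": {"namespace": "openshift-cloud-credential-operator", "name": "my-cred-request"},
                "status": {"conditions": [{"type": "MissingTargetNamespace", "status": "True"}]},
            },
            {
                "metadata": {"namespace": "openshift-cloud-credential-operator", "name": "cloud-credential-operator-s3"},
                "status": {"conditions": [{"type": "StaleCredentials", "status": "True"}]},
            },
        ]
    }
    for wanted in ("MissingTargetNamespace", "StaleCredentials"):
        print(wanted, requests_with_condition(sample_listing, wanted))
```

Against a live cluster the same function could be given the result of `json.loads` applied to the command output instead of the sample listing.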

Comment 8 errata-xmlrpc 2022-03-10 16:16:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

