Bug 2010375 - OpenShift Alerting Rules Style-Guide Compliance
Summary: OpenShift Alerting Rules Style-Guide Compliance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.12.0
Assignee: Anik
QA Contact: xzha
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-04 14:06 UTC by Brad Ison
Modified: 2023-01-17 19:46 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:46:45 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-marketplace pull 469 | 0 | None | open | Bug 2010375: Clarify reason/steps to diagnose in *CatalogError prom alert | 2022-06-08 21:35:23 UTC
Red Hat Product Errata | RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:46:57 UTC

Description Brad Ison 2021-10-04 14:06:25 UTC
Hello,

The OpenShift Monitoring Team has published a set of guidelines for
writing alerting rules in OpenShift, including a basic style guide.
You can find these here:

  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

A subset of these are now being enforced in OpenShift End-to-End
tests [1], with temporary exceptions for existing non-compliant rules.

This component was found to have the following issues:

* Alerts without summary and/or description annotations:

  - CertifiedOperatorsCatalogError
  - CommunityOperatorsCatalogError
  - RedhatMarketplaceCatalogError
  - RedhatOperatorsCatalogError

Alerts MUST include summary and description annotations.

Think of summary as the first line of a commit message, or an email
subject line. It should be brief but informative. The description is
the longer, more detailed explanation of the alert.

The enhancement document linked above has examples of alerts with
these annotations.
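
As a rough illustration of that requirement, the sketch below prints a rule that carries both annotations and runs a crude text check for them. The annotation wording is modeled on the OperatorHubSourceError alert shown in the verification comments of this report. The expr, the catalogsource_ready metric, the "for: 10m" duration, the {{ $labels.name }} templating, and both function names are assumptions made for illustration, not the rule that ships with the component.

```shell
#!/bin/sh
# Sketch only: an alerting rule with both required annotations.
# expr, metric name, duration and templating are assumptions (see lead-in).
sketch_rule() {
  cat <<'EOF'
- alert: OperatorHubSourceError
  expr: catalogsource_ready{exported_namespace="openshift-marketplace"} == 0
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: The {{ $labels.name }} source is in non-ready state for more than 10 minutes.
    description: Operators shipped via the {{ $labels.name }} source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed.
EOF
}

# Read rule text on stdin and print the name of each required annotation
# that does not appear as a key. This is a plain text match, not a YAML parse.
missing_annotations() {
  text="$(cat)"
  for key in summary description; do
    printf '%s\n' "$text" | grep -Eq "^[[:space:]]*${key}:" || echo "$key"
  done
}
```

Running "sketch_rule | missing_annotations" should print nothing when both annotations are present. A rule text lacking one of them would have that annotation name printed instead.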

Thank you!

Repo: operator-framework/operator-marketplace

[1]: https://github.com/openshift/origin/commit/097e7a6

Comment 2 Simon Pasquier 2022-06-03 14:36:33 UTC
Any progress on this issue? The monitoring team could help if needed.

Comment 3 xzha 2022-06-13 07:06:14 UTC
verify:

1) install cluster with this PR
zhaoxia@xzha-mac openshift-tests-private % oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-06-13-054554-ci-ln-zn13fyk-latest   True        False         39m     Cluster version is 4.11.0-0.ci.test-2022-06-13-054554-ci-ln-zn13fyk-latest

zhaoxia@xzha-mac openshift-tests-private % oc get catsrc
NAME                  DISPLAY               TYPE   PUBLISHER   AGE
certified-operators   Certified Operators   grpc   Red Hat     61m
community-operators   Community Operators   grpc   Red Hat     61m
redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     61m
redhat-operators      Red Hat Operators     grpc   Red Hat     61m

2) make the certified-operators catsrc pod go Pending
oc patch catsrc certified-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
zhaoxia@xzha-mac openshift-tests-private % oc get pod   
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-7sns9               0/1     Pending   0          8s
community-operators-8kqfv               1/1     Running   0          40m
marketplace-operator-85d9b67789-g7tjw   1/1     Running   0          43m
redhat-marketplace-bcc47                1/1     Running   0          40m
redhat-operators-w6cbn                  1/1     Running   0          40m

3) check alert
zhaoxia@xzha-mac openshift-tests-private % curl -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$(oc get route prometheus-k8s -n openshift-monitoring -o=jsonpath='{.spec.host}')/api/v1/alerts| jq -r '.data.alerts[] | select (.labels.alertname == "OperatorHubSourceError")'
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "certified-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the certified-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by certified-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=certified-operators) to diagnose and repair.",
    "summary": "The certified-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "firing",
  "activeAt": "2022-06-13T06:38:33.120882761Z",
  "value": "0e+00"
}

4) stop other catalogs
oc patch catsrc community-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
oc patch catsrc redhat-marketplace   -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
oc patch catsrc redhat-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
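
The repeated patches in steps 2 and 4 could be wrapped in a small loop. This is a minimal sketch, assuming a logged-in oc session and that the catalog sources live in the openshift-marketplace namespace. The function names nodeselector_patch and break_catalogs are hypothetical. The label key and value are the ones used above, chosen so that no node matches the selector.

```shell
#!/bin/sh
# Print a merge-patch body that sets a grpcPodConfig node selector.
# $1 = label key, $2 = label value.
nodeselector_patch() {
  printf '{"spec":{"grpcPodConfig":{"nodeSelector":{"%s":"%s"}}}}' "$1" "$2"
}

# Patch every catalog source named on the command line so that its pod
# cannot be scheduled. Requires oc and cluster access; not run on definition.
break_catalogs() {
  for name in "$@"; do
    oc -n openshift-marketplace patch catsrc "$name" --type=merge \
      -p="$(nodeselector_patch fake43642 fake)"
  done
}
```

With this in place, step 4 would become: break_catalogs community-operators redhat-marketplace redhat-operators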

5) check alert
zhaoxia@xzha-mac openshift-tests-private % curl -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$(oc get route prometheus-k8s -n openshift-monitoring -o=jsonpath='{.spec.host}')/api/v1/alerts| jq -r '.data.alerts[] | select (.labels.alertname == "OperatorHubSourceError")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10566    0 10566    0     0   7908      0 --:--:--  0:00:01 --:--:--  7998
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "certified-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the certified-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by certified-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=certified-operators) to diagnose and repair.",
    "summary": "The certified-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "firing",
  "activeAt": "2022-06-13T06:38:33.120882761Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "community-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the community-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by community-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=community-operators) to diagnose and repair.",
    "summary": "The community-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "firing",
  "activeAt": "2022-06-13T06:51:33.120882761Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "redhat-marketplace",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the redhat-marketplace source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by redhat-marketplace source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=redhat-marketplace) to diagnose and repair.",
    "summary": "The redhat-marketplace source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-06-13T06:53:33.120882761Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "redhat-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the redhat-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by redhat-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=redhat-operators) to diagnose and repair.",
    "summary": "The redhat-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-06-13T06:54:03.120882761Z",
  "value": "0e+00"
}
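
To compare states at a glance, the same query can be reduced to one line per catalog source. This is a sketch under assumptions: the function names alert_state_filter and alert_states are hypothetical, and oc, curl and jq need to be available with a logged-in session. The -s flag to curl suppresses the progress meter that appears in the output above.

```shell
#!/bin/sh
# Print a jq filter that reduces the alerts API response to
# "<catalog source name><TAB><state>" lines for one alert name ($1).
alert_state_filter() {
  printf '.data.alerts[] | select(.labels.alertname == "%s") | [.labels.name, .state] | @tsv' "$1"
}

# Query the in-cluster Prometheus route and apply the filter.
# Uses the same token and route lookups as the commands in this comment.
alert_states() {
  token="$(oc create token prometheus-k8s -n openshift-monitoring)"
  host="$(oc get route prometheus-k8s -n openshift-monitoring -o=jsonpath='{.spec.host}')"
  curl -sk -H "Authorization: Bearer ${token}" "https://${host}/api/v1/alerts" \
    | jq -r "$(alert_state_filter "$1")"
}
```

Usage: alert_states OperatorHubSourceError. Against the step 5 response above, this would be expected to list certified-operators and community-operators as firing and the other two sources as pending.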


LGTM, verified.

Comment 7 xzha 2022-07-25 05:42:23 UTC
verify:
zhaoxia@xzha-mac ~ % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-24-180529   True        False         38m     Cluster version is 4.12.0-0.nightly-2022-07-24-180529
zhaoxia@xzha-mac ~ % oc get catsrc -A
NAMESPACE               NAME                  DISPLAY               TYPE   PUBLISHER   AGE
openshift-marketplace   certified-operators   Certified Operators   grpc   Red Hat     56m
openshift-marketplace   community-operators   Community Operators   grpc   Red Hat     56m
openshift-marketplace   redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     56m
openshift-marketplace   redhat-operators      Red Hat Operators     grpc   Red Hat     56m

1) make the catsrc pods go Pending
zhaoxia@xzha-mac ~ % oc patch catsrc certified-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
catalogsource.operators.coreos.com/certified-operators patched
zhaoxia@xzha-mac ~ % oc patch catsrc community-operators -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
catalogsource.operators.coreos.com/community-operators patched
zhaoxia@xzha-mac ~ % oc patch catsrc redhat-marketplace -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
catalogsource.operators.coreos.com/redhat-marketplace patched
zhaoxia@xzha-mac ~ % oc patch catsrc  redhat-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
catalogsource.operators.coreos.com/redhat-operators patched

zhaoxia@xzha-mac ~ % oc get pod
NAME                                   READY   STATUS    RESTARTS      AGE
certified-operators-szdsx              0/1     Pending   0             2m28s
community-operators-q5hn5              0/1     Pending   0             2m18s
marketplace-operator-bbbc9755c-lpkpr   1/1     Running   5 (44m ago)   62m
redhat-marketplace-dhzx4               0/1     Pending   0             2m7s
redhat-operators-lhdw7                 0/1     Pending   0             117s

2) check alert
zhaoxia@xzha-mac ~ % curl -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$(oc get route prometheus-k8s -n openshift-monitoring -o=jsonpath='{.spec.host}')/api/v1/alerts| jq -r '.data.alerts[] | select (.labels.alertname == "OperatorHubSourceError")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4305    0  4305    0     0   4527      0 --:--:-- --:--:-- --:--:--  4565
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.20:8443",
    "job": "catalog-operator-metrics",
    "name": "certified-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-98ccdfbfc-d9bqf",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the certified-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by certified-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=certified-operators) to diagnose and repair.",
    "summary": "The certified-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-07-25T05:39:47.911111372Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.20:8443",
    "job": "catalog-operator-metrics",
    "name": "community-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-98ccdfbfc-d9bqf",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the community-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by community-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=community-operators) to diagnose and repair.",
    "summary": "The community-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-07-25T05:39:47.911111372Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.20:8443",
    "job": "catalog-operator-metrics",
    "name": "redhat-marketplace",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-98ccdfbfc-d9bqf",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the redhat-marketplace source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by redhat-marketplace source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=redhat-marketplace) to diagnose and repair.",
    "summary": "The redhat-marketplace source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-07-25T05:39:47.911111372Z",
  "value": "0e+00"
}

LGTM, verified.

Comment 11 errata-xmlrpc 2023-01-17 19:46:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

