Bug 2010375

Summary: OpenShift Alerting Rules Style-Guide Compliance
Product: OpenShift Container Platform
Reporter: Brad Ison <brad.ison>
Component: OLM
Assignee: Anik <anbhatta>
OLM sub component: OLM
QA Contact: xzha
Status: CLOSED ERRATA
Docs Contact:
Severity: low
Priority: low
CC: anbhatta, ankithom, cchantse, spasquie
Version: 4.10
Target Milestone: ---
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-17 19:46:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Brad Ison 2021-10-04 14:06:25 UTC
Hello,

The OpenShift Monitoring Team has published a set of guidelines for
writing alerting rules in OpenShift, including a basic style guide.
You can find these here:

  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

A subset of these are now being enforced in OpenShift End-to-End
tests [1], with temporary exceptions for existing non-compliant rules.

This component was found to have the following issues:

* Alerts without summary and/or description annotations:

  - CertifiedOperatorsCatalogError
  - CommunityOperatorsCatalogError
  - RedhatMarketplaceCatalogError
  - RedhatOperatorsCatalogError

Alerts MUST include summary and description annotations.

Think of summary as the first line of a commit message, or an email
subject line. It should be brief but informative. The description is
the longer, more detailed explanation of the alert.

The enhancement document linked above has examples of alerts with
these annotations.

Thank you!

Repo: operator-framework/operator-marketplace

[1]: https://github.com/openshift/origin/commit/097e7a6
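
The requirement stated above (every alert carries both a summary and a description annotation) can be sketched as a small check over rule groups in the usual Prometheus rule-file shape. This is only an illustration and not the actual end-to-end test referenced in [1]; the function name, the sample expressions, and the sample rule data are assumptions made up for the example.

```python
REQUIRED_ANNOTATIONS = ("summary", "description")


def missing_annotations(rule_groups):
    """Return {alert name: [missing annotation keys]} for non-compliant alerts."""
    problems = {}
    for group in rule_groups:
        for rule in group.get("rules", []):
            name = rule.get("alert")
            if name is None:
                # Recording rules have no "alert" key and are skipped.
                continue
            annotations = rule.get("annotations") or {}
            missing = [key for key in REQUIRED_ANNOTATIONS if not annotations.get(key)]
            if missing:
                problems[name] = missing
    return problems


# Hypothetical rule data for illustration only (expressions are placeholders).
groups = [
    {
        "name": "example",
        "rules": [
            {
                "alert": "CertifiedOperatorsCatalogError",
                "expr": "vector(1)",
                "labels": {"severity": "warning"},
            },
            {
                "alert": "OperatorHubSourceError",
                "expr": "vector(1)",
                "annotations": {
                    "summary": "The source is in non-ready state.",
                    "description": "Operators from this source are unavailable.",
                },
            },
        ],
    }
]

print(missing_annotations(groups))
```

An empty result from such a check would indicate that every alert in the inspected groups has both annotations set to a non-empty value.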

Comment 2 Simon Pasquier 2022-06-03 14:36:33 UTC
Any progress on this issue? The monitoring team could help if needed.

Comment 3 xzha 2022-06-13 07:06:14 UTC
verify:

1) install cluster with this PR
zhaoxia@xzha-mac openshift-tests-private % oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-06-13-054554-ci-ln-zn13fyk-latest   True        False         39m     Cluster version is 4.11.0-0.ci.test-2022-06-13-054554-ci-ln-zn13fyk-latest

zhaoxia@xzha-mac openshift-tests-private % oc get catsrc
NAME                  DISPLAY               TYPE   PUBLISHER   AGE
certified-operators   Certified Operators   grpc   Red Hat     61m
community-operators   Community Operators   grpc   Red Hat     61m
redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     61m
redhat-operators      Red Hat Operators     grpc   Red Hat     61m

2) make the certified-operators catsrc pod pending
oc patch catsrc certified-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
zhaoxia@xzha-mac openshift-tests-private % oc get pod   
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-7sns9               0/1     Pending   0          8s
community-operators-8kqfv               1/1     Running   0          40m
marketplace-operator-85d9b67789-g7tjw   1/1     Running   0          43m
redhat-marketplace-bcc47                1/1     Running   0          40m
redhat-operators-w6cbn                  1/1     Running   0          40m
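
The patch in step 2 sets a node selector on the catalog pod; presumably no node carries the fake43642 label, so the replacement pod cannot be scheduled and stays Pending. A minimal sketch of building the same merge-patch body and command line programmatically follows. The helper names and the default namespace argument are assumptions for illustration; only the patch payload itself is taken from the command above.

```python
import json


def unschedulable_patch(label_key="fake43642", label_value="fake"):
    """Build the JSON merge patch that pins the catalog pod to a nonexistent node label."""
    patch = {"spec": {"grpcPodConfig": {"nodeSelector": {label_key: label_value}}}}
    # Compact separators reproduce the payload exactly as typed in the transcript.
    return json.dumps(patch, separators=(",", ":"))


def patch_command(catalog_source, namespace="openshift-marketplace"):
    """Return the argv list for patching one CatalogSource (not executed here)."""
    return [
        "oc", "-n", namespace, "patch", "catsrc", catalog_source,
        "--type=merge", "-p", unschedulable_patch(),
    ]


print(" ".join(patch_command("certified-operators")))
```

Returning an argument list rather than a shell string avoids quoting problems if the command is later passed to subprocess.run.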

3) check alert
zhaoxia@xzha-mac openshift-tests-private % curl -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$(oc get route prometheus-k8s -n openshift-monitoring -o=jsonpath='{.spec.host}')/api/v1/alerts| jq -r '.data.alerts[] | select (.labels.alertname == "OperatorHubSourceError")'
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "certified-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the certified-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by certified-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=certified-operators) to diagnose and repair.",
    "summary": "The certified-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "firing",
  "activeAt": "2022-06-13T06:38:33.120882761Z",
  "value": "0e+00"
}
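
The jq filter in step 3 keeps only the alerts whose alertname label is OperatorHubSourceError. The same selection can be sketched in Python over an already-decoded /api/v1/alerts response. The function names are made up for this example, and the sample response is a trimmed, partly hypothetical stand-in for the real payload (the second alert is included only to show that other alerts are filtered out).

```python
def select_alerts(response, alertname):
    """Return the alerts in a decoded /api/v1/alerts response with the given alertname."""
    alerts = response.get("data", {}).get("alerts", [])
    return [a for a in alerts if a.get("labels", {}).get("alertname") == alertname]


def states_by_source(alerts):
    """Map each catalog source (the "name" label) to the alert state."""
    return {a["labels"]["name"]: a["state"] for a in alerts}


# Trimmed sample for illustration; the real response carries more labels and fields.
response = {
    "status": "success",
    "data": {
        "alerts": [
            {
                "labels": {
                    "alertname": "OperatorHubSourceError",
                    "name": "certified-operators",
                    "severity": "warning",
                },
                "state": "firing",
                "activeAt": "2022-06-13T06:38:33.120882761Z",
            },
            {
                "labels": {"alertname": "SomeOtherAlert", "severity": "none"},
                "state": "firing",
                "activeAt": "2022-06-13T06:00:00Z",
            },
        ]
    },
}

selected = select_alerts(response, "OperatorHubSourceError")
print(states_by_source(selected))
```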

4) stop other catalogs
oc patch catsrc community-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
oc patch catsrc redhat-marketplace   -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
oc patch catsrc redhat-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge

5) check alert
zhaoxia@xzha-mac openshift-tests-private % curl -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$(oc get route prometheus-k8s -n openshift-monitoring -o=jsonpath='{.spec.host}')/api/v1/alerts| jq -r '.data.alerts[] | select (.labels.alertname == "OperatorHubSourceError")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10566    0 10566    0     0   7908      0 --:--:--  0:00:01 --:--:--  7998
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "certified-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the certified-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by certified-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=certified-operators) to diagnose and repair.",
    "summary": "The certified-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "firing",
  "activeAt": "2022-06-13T06:38:33.120882761Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "community-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the community-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by community-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=community-operators) to diagnose and repair.",
    "summary": "The community-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "firing",
  "activeAt": "2022-06-13T06:51:33.120882761Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "redhat-marketplace",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the redhat-marketplace source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by redhat-marketplace source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=redhat-marketplace) to diagnose and repair.",
    "summary": "The redhat-marketplace source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-06-13T06:53:33.120882761Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.25:8443",
    "job": "catalog-operator-metrics",
    "name": "redhat-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-6468cdd79f-4gnqb",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the redhat-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by redhat-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=redhat-operators) to diagnose and repair.",
    "summary": "The redhat-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-06-13T06:54:03.120882761Z",
  "value": "0e+00"
}
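
In the output above, the two sources that became unavailable first are firing while the two patched later are still pending. Assuming the rule holds an alert in the pending state for 10 minutes (inferred from the summary text; the rule's actual "for" duration is not shown in this report), the earliest firing time can be estimated from activeAt as sketched below. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 10-minute hold, taken from the wording of the alert summary.
HOLD = timedelta(minutes=10)


def parse_active_at(timestamp):
    """Parse an RFC 3339 UTC timestamp with up to nanosecond precision."""
    base, _, fraction = timestamp.rstrip("Z").partition(".")
    # datetime stores microseconds, so keep only the first six fractional digits.
    microseconds = int((fraction + "000000")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=microseconds, tzinfo=timezone.utc)


def fires_at(active_at):
    """Estimate when a pending alert would start firing under the assumed hold."""
    return parse_active_at(active_at) + HOLD


print(fires_at("2022-06-13T06:53:33.120882761Z").isoformat())
```

Under that assumption, the redhat-marketplace alert would be expected to move from pending to firing shortly after 07:03:33 UTC.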


LGTM, verified.

Comment 7 xzha 2022-07-25 05:42:23 UTC
verify:
zhaoxia@xzha-mac ~ % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-24-180529   True        False         38m     Cluster version is 4.12.0-0.nightly-2022-07-24-180529
zhaoxia@xzha-mac ~ % oc get catsrc -A
NAMESPACE               NAME                  DISPLAY               TYPE   PUBLISHER   AGE
openshift-marketplace   certified-operators   Certified Operators   grpc   Red Hat     56m
openshift-marketplace   community-operators   Community Operators   grpc   Red Hat     56m
openshift-marketplace   redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     56m
openshift-marketplace   redhat-operators      Red Hat Operators     grpc   Red Hat     56m

1) make the catsrc pods pending
zhaoxia@xzha-mac ~ % oc patch catsrc certified-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
catalogsource.operators.coreos.com/certified-operators patched
zhaoxia@xzha-mac ~ % oc patch catsrc community-operators -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
catalogsource.operators.coreos.com/community-operators patched
zhaoxia@xzha-mac ~ % oc patch catsrc redhat-marketplace -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
catalogsource.operators.coreos.com/redhat-marketplace patched
zhaoxia@xzha-mac ~ % oc patch catsrc  redhat-operators  -p='{"spec":{"grpcPodConfig":{"nodeSelector":{"fake43642":"fake"}}}}' --type=merge
catalogsource.operators.coreos.com/redhat-operators patched

zhaoxia@xzha-mac ~ % oc get pod
NAME                                   READY   STATUS    RESTARTS      AGE
certified-operators-szdsx              0/1     Pending   0             2m28s
community-operators-q5hn5              0/1     Pending   0             2m18s
marketplace-operator-bbbc9755c-lpkpr   1/1     Running   5 (44m ago)   62m
redhat-marketplace-dhzx4               0/1     Pending   0             2m7s
redhat-operators-lhdw7                 0/1     Pending   0             117s

2) check alert
zhaoxia@xzha-mac ~ % curl -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$(oc get route prometheus-k8s -n openshift-monitoring -o=jsonpath='{.spec.host}')/api/v1/alerts| jq -r '.data.alerts[] | select (.labels.alertname == "OperatorHubSourceError")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4305    0  4305    0     0   4527      0 --:--:-- --:--:-- --:--:--  4565
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.20:8443",
    "job": "catalog-operator-metrics",
    "name": "certified-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-98ccdfbfc-d9bqf",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the certified-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by certified-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=certified-operators) to diagnose and repair.",
    "summary": "The certified-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-07-25T05:39:47.911111372Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.20:8443",
    "job": "catalog-operator-metrics",
    "name": "community-operators",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-98ccdfbfc-d9bqf",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the community-operators source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by community-operators source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=community-operators) to diagnose and repair.",
    "summary": "The community-operators source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-07-25T05:39:47.911111372Z",
  "value": "0e+00"
}
{
  "labels": {
    "alertname": "OperatorHubSourceError",
    "container": "catalog-operator",
    "endpoint": "https-metrics",
    "exported_namespace": "openshift-marketplace",
    "instance": "10.128.0.20:8443",
    "job": "catalog-operator-metrics",
    "name": "redhat-marketplace",
    "namespace": "openshift-operator-lifecycle-manager",
    "pod": "catalog-operator-98ccdfbfc-d9bqf",
    "service": "catalog-operator-metrics",
    "severity": "warning"
  },
  "annotations": {
    "description": "Operators shipped via the redhat-marketplace source are not available for installation until the issue is fixed. Operators already installed from this source will not receive updates until issue is fixed. Inspect the status of the pod owned by redhat-marketplace source in the openshift-marketplace namespace (oc -n openshift-marketplace get pods -l olm.catalogSource=redhat-marketplace) to diagnose and repair.",
    "summary": "The redhat-marketplace source is in non-ready state for more than 10 minutes."
  },
  "state": "pending",
  "activeAt": "2022-07-25T05:39:47.911111372Z",
  "value": "0e+00"
}

LGTM, verified.

Comment 11 errata-xmlrpc 2023-01-17 19:46:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container
Platform 4.12.0 bug fix and security update), and where to find the
updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399