Bug 1948702 - unneeded CCO alert already covered by CVO
Summary: unneeded CCO alert already covered by CVO
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Credential Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.z
Assignee: Joel Diaz
QA Contact: wang lin
URL:
Whiteboard:
Depends On: 1948701 1957424
Blocks: 1958959
Reported: 2021-04-12 18:47 UTC by Joel Diaz
Modified: 2021-05-19 15:16 UTC

Fixed In Version:
Doc Type: Removed functionality
Doc Text:
The cluster-version operator is responsible for reporting when the cloud-credential-operator's deployment is unhealthy, so the cloud-credential-operator does not need to handle this directly (doing so resulted in duplicate reporting whenever there was an issue).
Clone Of: 1948701
Environment:
Last Closed: 2021-05-19 15:15:46 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cloud-credential-operator pull 324 | 0 | None | open | Bug 1948702: [release-4.7] manifests/0000_90_cloud-credential-operator_04_alertrules: Drop CloudCredentialOperatorDown | 2021-05-05 18:41:27 UTC
Red Hat Product Errata | RHBA-2021:1550 | 0 | None | None | None | 2021-05-19 15:16:14 UTC

Description Joel Diaz 2021-04-12 18:47:13 UTC
+++ This bug was initially created as a clone of Bug #1948701 +++

Description of problem:
CVO is already responsible for alerting when its operands are unhealthy, so CCO does not need its own alert.


Version-Release number of selected component (if applicable):
4.7

How reproducible:
100%

Steps to Reproduce:
1. Put CCO into an unhealthy state.

Actual results:
Witness CVO and CCO alerts reporting the same information.


Expected results:
Only need a single alert.


Additional info:

Comment 2 Akhil Rane 2021-04-30 04:16:24 UTC
PR to fix this is open and under review https://github.com/openshift/cloud-credential-operator/pull/324

Comment 3 Joel Diaz 2021-05-05 18:37:50 UTC
*** Bug 1957424 has been marked as a duplicate of this bug. ***

Comment 5 wang lin 2021-05-07 05:01:22 UTC
Verified on 4.7.0-0.nightly-2021-05-07-004616

1. Log in to the Prometheus console and check that CloudCredentialOperatorDown has been removed from CloudCredentialOperator.

2. Create an invalid CredentialsRequest and check that CVO fires the alerts when CCO is down.
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0  -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v1/alerts' | jq 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5740    0  5740    0     0   400k      0 --:--:-- --:--:-- --:--:--  400k
{
  "status": "success",
  "data": [
    {
      "labels": {
        "alertname": "CloudCredentialOperatorTargetNamespaceMissing",
        "condition": "MissingTargetNamespace",
        "container": "kube-rbac-proxy",
        "endpoint": "metrics",
        "instance": "10.129.0.69:8443",
        "job": "cco-metrics",
        "namespace": "openshift-cloud-credential-operator",
        "pod": "cloud-credential-operator-7fd7b8c7d5-8t5fv",
        "prometheus": "openshift-monitoring/k8s",
        "service": "cco-metrics",
        "severity": "warning"
      },
      "annotations": {
        "message": "CredentialsRequest(s) pointing to non-existent namespace"
      },
      "startsAt": "2021-05-07T04:33:42.851Z",
      "endsAt": "2021-05-07T05:02:12.851Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cco_credentials_requests_conditions%7Bcondition%3D%22MissingTargetNamespace%22%7D+%3E+0&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Default"
      ],
      "fingerprint": "06b742835ceb6c49"
    },
    {
      "labels": {
        "alertname": "ClusterOperatorDown",
        "endpoint": "metrics",
        "instance": "10.0.162.255:9099",
        "job": "cluster-version-operator",
        "name": "cloud-credential",
        "namespace": "openshift-cluster-version",
        "pod": "cluster-version-operator-84676c6b47-hp54f",
        "prometheus": "openshift-monitoring/k8s",
        "service": "cluster-version-operator",
        "severity": "critical",
        "version": "4.7.0-0.nightly-2021-05-07-004616"
      },
      "annotations": {
        "message": "Cluster operator cloud-credential has not been available for 10 minutes. Operator may be down or disabled, cluster will not be kept up to date and upgrades will not be possible."
      },
      "startsAt": "2021-05-07T04:36:59.213Z",
      "endsAt": "2021-05-07T05:01:59.213Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cluster_operator_up%7Bjob%3D%22cluster-version-operator%22%7D+%3D%3D+0&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Critical"
      ],
      "fingerprint": "bc22e2964c0ab173"
    },
    {
      "labels": {
        "alertname": "ClusterOperatorDegraded",
        "condition": "Degraded",
        "endpoint": "metrics",
        "instance": "10.0.162.255:9099",
        "job": "cluster-version-operator",
        "name": "cloud-credential",
        "namespace": "openshift-cluster-version",
        "pod": "cluster-version-operator-84676c6b47-hp54f",
        "prometheus": "openshift-monitoring/k8s",
        "reason": "CredentialsFailing",
        "service": "cluster-version-operator",
        "severity": "critical"
      },
      "annotations": {
        "message": "Cluster operator cloud-credential has been degraded for 10 minutes. Operator is degraded because CredentialsFailing and cluster upgrades will be unstable."
      },
      "startsAt": "2021-05-07T04:36:59.213Z",
      "endsAt": "2021-05-07T05:01:59.213Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cluster_operator_conditions%7Bcondition%3D%22Degraded%22%2Cjob%3D%22cluster-version-operator%22%7D+%3D%3D+1&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Critical"
      ],
      "fingerprint": "d0b00c0a6b1e0e75"
    },
    ...
  ]
}
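
The check in step 2 could be scripted roughly as follows. This is only a sketch, assuming the Alertmanager response has been saved as JSON text; the function names `summarize_alerts` and `verify_fix` and the `SAMPLE` payload are hypothetical, with `SAMPLE` trimmed from the labels and generatorURL values shown above. It confirms that no CloudCredentialOperatorDown alert is present while CVO's ClusterOperatorDown alert covers cloud-credential, and it decodes each alert's PromQL expression from its generatorURL.

```python
import json
from urllib.parse import parse_qs, urlparse


def summarize_alerts(payload):
    """Return alertname, operator name, and decoded PromQL for each alert."""
    response = json.loads(payload)
    summary = []
    for alert in response.get("data", []):
        labels = alert.get("labels", {})
        # parse_qs URL-decodes the query string, including '+' as a space.
        query = parse_qs(urlparse(alert.get("generatorURL", "")).query)
        summary.append({
            "alertname": labels.get("alertname"),
            "name": labels.get("name"),
            "expr": query.get("g0.expr", [None])[0],
        })
    return summary


def verify_fix(summary):
    """True when CCO's own 'down' alert is absent and CVO reports CCO as down."""
    alert_names = {entry["alertname"] for entry in summary}
    cvo_reports_cco = any(
        entry["alertname"] == "ClusterOperatorDown"
        and entry["name"] == "cloud-credential"
        for entry in summary
    )
    return "CloudCredentialOperatorDown" not in alert_names and cvo_reports_cco


# Hypothetical sample, trimmed from the Alertmanager output in this comment.
BASE = "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph"
SAMPLE = json.dumps({
    "status": "success",
    "data": [
        {
            "labels": {"alertname": "CloudCredentialOperatorTargetNamespaceMissing"},
            "generatorURL": BASE + "?g0.expr=cco_credentials_requests_conditions"
                                   "%7Bcondition%3D%22MissingTargetNamespace%22%7D+%3E+0&g0.tab=1",
        },
        {
            "labels": {"alertname": "ClusterOperatorDown", "name": "cloud-credential"},
            "generatorURL": BASE + "?g0.expr=cluster_operator_up"
                                   "%7Bjob%3D%22cluster-version-operator%22%7D+%3D%3D+0&g0.tab=1",
        },
    ],
})

for entry in summarize_alerts(SAMPLE):
    print(entry["alertname"], entry["name"], entry["expr"])
print("fix verified:", verify_fix(summarize_alerts(SAMPLE)))
```

To run this against a live cluster, the saved output of the curl command above would be passed to `summarize_alerts` in place of `SAMPLE`.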

Comment 8 errata-xmlrpc 2021-05-19 15:15:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update),
and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550

