Bug 1948702

Summary: unneeded CCO alert already covered by CVO
Product: OpenShift Container Platform
Reporter: Joel Diaz <jdiaz>
Component: Cloud Credential Operator
Assignee: Joel Diaz <jdiaz>
Status: CLOSED ERRATA
QA Contact: wang lin <lwan>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.7
CC: arane, dgoodwin, lwan
Target Milestone: ---   
Target Release: 4.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Removed functionality
Doc Text:
The cluster-version operator is responsible for reporting when the cloud-credential-operator's deployment is unhealthy, so the cloud-credential-operator no longer needs to alert on this itself. Handling it in both places resulted in duplicate reporting whenever there was an issue.
Story Points: ---
Clone Of: 1948701
Environment:
Last Closed: 2021-05-19 15:15:46 UTC
Type: ---
Bug Depends On: 1948701, 1957424    
Bug Blocks: 1958959    

Description Joel Diaz 2021-04-12 18:47:13 UTC
+++ This bug was initially created as a clone of Bug #1948701 +++

Description of problem:
The CVO is already responsible for alerting when its operands are unhealthy, so there is no need for the CCO to have its own alert.


Version-Release number of selected component (if applicable):
4.7

How reproducible:
100%

Steps to Reproduce:
1. Put CCO into an unhealthy state.

Actual results:
Both CVO and CCO alerts fire, reporting the same information.


Expected results:
Only a single alert is needed.


Additional info:

Comment 2 Akhil Rane 2021-04-30 04:16:24 UTC
The PR to fix this is open and under review: https://github.com/openshift/cloud-credential-operator/pull/324

Comment 3 Joel Diaz 2021-05-05 18:37:50 UTC
*** Bug 1957424 has been marked as a duplicate of this bug. ***

Comment 5 wang lin 2021-05-07 05:01:22 UTC
Verified on 4.7.0-0.nightly-2021-05-07-004616

1. Log in to the Prometheus console and check that the CloudCredentialOperatorDown alert has been removed from the CloudCredentialOperator alerting rules.

2. Create an invalid CredentialsRequest and check that the CVO fires the alerts when the CCO is down:
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0  -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v1/alerts' | jq 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5740    0  5740    0     0   400k      0 --:--:-- --:--:-- --:--:--  400k
{
  "status": "success",
  "data": [
    {
      "labels": {
        "alertname": "CloudCredentialOperatorTargetNamespaceMissing",
        "condition": "MissingTargetNamespace",
        "container": "kube-rbac-proxy",
        "endpoint": "metrics",
        "instance": "10.129.0.69:8443",
        "job": "cco-metrics",
        "namespace": "openshift-cloud-credential-operator",
        "pod": "cloud-credential-operator-7fd7b8c7d5-8t5fv",
        "prometheus": "openshift-monitoring/k8s",
        "service": "cco-metrics",
        "severity": "warning"
      },
      "annotations": {
        "message": "CredentialsRequest(s) pointing to non-existent namespace"
      },
      "startsAt": "2021-05-07T04:33:42.851Z",
      "endsAt": "2021-05-07T05:02:12.851Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cco_credentials_requests_conditions%7Bcondition%3D%22MissingTargetNamespace%22%7D+%3E+0&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Default"
      ],
      "fingerprint": "06b742835ceb6c49"
    },
    {
      "labels": {
        "alertname": "ClusterOperatorDown",
        "endpoint": "metrics",
        "instance": "10.0.162.255:9099",
        "job": "cluster-version-operator",
        "name": "cloud-credential",
        "namespace": "openshift-cluster-version",
        "pod": "cluster-version-operator-84676c6b47-hp54f",
        "prometheus": "openshift-monitoring/k8s",
        "service": "cluster-version-operator",
        "severity": "critical",
        "version": "4.7.0-0.nightly-2021-05-07-004616"
      },
      "annotations": {
        "message": "Cluster operator cloud-credential has not been available for 10 minutes. Operator may be down or disabled, cluster will not be kept up to date and upgrades will not be possible."
      },
      "startsAt": "2021-05-07T04:36:59.213Z",
      "endsAt": "2021-05-07T05:01:59.213Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cluster_operator_up%7Bjob%3D%22cluster-version-operator%22%7D+%3D%3D+0&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Critical"
      ],
      "fingerprint": "bc22e2964c0ab173"
    },
    {
      "labels": {
        "alertname": "ClusterOperatorDegraded",
        "condition": "Degraded",
        "endpoint": "metrics",
        "instance": "10.0.162.255:9099",
        "job": "cluster-version-operator",
        "name": "cloud-credential",
        "namespace": "openshift-cluster-version",
        "pod": "cluster-version-operator-84676c6b47-hp54f",
        "prometheus": "openshift-monitoring/k8s",
        "reason": "CredentialsFailing",
        "service": "cluster-version-operator",
        "severity": "critical"
      },
      "annotations": {
        "message": "Cluster operator cloud-credential has been degraded for 10 minutes. Operator is degraded because CredentialsFailing and cluster upgrades will be unstable."
      },
      "startsAt": "2021-05-07T04:36:59.213Z",
      "endsAt": "2021-05-07T05:01:59.213Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cluster_operator_conditions%7Bcondition%3D%22Degraded%22%2Cjob%3D%22cluster-version-operator%22%7D+%3D%3D+1&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Critical"
      ],
      "fingerprint": "d0b00c0a6b1e0e75"
    },
}
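The check in step 2 can also be scripted. The Python sketch below is only an illustration, not part of the fix or of any OpenShift tooling. It assumes the `/api/v1/alerts` response has the shape shown above (a `data` list whose entries carry `labels`, `status.state`, and `generatorURL`). The helper names `active_alerts`, `alert_expr`, and `check_cco_alerts` are hypothetical, and `SAMPLE_JSON` is a trimmed copy of the three alerts pasted above. The sketch prints each active alert with the PromQL expression decoded from its `generatorURL`, then reports which CVO alerts fired for `name="cloud-credential"` and whether a redundant `CloudCredentialOperatorDown` alert is still present.

```python
import json
from urllib.parse import parse_qs, urlparse

# Trimmed copy of the three alerts pasted above, keeping only the fields this
# sketch reads. The overall shape is assumed from the Alertmanager v1 output.
SAMPLE_JSON = """
{
  "status": "success",
  "data": [
    {
      "labels": {"alertname": "CloudCredentialOperatorTargetNamespaceMissing", "job": "cco-metrics"},
      "status": {"state": "active"},
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cco_credentials_requests_conditions%7Bcondition%3D%22MissingTargetNamespace%22%7D+%3E+0&g0.tab=1"
    },
    {
      "labels": {"alertname": "ClusterOperatorDown", "job": "cluster-version-operator", "name": "cloud-credential"},
      "status": {"state": "active"},
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cluster_operator_up%7Bjob%3D%22cluster-version-operator%22%7D+%3D%3D+0&g0.tab=1"
    },
    {
      "labels": {"alertname": "ClusterOperatorDegraded", "job": "cluster-version-operator", "name": "cloud-credential"},
      "status": {"state": "active"},
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cluster_operator_conditions%7Bcondition%3D%22Degraded%22%2Cjob%3D%22cluster-version-operator%22%7D+%3D%3D+1&g0.tab=1"
    }
  ]
}
"""


def active_alerts(body):
    """Return the alerts whose status.state is "active" (hypothetical helper)."""
    return [a for a in body.get("data", []) if a.get("status", {}).get("state") == "active"]


def alert_expr(alert):
    """Decode the PromQL expression carried in the g0.expr query parameter of generatorURL."""
    query = parse_qs(urlparse(alert.get("generatorURL", "")).query)
    return query.get("g0.expr", [""])[0]


def check_cco_alerts(body):
    """Report the CVO alerts firing for cloud-credential and whether the old CCO alert remains."""
    alerts = active_alerts(body)
    names = {a["labels"]["alertname"] for a in alerts}
    cvo_alerts = sorted(
        a["labels"]["alertname"]
        for a in alerts
        if a["labels"].get("job") == "cluster-version-operator"
        and a["labels"].get("name") == "cloud-credential"
    )
    return {
        # True would mean the CCO still raises its own, now redundant, alert.
        "duplicate_cco_alert": "CloudCredentialOperatorDown" in names,
        "cvo_alerts": cvo_alerts,
    }


body = json.loads(SAMPLE_JSON)
for alert in active_alerts(body):
    print(alert["labels"]["alertname"], "->", alert_expr(alert))
print(check_cco_alerts(body))
```

Against a live cluster, the output of the `curl` command above could be piped into a script that calls `check_cco_alerts(json.load(sys.stdin))`. That wiring is left out here so the sketch runs without a cluster.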

Comment 8 errata-xmlrpc 2021-05-19 15:15:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550