1948702 – unneeded CCO alert already covered by CVO

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Credential Operator
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.7.z
Assignee:	Joel Diaz
QA Contact:	wang lin
Docs Contact:
URL:
Whiteboard:
Depends On:	1948701 1957424
Blocks:	1958959

Reported:	2021-04-12 18:47 UTC by Joel Diaz
Modified:	2021-05-19 15:16 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Removed functionality
Doc Text:	The cluster-version operator is responsible for reporting when the cloud-credential-operator's deployment is unhealthy. The cloud-credential-operator does not need to handle this directly, because doing so produces duplicate reports whenever there is an issue.
Clone Of:	1948701
Environment:
Last Closed:	2021-05-19 15:15:46 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cloud-credential-operator pull 324	0	None	open	Bug 1948702: [release-4.7] manifests/0000_90_cloud-credential-operator_04_alertrules: Drop CloudCredentialOperatorDown	2021-05-05 18:41:27 UTC
Red Hat Product Errata	RHBA-2021:1550	0	None	None	None	2021-05-19 15:16:14 UTC

Description Joel Diaz 2021-04-12 18:47:13 UTC

+++ This bug was initially created as a clone of Bug #1948701 +++

Description of problem:
CVO is already responsible for alerting when its operands are unhealthy, so CCO does not need its own alert.


Version-Release number of selected component (if applicable):
4.7

How reproducible:
100%

Steps to Reproduce:
1. Put CCO into an unhealthy state.

Actual results:
Witness CVO and CCO alerts reporting the same information.


Expected results:
Only a single alert is needed.


Additional info:

Comment 2 Akhil Rane 2021-04-30 04:16:24 UTC

A PR to fix this is open and under review: https://github.com/openshift/cloud-credential-operator/pull/324

Comment 3 Joel Diaz 2021-05-05 18:37:50 UTC

*** Bug 1957424 has been marked as a duplicate of this bug. ***

Comment 5 wang lin 2021-05-07 05:01:22 UTC

Verified on 4.7.0-0.nightly-2021-05-07-004616

1. Log in to the Prometheus console and check that CloudCredentialOperatorDown has been removed from CloudCredentialOperator.

2. Create an invalid CredentialsRequest and check that CVO fires the alerts when CCO is down:
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0  -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v1/alerts' | jq 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5740    0  5740    0     0   400k      0 --:--:-- --:--:-- --:--:--  400k
{
  "status": "success",
  "data": [
    {
      "labels": {
        "alertname": "CloudCredentialOperatorTargetNamespaceMissing",
        "condition": "MissingTargetNamespace",
        "container": "kube-rbac-proxy",
        "endpoint": "metrics",
        "instance": "10.129.0.69:8443",
        "job": "cco-metrics",
        "namespace": "openshift-cloud-credential-operator",
        "pod": "cloud-credential-operator-7fd7b8c7d5-8t5fv",
        "prometheus": "openshift-monitoring/k8s",
        "service": "cco-metrics",
        "severity": "warning"
      },
      "annotations": {
        "message": "CredentialsRequest(s) pointing to non-existent namespace"
      },
      "startsAt": "2021-05-07T04:33:42.851Z",
      "endsAt": "2021-05-07T05:02:12.851Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cco_credentials_requests_conditions%7Bcondition%3D%22MissingTargetNamespace%22%7D+%3E+0&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Default"
      ],
      "fingerprint": "06b742835ceb6c49"
    },
    {
      "labels": {
        "alertname": "ClusterOperatorDown",
        "endpoint": "metrics",
        "instance": "10.0.162.255:9099",
        "job": "cluster-version-operator",
        "name": "cloud-credential",
        "namespace": "openshift-cluster-version",
        "pod": "cluster-version-operator-84676c6b47-hp54f",
        "prometheus": "openshift-monitoring/k8s",
        "service": "cluster-version-operator",
        "severity": "critical",
        "version": "4.7.0-0.nightly-2021-05-07-004616"
      },
      "annotations": {
        "message": "Cluster operator cloud-credential has not been available for 10 minutes. Operator may be down or disabled, cluster will not be kept up to date and upgrades will not be possible."
      },
      "startsAt": "2021-05-07T04:36:59.213Z",
      "endsAt": "2021-05-07T05:01:59.213Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cluster_operator_up%7Bjob%3D%22cluster-version-operator%22%7D+%3D%3D+0&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Critical"
      ],
      "fingerprint": "bc22e2964c0ab173"
    },
    {
      "labels": {
        "alertname": "ClusterOperatorDegraded",
        "condition": "Degraded",
        "endpoint": "metrics",
        "instance": "10.0.162.255:9099",
        "job": "cluster-version-operator",
        "name": "cloud-credential",
        "namespace": "openshift-cluster-version",
        "pod": "cluster-version-operator-84676c6b47-hp54f",
        "prometheus": "openshift-monitoring/k8s",
        "reason": "CredentialsFailing",
        "service": "cluster-version-operator",
        "severity": "critical"
      },
      "annotations": {
        "message": "Cluster operator cloud-credential has been degraded for 10 minutes. Operator is degraded because CredentialsFailing and cluster upgrades will be unstable."
      },
      "startsAt": "2021-05-07T04:36:59.213Z",
      "endsAt": "2021-05-07T05:01:59.213Z",
      "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.lwan47bug.qe.devcluster.openshift.com/graph?g0.expr=cluster_operator_conditions%7Bcondition%3D%22Degraded%22%2Cjob%3D%22cluster-version-operator%22%7D+%3D%3D+1&g0.tab=1",
      "status": {
        "state": "active",
        "silencedBy": [],
        "inhibitedBy": []
      },
      "receivers": [
        "Critical"
      ],
      "fingerprint": "d0b00c0a6b1e0e75"
    },
}
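
The same check can be scripted instead of reading the JSON by eye. Below is a minimal sketch, assuming the response has the Alertmanager v1 shape shown above. The helper names and the trimmed sample payload (including the `prometheus.example` host) are hypothetical and only for illustration; they are not part of CCO, CVO, or the fix. It confirms that no CloudCredentialOperatorDown alert is active, that CVO's ClusterOperatorDown covers cloud-credential, and it decodes the PromQL expression from a generatorURL.

```python
import json
from urllib.parse import urlparse, parse_qs

# Hypothetical, trimmed-down sample in the shape of the response above.
# Only the fields used by the checks are kept; the host is a placeholder.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": [
        {
            "labels": {
                "alertname": "CloudCredentialOperatorTargetNamespaceMissing",
                "namespace": "openshift-cloud-credential-operator",
                "severity": "warning",
            },
            "generatorURL": "https://prometheus.example/graph?g0.expr="
                            "cco_credentials_requests_conditions%7Bcondition"
                            "%3D%22MissingTargetNamespace%22%7D+%3E+0&g0.tab=1",
            "status": {"state": "active"},
        },
        {
            "labels": {
                "alertname": "ClusterOperatorDown",
                "job": "cluster-version-operator",
                "name": "cloud-credential",
                "severity": "critical",
            },
            "generatorURL": "https://prometheus.example/graph?g0.expr="
                            "cluster_operator_up%7Bjob%3D%22cluster-version-"
                            "operator%22%7D+%3D%3D+0&g0.tab=1",
            "status": {"state": "active"},
        },
    ],
})


def active_alerts(response_text):
    """Return the alerts whose status.state is "active"."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Alertmanager did not report success")
    return [a for a in payload.get("data", [])
            if a.get("status", {}).get("state") == "active"]


def alert_names(alerts):
    """Collect the distinct alertname labels."""
    return {a["labels"]["alertname"] for a in alerts}


def cvo_reports_operator_down(alerts, operator="cloud-credential"):
    """True if CVO's ClusterOperatorDown is present for the given operator."""
    return any(
        a["labels"].get("alertname") == "ClusterOperatorDown"
        and a["labels"].get("job") == "cluster-version-operator"
        and a["labels"].get("name") == operator
        for a in alerts
    )


def promql_from_generator_url(url):
    """Decode the g0.expr query parameter of a Prometheus graph URL."""
    return parse_qs(urlparse(url).query)["g0.expr"][0]


if __name__ == "__main__":
    alerts = active_alerts(SAMPLE_RESPONSE)
    print("CCO-owned down alert present:",
          "CloudCredentialOperatorDown" in alert_names(alerts))
    print("CVO reports cloud-credential down:",
          cvo_reports_operator_down(alerts))
    for alert in alerts:
        print(alert["labels"]["alertname"], "<-",
              promql_from_generator_url(alert["generatorURL"]))
```

To run it against a live cluster, the saved output of the `curl` command in step 2 could be passed to `active_alerts` in place of `SAMPLE_RESPONSE`.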

Comment 8 errata-xmlrpc 2021-05-19 15:15:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550
