
Bug 1717619

Summary:	Telemetry should include the condition reason on degraded operators
Product:	OpenShift Container Platform
Component:	Cluster Version Operator
Version:	4.1.z
Target Release:	4.1.z
Status:	CLOSED ERRATA
Severity:	medium
Priority:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Reporter:	Clayton Coleman <ccoleman>
Assignee:	Clayton Coleman <ccoleman>
QA Contact:	liujia <jiajliu>
CC:	aos-bugs, jokerman, mmccomas, wsun
Doc Type:	If docs needed, set a value
Last Closed:	2019-07-04 09:01:24 UTC

Description Clayton Coleman 2019-06-05 19:56:08 UTC

This bug was initially created as a copy of Bug #1717617

I am copying this bug because it should be part of 4.1.z so that we can triage failures.

While assessing degraded operators in 4.1.0 GA, Adam suggested capturing the condition reason (which has bounded cardinality) as a label on the cluster_operator_conditions telemetry metric, which would:

1. incentivize teams to add good reasons
2. allow a quick summarization of why the operator is degraded
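To illustrate point 2, here is a minimal Go sketch of summarizing degraded operators by reason once the label exists. It assumes the series have already been scraped and parsed into a hypothetical Sample type, and that a value of 1 means the Degraded condition is True; operator names and reasons in main are made-up placeholders, not real telemetry.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical, already-parsed cluster_operator_conditions
// series: the labels we care about plus the gauge value.
type Sample struct {
	Name, Condition, Reason string
	Value                   float64
}

// degradedByReason groups operators whose Degraded condition is True
// (value 1, by assumption) under their reason label.
func degradedByReason(samples []Sample) map[string][]string {
	out := make(map[string][]string)
	for _, s := range samples {
		if s.Condition == "Degraded" && s.Value == 1 {
			out[s.Reason] = append(out[s.Reason], s.Name)
		}
	}
	for _, names := range out {
		sort.Strings(names) // deterministic order regardless of input order
	}
	return out
}

func main() {
	// Placeholder inputs for illustration only.
	summary := degradedByReason([]Sample{
		{Name: "operator-a", Condition: "Degraded", Reason: "ExampleReasonX", Value: 1},
		{Name: "operator-b", Condition: "Degraded", Reason: "ExampleReasonX", Value: 1},
		{Name: "operator-c", Condition: "Degraded", Reason: "ExampleReasonY", Value: 1},
		{Name: "operator-d", Condition: "Degraded", Reason: "AsExpected", Value: 0},
	})
	reasons := make([]string, 0, len(summary))
	for r := range summary {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	for _, r := range reasons {
		fmt.Printf("%s: %d %v\n", r, len(summary[r]), summary[r])
	}
}
```

Because reason values come from a small, fixed vocabulary per operator, this grouping stays compact. Inside Prometheus, a roughly equivalent query would be count by (reason) (cluster_operator_conditions{condition="Degraded"} == 1).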

Added in https://github.com/openshift/cluster-version-operator/pull/197; this should be backported to 4.1.1.
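As a rough illustration of the idea (this is not the code from PR 197), the Go sketch below emits one exposition-format line per operator condition, with the condition's reason carried as an extra label next to the operator name and condition type. The Condition type is a hypothetical trimmed-down stand-in for a ClusterOperator status condition, and the value mapping (1 = True, 0 = False, -1 = anything else) is an assumption of the sketch. Everything is stdlib only; a real operator would register a labelled gauge with the Prometheus client library instead of formatting strings.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Condition is a hypothetical stand-in for a ClusterOperator status
// condition; only the fields the metric needs are modeled.
type Condition struct {
	Type   string // e.g. "Degraded"
	Status string // "True", "False" or "Unknown"
	Reason string // short CamelCase reason from a bounded set
}

// conditionValue maps a condition status to a gauge value.
// Assumption: 1 = True, 0 = False, -1 = anything else.
func conditionValue(status string) int {
	switch status {
	case "True":
		return 1
	case "False":
		return 0
	default:
		return -1
	}
}

// conditionSeries renders one line per condition with name, condition and
// reason labels. %q supplies the double quotes; that is adequate for the
// plain ASCII values used here but is not full Prometheus escaping.
func conditionSeries(operator string, conds []Condition) []string {
	var out []string
	for _, c := range conds {
		if c.Type == "" {
			continue // skip malformed conditions that have no type
		}
		out = append(out, fmt.Sprintf(
			"cluster_operator_conditions{condition=%q,name=%q,reason=%q} %d",
			c.Type, operator, c.Reason, conditionValue(c.Status)))
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative inputs; the statuses are invented for the example.
	lines := conditionSeries("cloud-credential", []Condition{
		{Type: "Degraded", Status: "True", Reason: "NoCredentialsFailing"},
		{Type: "Progressing", Status: "False", Reason: "ReconcilingComplete"},
	})
	fmt.Println(strings.Join(lines, "\n"))
}
```

Adding a label multiplies the number of series by the number of distinct reasons, which is why the bounded cardinality of reasons matters here.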

Comment 2 liujia 2019-06-20 08:43:53 UTC

Version: 4.1.0-0.nightly-2019-06-19-220253

Per PR 197, I checked in the Prometheus console that a "reason" label has now been added to the cluster_operator_conditions metric:

cluster_operator_conditions{condition="Available",endpoint="metrics",instance="10.0.131.249:9099",job="cluster-version-operator",name="image-registry",namespace="openshift-cluster-version",pod="cluster-version-operator-94456444f-clw9n",reason="Ready",service="cluster-version-operator"}
...
cluster_operator_conditions{condition="Degraded",endpoint="metrics",instance="10.0.131.249:9099",job="cluster-version-operator",name="cloud-credential",namespace="openshift-cluster-version",pod="cluster-version-operator-94456444f-clw9n",reason="NoCredentialsFailing",service="cluster-version-operator"}
...
cluster_operator_conditions{condition="Progressing",endpoint="metrics",instance="10.0.131.249:9099",job="cluster-version-operator",name="cloud-credential",namespace="openshift-cluster-version",pod="cluster-version-operator-94456444f-clw9n",reason="ReconcilingComplete",service="cluster-version-operator"}
...
cluster_operator_conditions{condition="Upgradeable",endpoint="metrics",instance="10.0.131.249:9099",job="cluster-version-operator",name="kube-controller-manager",namespace="openshift-cluster-version",pod="cluster-version-operator-94456444f-clw9n",reason="AsExpected",service="cluster-version-operator"}
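The same check can be scripted. The Go sketch below parses the label set out of one of the series above and reports its name, condition and reason labels, flagging a missing reason. parseLabels is a hypothetical helper that assumes label values contain no commas, escaped quotes or braces; that holds for the lines above but not for arbitrary Prometheus output.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels extracts the label set from a series such as
// metric{a="x",b="y"}. Simplifying assumption: no commas, escaped
// quotes or braces inside label values.
func parseLabels(series string) (map[string]string, error) {
	open := strings.IndexByte(series, '{')
	end := strings.LastIndexByte(series, '}')
	if open < 0 || end < open {
		return nil, fmt.Errorf("no label set in %q", series)
	}
	labels := make(map[string]string)
	body := series[open+1 : end]
	if body == "" {
		return labels, nil
	}
	for _, pair := range strings.Split(body, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed label %q", pair)
		}
		labels[kv[0]] = strings.Trim(kv[1], `"`)
	}
	return labels, nil
}

func main() {
	// One of the series copied from the verification output above.
	series := `cluster_operator_conditions{condition="Degraded",endpoint="metrics",instance="10.0.131.249:9099",job="cluster-version-operator",name="cloud-credential",namespace="openshift-cluster-version",pod="cluster-version-operator-94456444f-clw9n",reason="NoCredentialsFailing",service="cluster-version-operator"}`
	l, err := parseLabels(series)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if l["reason"] == "" {
		fmt.Println("missing reason label")
		return
	}
	fmt.Printf("%s/%s reason=%s\n", l["name"], l["condition"], l["reason"])
	// prints "cloud-credential/Degraded reason=NoCredentialsFailing"
}
```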

Verifying the bug based on my understanding; please feel free to reopen it if this is not the expected behavior.

Comment 4 errata-xmlrpc 2019-07-04 09:01:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1635