Bug 1741645

Summary: Telemetry should include the ClusterVersion conditions with reasons
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Cluster Version Operator
Assignee: Abhinav Dahiya <adahiya>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.2.0
CC: aos-bugs, jokerman
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1741661
Environment:
Last Closed: 2019-10-16 06:36:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1741661

Description W. Trevor King 2019-08-15 17:25:43 UTC
Bug 1717617 (4.2) and bug 1717619 (4.1.z) added Telemetry for ClusterOperator conditions.  We should add it for ClusterVersion conditions as well, so we can get things like UpdatePayloadClusterError reasons [1] out of Telemetry without requiring cluster access for 'oc get' or must-gather commands.

I expect we'll want to backport this to 4.1.z via a cloned bug, and the implementation should apply cleanly there, but this bug is just about 4.2.

[1]:  https://bugzilla.redhat.com/show_bug.cgi?id=1740838#c20

Comment 2 liujia 2019-08-19 07:29:45 UTC
Version: 4.2.0-0.nightly-2019-08-18-222019

Per PR 236, I checked that the "version" operator is now included in the cluster_operator_conditions metric as well.
Element                                   Value
cluster_operator_conditions{condition="Available",endpoint="metrics",instance="10.0.172.10:9099",job="cluster-version-operator",name="version",namespace="openshift-cluster-version",pod="cluster-version-operator-956b48c68-xjf74",service="cluster-version-operator"} 1
cluster_operator_conditions{condition="Failing",endpoint="metrics",instance="10.0.172.10:9099",job="cluster-version-operator",name="version",namespace="openshift-cluster-version",pod="cluster-version-operator-956b48c68-xjf74",service="cluster-version-operator"}  0
cluster_operator_conditions{condition="Progressing",endpoint="metrics",instance="10.0.172.10:9099",job="cluster-version-operator",name="version",namespace="openshift-cluster-version",pod="cluster-version-operator-956b48c68-xjf74",service="cluster-version-operator"}  0
cluster_operator_conditions{condition="RetrievedUpdates",endpoint="metrics",instance="10.0.172.10:9099",job="cluster-version-operator",name="version",namespace="openshift-cluster-version",pod="cluster-version-operator-956b48c68-xjf74",reason="RemoteFailed",service="cluster-version-operator"}  0

All conditions reported for "version" match the conditions on the ClusterVersion object.
# ./oc get clusterversion -o json|jq ".items[0].status.conditions"
[
  {
    "lastTransitionTime": "2019-08-19T05:48:38Z",
    "message": "Done applying 4.2.0-0.nightly-2019-08-18-222019",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2019-08-19T05:44:02Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2019-08-19T05:48:38Z",
    "message": "Cluster version is 4.2.0-0.nightly-2019-08-18-222019",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2019-08-19T05:35:26Z",
    "message": "Unable to retrieve available updates: currently installed version 4.2.0-0.nightly-2019-08-18-222019 not found in the \"stable-4.2\" channel",
    "reason": "RemoteFailed",
    "status": "False",
    "type": "RetrievedUpdates"
  }
]
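
The comparison above can be sketched in a few lines of Python. This is only an illustration of the mapping that the verification output suggests, not the operator's actual implementation: it assumes a status of "True" becomes 1 and anything else becomes 0, and that a reason label is attached only when the condition carries a reason. The helper name conditions_to_samples is hypothetical, and the condition list is abbreviated to the type, status, and reason fields from the 'oc get clusterversion' output (messages and timestamps omitted).

```python
import json


def conditions_to_samples(conditions, name="version"):
    """Map ClusterVersion-style conditions to (labels, value) pairs.

    Hypothetical helper: the True -> 1 / otherwise -> 0 rule and the
    optional reason label are inferred from the output in this comment.
    """
    samples = []
    for cond in conditions:
        labels = {"name": name, "condition": cond["type"]}
        # Attach a reason label only when the condition has one,
        # as seen for RetrievedUpdates (reason="RemoteFailed").
        reason = cond.get("reason")
        if reason:
            labels["reason"] = reason
        value = 1 if cond.get("status") == "True" else 0
        samples.append((labels, value))
    return samples


# Abbreviated copy of the conditions shown above.
conditions = json.loads("""
[
  {"type": "Available", "status": "True"},
  {"type": "Failing", "status": "False"},
  {"type": "Progressing", "status": "False"},
  {"type": "RetrievedUpdates", "status": "False", "reason": "RemoteFailed"}
]
""")

for labels, value in conditions_to_samples(conditions):
    rendered = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    print(f"cluster_operator_conditions{{{rendered}}} {value}")
```

Each printed line should correspond to one of the four cluster_operator_conditions samples listed earlier, minus the scrape-related labels (endpoint, instance, job, namespace, pod, service) that Prometheus adds.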

Comment 3 errata-xmlrpc 2019-10-16 06:36:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922