Bug 1720068

Summary: The "completed" cluster_version metric should only be included when at least one version has been successfully deployed
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Cluster Version Operator
Assignee: Clayton Coleman <ccoleman>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.1.z
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: 4.1.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-07-04 09:01:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Clayton Coleman 2019-06-13 05:42:42 UTC
While building the upgrade status dashboard it was obvious that we needed a simpler query to determine whether a cluster had "successfully" installed (which would allow us to filter out clusters that were never completely installed).

The cluster_version{type="completed"} metric was supposed to report that. However, when the ClusterVersion history contains no completed update, the series is still emitted with empty labels instead of being omitted. Omitting a series that has nothing to report is the norm for Prometheus, and it is what type="updating" and type="failure" already do.

To make this consistent, we should report type="completed" only when at least one completed update is recorded in the ClusterVersion status.history array (meaning that the cluster was fully deployed at least once).

We can then filter out clusters that are hung on incomplete IPI or UPI installs.
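
As a rough illustration of the rule proposed above (not the operator's actual code, which is not shown in this bug), the sketch below decides whether a type="completed" series should exist for a given history. It assumes each history entry is a dict with "state", "version", and "image" keys, that entries are ordered newest first, and that "Completed" is the state value of a finished update. The function name completed_labels and the sample image names are hypothetical.

```python
from typing import Dict, List, Optional


def completed_labels(history: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return labels for the type="completed" series, or None to omit it.

    Assumption: `history` mirrors status.history, newest entry first,
    and a finished update has state == "Completed".
    """
    for entry in history:
        if entry.get("state") == "Completed":
            # The first match is the most recent completed update.
            return {
                "type": "completed",
                "version": entry.get("version", ""),
                "image": entry.get("image", ""),
            }
    # No completed update recorded: emit no series at all,
    # rather than a series with empty labels.
    return None


# Hypothetical sample data: a cluster stuck on its initial install,
# and a cluster that completed 4.1.1 and is partway through 4.1.2.
never_installed = [
    {"state": "Partial", "version": "4.1.1", "image": "example/release:4.1.1"},
]
installed = [
    {"state": "Partial", "version": "4.1.2", "image": "example/release:4.1.2"},
    {"state": "Completed", "version": "4.1.1", "image": "example/release:4.1.1"},
]

print(completed_labels(never_installed))
print(completed_labels(installed))
```

With the series absent for the first cluster, a dashboard query that selects on type="completed" would skip it without needing any extra check for empty labels.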

Targeted for 4.1.z because this helps us understand what we have installed.

Comment 3 errata-xmlrpc 2019-07-04 09:01:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1635