Bug 1720068 - The "completed" cluster_version metric should only be included when at least one version has been successfully deployed
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.z
Assignee: Clayton Coleman
QA Contact: Junqi Zhao
Depends On:
Reported: 2019-06-13 05:42 UTC by Clayton Coleman
Modified: 2019-07-04 09:01 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Last Closed: 2019-07-04 09:01:40 UTC
Target Upstream Version:


System                  ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:1635  None      None    None     2019-07-04 09:01:47 UTC

Description Clayton Coleman 2019-06-13 05:42:42 UTC
While building the upgrade status dashboard it was obvious that we needed a simpler query to determine whether a cluster had "successfully" installed (which would allow us to filter out clusters that were never completely installed).

The cluster_version{type="completed"} metric was supposed to report that, but when there is no completed update in the ClusterVersion history, the series was still emitted with empty labels instead of being omitted. Omitting a series that has nothing to report is the norm for Prometheus, and it is what type="updating" and type="failure" already do.

To make this consistent, we should only report type="completed" when at least one completed update is recorded in the ClusterVersion status.history array (meaning that the cluster was completely deployed at least once).
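
To illustrate the intended behavior, here is a minimal sketch in Python (not the operator's actual Go code). It assumes history entries are plain dicts shaped like ClusterVersion status.history (newest first, with "state", "version", "image" and "completionTime" keys). The function name, the returned sample shape, and the choice of labels and value are hypothetical and only serve the illustration.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def completed_series(history: List[Dict]) -> Optional[Dict]:
    """Return a sample for type="completed", or None if nothing ever completed.

    `history` mimics ClusterVersion status.history: newest entry first.
    The sample layout (labels plus a Unix-timestamp value) is an assumption.
    """
    for entry in history:
        if entry.get("state") == "Completed":
            finished = datetime.strptime(
                entry["completionTime"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            return {
                "labels": {
                    "type": "completed",
                    "version": entry["version"],
                    "image": entry["image"],
                },
                "value": finished.timestamp(),
            }
    # No completed entry: emit no series at all rather than one with empty labels.
    return None
```

A caller would skip the series entirely when the function returns None, so a cluster that never finished installing exposes no type="completed" sample.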

We can then filter out clusters that are hung on incomplete IPI or UPI installs.
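
As a rough sketch of that filtering step, assume each scraped cluster_version series is available as a dict of labels that carries a per-cluster identifier. The "cluster_id" label name and the function name are hypothetical. Because a cluster that never completed an install has no type="completed" series, a simple membership check is enough.

```python
from typing import Dict, Iterable, Set


def installed_clusters(
    series: Iterable[Dict[str, str]], id_label: str = "cluster_id"
) -> Set[str]:
    """Return the IDs of clusters that expose a type="completed" series.

    `series` is an iterable of label sets, one per cluster_version series.
    `id_label` names the (assumed) label that identifies the cluster.
    """
    return {
        labels[id_label]
        for labels in series
        if labels.get("type") == "completed" and id_label in labels
    }
```

Clusters that only report type="updating" or type="failure" are absent from the result, which matches the goal of excluding hung installs.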

Targeted for 4.1.z because this helps us understand what we have installed.

Comment 3 errata-xmlrpc 2019-07-04 09:01:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

