Bug 1723945 - Unable to identify age of cluster in relation to current version via PromQL
Summary: Unable to identify age of cluster in relation to current version via PromQL
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Clayton Coleman
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1724784
Reported: 2019-06-25 19:58 UTC by Clayton Coleman
Modified: 2019-10-16 06:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1724784
Environment:
Last Closed: 2019-10-16 06:32:26 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:2922  0        None      None    None     2019-10-16 06:32:36 UTC

Description Clayton Coleman 2019-06-25 19:58:18 UTC
While building dashboards for understanding the state of upgrades, it is currently not possible to build the query:

"how old are the clusters of a given current version"

because that requires a single series that carries the current version together with the timestamp of the initial install.

Repurpose the "cluster_version{type="cluster"}" series to:

1. have version and image set to the same value as type="current" (what the operator is currently trying to apply)
2. have from_version set to the same version as type="initial"
3. keep the date as the initial
4. if the cluster has never completed a sync, set from_version empty (so we can exclude clusters that have never completed)

This also allows the query "show all successfully installed clusters by their version", and it means most people should use type="cluster" instead of type="current" (since the date on the current series is not that useful).
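
As a rough illustration of the "successfully installed clusters by their version" query, the sketch below applies the same filter (type="cluster" with a non-empty from_version) to label sets in plain Python and counts clusters per version. The function name and the sample label values are made up for the example; against real telemetry this would more likely be a PromQL aggregation along the lines of count by (version) over the same selector.

```python
from collections import Counter
from typing import Dict, Iterable


def installed_clusters_by_version(series: Iterable[Dict[str, str]]) -> Counter:
    """Count type="cluster" series per version, skipping clusters that never synced.

    A series with an empty (or missing) from_version is treated as a cluster
    that has never completed a sync, mirroring the from_version!="" matcher.
    """
    counts: Counter = Counter()
    for labels in series:
        if labels.get("type") != "cluster":
            continue
        if labels.get("from_version", "") == "":
            continue
        counts[labels["version"]] += 1
    return counts


# Hypothetical label sets, not taken from telemetry.
samples = [
    {"_id": "a", "type": "cluster", "version": "4.2.0", "from_version": "4.1.0"},
    {"_id": "b", "type": "cluster", "version": "4.2.0", "from_version": ""},
    {"_id": "c", "type": "cluster", "version": "4.1.0", "from_version": "4.1.0"},
    {"_id": "c", "type": "current", "version": "4.1.0", "from_version": "4.1.0"},
]

for version, count in sorted(installed_clusters_by_version(samples).items()):
    print(version, count)
```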

Because this loses the ability to query "which images were clusters installed with", add a new "cluster_version{type="initial"}" series that has:

1. version and image set to the oldest entry in the history
2. date set to the install time
3. from_version set empty
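
The two sets of rules above can be sketched as a pair of small functions. This is only a model of the proposal, not the operator's code: HistoryEntry, its fields, the newest-first ordering of the history, and the use of the oldest entry's start time as the "date" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class HistoryEntry:
    """Hypothetical stand-in for one entry in the cluster's update history."""
    version: str
    image: str
    started: int     # Unix seconds when this update began (assumed field)
    completed: bool  # whether this update ever finished syncing (assumed field)


Series = Tuple[Dict[str, str], int]  # (labels, sample value)


def cluster_series(history: List[HistoryEntry]) -> Series:
    """Labels and value for type="cluster"; history is non-empty, newest first."""
    current, oldest = history[0], history[-1]
    synced = any(entry.completed for entry in history)
    labels = {
        "type": "cluster",
        "version": current.version,  # rule 1: same as type="current"
        "image": current.image,
        # rules 2 and 4: initial version, or empty if no sync ever completed
        "from_version": oldest.version if synced else "",
    }
    return labels, oldest.started    # rule 3: keep the initial date


def initial_series(history: List[HistoryEntry]) -> Series:
    """Labels and value for type="initial"; history is non-empty, newest first."""
    oldest = history[-1]
    labels = {
        "type": "initial",
        "version": oldest.version,
        "image": oldest.image,
        "from_version": "",
    }
    return labels, oldest.started
```

With a two-entry history (an install followed by one completed upgrade), cluster_series reports the upgraded version with from_version set to the installed one, while both series share the install timestamp as their value.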

Hopefully this is the last major change to the cluster_version metric, given that our current queries have so far been able to triage the state of upgrades across given versions.

Verification will be manual by querying telemetry.

Comment 2 Clayton Coleman 2019-06-27 19:14:50 UTC
Verified manually

max_over_time(cluster_version{type="cluster",from_version!=""}[2d])

{_id="b020b111-fe78-4353-acb0-a35da607ca01",endpoint="metrics",from_version="0.0.1-2019-06-26-201348",image="registry.svc.ci.openshift.org/ci-op-74jsslv3/release@sha256:9467af6b15821a223253d5f34cae519d9781416f23009475f4acb274549c6171",instance="10.0.149.248:9099",job="cluster-version-operator",namespace="openshift-cluster-version",pod="cluster-version-operator-c45df89d-tnlgp",prometheus="openshift-monitoring/k8s",service="cluster-version-operator",type="cluster",version="0.0.1-2019-06-26-201348"}	1561580894
{_id="b57f1806-62ae-4730-997b-227e54944a64",endpoint="metrics",from_version="0.0.1-2019-06-25-200446",image="registry.svc.ci.openshift.org/ci-op-7zdb1my6/release@sha256:dea359a2c87418917ce49d1be3ec3f82531d3a423c0fa1f94d471c8632bf3abc",instance="10.0.128.253:9099",job="cluster-version-operator",namespace="openshift-cluster-version",pod="cluster-version-operator-576988ccd4-brbxq",prometheus="openshift-monitoring/k8s",service="cluster-version-operator",type="cluster",version="0.0.1-2019-06-25-200446"}
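
To connect this output back to the original question, the sample value can be read as the install time in Unix seconds (an assumption, though it is consistent with the from_version timestamp on the first series). Under that assumption a cluster's age is simply "now" minus the value; in PromQL that would presumably be time() minus the same selector. A minimal Python sketch, with a hypothetical helper name, using the first sample and the time of this comment as "now":

```python
from datetime import datetime, timezone


def cluster_age_seconds(sample_value: float, now: datetime) -> float:
    """Age of a cluster, assuming the sample value is its install time in Unix seconds."""
    return now.timestamp() - sample_value


installed_at = 1561580894  # value of the first series above
now = datetime(2019, 6, 27, 19, 14, 50, tzinfo=timezone.utc)  # time of this comment

age = cluster_age_seconds(installed_at, now)
print(f"installed: {datetime.fromtimestamp(installed_at, tz=timezone.utc).isoformat()}")
print(f"age: {age / 3600:.1f} hours")
```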

Comment 3 Eric Paris 2019-07-01 14:56:14 UTC
Under no circumstances can an engineer EVER verify their own bugzilla. Do not do this again.

Comment 4 Clayton Coleman 2019-07-01 18:56:54 UTC
If you want to take over verification of this, please do.

Comment 6 errata-xmlrpc 2019-10-16 06:32:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

