Bug 1723945

Summary: Unable to identify age of cluster in relation to current version via PromQL
Product: OpenShift Container Platform
Component: Cluster Version Operator
Version: 4.2.0
Target Release: 4.2.0
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Clayton Coleman <ccoleman>
Assignee: Clayton Coleman <ccoleman>
QA Contact: Junqi Zhao <juzhao>
CC: adahiya, aos-bugs, eparis, jokerman, juzhao, mmccomas
Type: Bug
Last Closed: 2019-10-16 06:32:26 UTC
Bug Blocks: 1724784

Description Clayton Coleman 2019-06-25 19:58:18 UTC
While building dashboards for understanding the state of upgrades, it is currently not possible to build the query:

"how old are the clusters of a given current version"

because no single series carries both the current version and the timestamp of the initial install.

Repurpose the "cluster_version{type="cluster"}" series to:

1. have version and image set to the same value as type="current" (what the operator is currently trying to apply)
2. have from_version set to the same version as type="initial"
3. keep the date set to the initial install time
4. if the cluster has never completed a sync, set from_version empty (so we can exclude clusters that have never completed)

This also allows the query "show all successfully installed clusters by their version", and means most people should use type="cluster" instead of type="current" (since the current date is not that useful).
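
As a rough illustration of the "how old are the clusters of a given current version" query, the sketch below does the same arithmetic outside Prometheus. It assumes the sample value of each type="cluster" series is the install time in Unix seconds; the function name, the sample data, and the version strings are all hypothetical. In PromQL the equivalent would presumably be something along the lines of time() - cluster_version{type="cluster",from_version!=""}, though that has not been checked against telemetry here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A sample is (labels, value); the value is assumed to be the install
# time in Unix seconds, as proposed for type="cluster".
Sample = Tuple[Dict[str, str], float]


def ages_by_version(samples: List[Sample], now: float) -> Dict[str, List[float]]:
    """Group cluster ages (in seconds) by the version each cluster is applying."""
    ages = defaultdict(list)
    for labels, install_time in samples:
        # Only the repurposed type="cluster" series carries both pieces of data.
        if labels.get("type") != "cluster":
            continue
        # An empty from_version marks a cluster that never completed a sync.
        if not labels.get("from_version"):
            continue
        ages[labels["version"]].append(now - install_time)
    return dict(ages)


# Hypothetical samples, not taken from telemetry.
samples = [
    ({"type": "cluster", "version": "4.2.0", "from_version": "4.1.0"}, 1_000_000),
    ({"type": "cluster", "version": "4.2.0", "from_version": "4.2.0"}, 1_500_000),
    ({"type": "cluster", "version": "4.2.0", "from_version": ""}, 1_900_000),
    ({"type": "current", "version": "4.2.0", "from_version": "4.1.0"}, 1_950_000),
]
print(ages_by_version(samples, now=2_000_000))
```

The third sample is skipped because it never completed a sync, and the fourth because it is not a type="cluster" series.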

Because this loses queryability of "which images were clusters installed with", add a new "cluster_version{type="initial"}" series that has:

1. version and image set to the oldest entry in the history
2. date set to the install time
3. from_version set empty
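
The two label sets described above can be sketched as follows. This is only a model of the proposal, not the operator's code: HistoryEntry, build_series, and the "completed" flag are hypothetical names, the history is assumed to be ordered oldest first, and "has completed a sync" is assumed to mean that at least one history entry finished.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class HistoryEntry:
    # Hypothetical stand-in for one entry in the cluster version history.
    version: str
    image: str
    completed: bool  # assumed: True once a sync to this entry finished


Series = Tuple[Dict[str, str], int]  # (labels, value)


def build_series(history: List[HistoryEntry], current_version: str,
                 current_image: str, install_time: int) -> Dict[str, Series]:
    """Return the proposed type="cluster" and type="initial" series."""
    oldest = history[0]  # assumed: history is ordered oldest first
    has_completed = any(entry.completed for entry in history)
    cluster = {
        "type": "cluster",
        "version": current_version,  # same as type="current"
        "image": current_image,      # same as type="current"
        # Empty until the cluster has completed at least one sync.
        "from_version": oldest.version if has_completed else "",
    }
    initial = {
        "type": "initial",
        "version": oldest.version,
        "image": oldest.image,
        "from_version": "",
    }
    # Both series keep the install time as their date.
    return {"cluster": (cluster, install_time), "initial": (initial, install_time)}


# Hypothetical cluster installed at 4.1.0 and currently applying 4.2.0.
history = [HistoryEntry("4.1.0", "img-a", True), HistoryEntry("4.2.0", "img-b", False)]
for labels, value in build_series(history, "4.2.0", "img-b", 1_000_000).values():
    print(labels, value)
```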

Hopefully this is the last major change to the cluster_version metric, given that our current queries have so far been able to triage the state of upgrades across given versions.

Verification will be manual by querying telemetry.

Comment 2 Clayton Coleman 2019-06-27 19:14:50 UTC
Verified manually

max_over_time(cluster_version{type="cluster",from_version!=""}[2d])

{_id="b020b111-fe78-4353-acb0-a35da607ca01",endpoint="metrics",from_version="0.0.1-2019-06-26-201348",image="registry.svc.ci.openshift.org/ci-op-74jsslv3/release@sha256:9467af6b15821a223253d5f34cae519d9781416f23009475f4acb274549c6171",instance="10.0.149.248:9099",job="cluster-version-operator",namespace="openshift-cluster-version",pod="cluster-version-operator-c45df89d-tnlgp",prometheus="openshift-monitoring/k8s",service="cluster-version-operator",type="cluster",version="0.0.1-2019-06-26-201348"}	1561580894
{_id="b57f1806-62ae-4730-997b-227e54944a64",endpoint="metrics",from_version="0.0.1-2019-06-25-200446",image="registry.svc.ci.openshift.org/ci-op-7zdb1my6/release@sha256:dea359a2c87418917ce49d1be3ec3f82531d3a423c0fa1f94d471c8632bf3abc",instance="10.0.128.253:9099",job="cluster-version-operator",namespace="openshift-cluster-version",pod="cluster-version-operator-576988ccd4-brbxq",prometheus="openshift-monitoring/k8s",service="cluster-version-operator",type="cluster",version="0.0.1-2019-06-25-200446"}

Comment 3 Eric Paris 2019-07-01 14:56:14 UTC
Under no circumstances can an engineer EVER verify their own bugzilla. Do not do this again.

Comment 4 Clayton Coleman 2019-07-01 18:56:54 UTC
If you want to take over verification of this, please do.

Comment 6 errata-xmlrpc 2019-10-16 06:32:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922