1741786 – CVO's default ClusterVersion races cluster-bootstrap pushing the installer's ClusterVersion

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.2.0
Assignee:	W. Trevor King
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1708697

Reported:	2019-08-16 06:28 UTC by W. Trevor King
Modified:	2019-10-16 06:36 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1708697
Environment:
Last Closed:	2019-10-16 06:36:15 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-version-operator pull 238	0	None	closed	Bug 1741786: pkg/cvo: Drop ClusterVersion defaulting during bootstrap	2020-12-15 16:27:51 UTC
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:36:26 UTC

Description W. Trevor King 2019-08-16 06:28:03 UTC

+++ This bug was initially created as a clone of Bug #1708697 +++

# oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-05-15T12:53:14Z"
    generation: 1
    name: version
    resourceVersion: "17387"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: 67098406-7710-11e9-89d0-0050569b5e80
  spec:
    channel: fast
    clusterID: e30624c2-487e-4646-81e4-02b060dcc070
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
...

This 'channel: fast' is a sign that the cluster-version operator's default ClusterVersion won the race, in which case the installer's ClusterVersion was ignored.  We need to remove the CVO's ClusterVersion defaulting logic.
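A quick way to spot this symptom could look like the sketch below. It reads ClusterVersion YAML on stdin and classifies the channel. The function names clusterversion_channel and channel_origin are hypothetical helpers, not part of oc or the CVO, and treating 'fast' as the CVO default is an assumption taken from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the channel value(s) found in ClusterVersion
# YAML read on stdin, with surrounding whitespace stripped.
clusterversion_channel() {
  sed -n 's/^[[:space:]]*channel:[[:space:]]*//p' | sed 's/[[:space:]]*$//'
}

# Hypothetical helper: classify a channel name. Assumption from this bug:
# "fast" means the CVO's default ClusterVersion won the race.
channel_origin() {
  case "$1" in
    fast) echo "cvo-default" ;;
    *)    echo "not-default" ;;
  esac
}
```

Possible usage against a live cluster: channel_origin "$(oc get clusterversion -o yaml | clusterversion_channel)". A result of cvo-default would suggest the installer's ClusterVersion was ignored.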

Comment 3 W. Trevor King 2019-08-21 22:22:59 UTC

PR landed in master (and I think we're still fast-forwarding release-4.2 to match?), so moving Target Release back to 4.2.0.

I'm not sure what verification looks like for this, since it's a 3.5% flake.  But maybe you could force the race by putting the installer-provided ClusterVersion where the CVO won't find it:

$ openshift-install create manifests
$ sed -i 's/name: version/name: get-lost/' manifests/cvo-overrides.yaml
$ openshift-install create cluster

Bootstrapping should fail, and the gathered logs should contain the new log line [1].

[1]: https://github.com/openshift/cluster-version-operator/pull/238/files#diff-85c31beb4341b4c52f892c0581bbb5d6R368
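The rename step above silently does nothing if the manifest does not contain 'name: version', which would make the test pass for the wrong reason. A minimal sketch of a guarded variant follows. The function name hide_clusterversion is hypothetical, and it assumes the replacement name does not itself begin with "version".

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename the ClusterVersion in a manifest and confirm
# the change took effect.
# $1: manifest path, $2: replacement name (must not start with "version").
hide_clusterversion() {
  local manifest=$1 new_name=$2 tmp
  tmp=$(mktemp) || return 1
  # Same substitution as the sed -i call above, written via a temp file.
  sed "s/name: version/name: ${new_name}/" "$manifest" > "$tmp" || return 1
  cat "$tmp" > "$manifest" || return 1
  rm -f "$tmp"
  # Succeed only if the old name is gone and the new name is present.
  ! grep -q 'name: version' "$manifest" && grep -q "name: ${new_name}" "$manifest"
}
```

Possible usage between the two installer steps: hide_clusterversion manifests/cvo-overrides.yaml get-lost || echo "rename did not apply".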

Comment 5 liujia 2019-09-10 08:43:46 UTC

Cannot reproduce on either v4.1 (the version in https://bugzilla.redhat.com/show_bug.cgi?id=1708697#c14) or an old v4.2.

# oc get clusterversion -o json|jq ".items[0].spec"
{
  "channel": "stable-4.1",
  "clusterID": "30f898c3-95d5-424c-9752-04de4e3eb9b2",
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}
# oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-rc.3   True        False         33s     Cluster version is 4.1.0-rc.3

# oc get clusterversion -o json|jq ".items[0].spec"
{
  "channel": "stable-4.2",
  "clusterID": "da1ffa7c-c0b5-4aa9-9cc2-ae6d6ee3fa0a",
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-12-041904   True        False         2m40s   Cluster version is 4.2.0-0.nightly-2019-07-12-041904

So I will verify the fix using the approach in comment 3.

Comment 6 liujia 2019-09-11 07:57:53 UTC

Version: 4.2.0-0.nightly-2019-09-10-074025

Before creating the Ignition files, edit manifests/cvo-overrides.yaml so that the CVO cannot find the installer-provided ClusterVersion:
$ openshift-install create manifests
$ sed -i 's/name: version/name: get-lost/' manifests/cvo-overrides.yaml

Bootstrap fails as expected:
INFO Waiting up to 30m0s for the Kubernetes API at https://api.jliu-6122.qe.devcluster.openshift.com:6443... 
INFO API v1.14.6+35c093a up                       
INFO Waiting up to 30m0s for bootstrapping to complete... 
INFO Use the following commands to gather logs from the cluster 
INFO openshift-install gather bootstrap --help    

The CVO log contains the expected line:
...
I0911 01:21:02.541080       1 cvo.go:352] Started syncing cluster version "openshift-cluster-version/version" (2019-09-11 01:21:02.541075391 +0000 UTC m=+45.751008793)
I0911 01:21:02.541106       1 cvo.go:368] No ClusterVersion object and defaulting not enabled, waiting for one
...
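The log check could be scripted roughly as follows. This is a sketch: the function name cvo_log_shows_wait is hypothetical, and the caller supplies the path to whichever CVO log file was extracted from the gathered bootstrap logs. The matched message is the one quoted in the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given log file contains the message
# the CVO emits when no ClusterVersion exists and defaulting is disabled.
# $1: path to a CVO log file.
cvo_log_shows_wait() {
  grep -qF 'No ClusterVersion object and defaulting not enabled, waiting for one' "$1"
}
```

A zero exit status would indicate that the CVO waited for a ClusterVersion instead of creating a default one.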

A normal installation on vSphere also works with the above version:
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-10-074025   True        False         2m19s   Cluster version is 4.2.0-0.nightly-2019-09-10-074025

Comment 7 errata-xmlrpc 2019-10-16 06:36:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
