Bug 1741786 - CVO's default ClusterVersion races cluster-bootstrap pushing the installer's ClusterVersion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: W. Trevor King
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks: 1708697
Reported: 2019-08-16 06:28 UTC by W. Trevor King
Modified: 2019-10-16 06:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1708697
Environment:
Last Closed: 2019-10-16 06:36:15 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 238 0 None closed Bug 1741786: pkg/cvo: Drop ClusterVersion defaulting during bootstrap 2020-12-15 16:27:51 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:36:26 UTC

Description W. Trevor King 2019-08-16 06:28:03 UTC
+++ This bug was initially created as a clone of Bug #1708697 +++

# oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-05-15T12:53:14Z"
    generation: 1
    name: version
    resourceVersion: "17387"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: 67098406-7710-11e9-89d0-0050569b5e80
  spec:
    channel: fast
    clusterID: e30624c2-487e-4646-81e4-02b060dcc070
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
...

This 'channel: fast' is a sign that the cluster-version operator's default ClusterVersion won the race, in which case the installer's ClusterVersion was ignored.  We need to remove the CVO's ClusterVersion defaulting logic.
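
A rough way to spot this on a freshly installed cluster is to look at spec.channel. The sketch below is only a heuristic and assumes, as described above, that 'fast' is the CVO's built-in default; an admin could also have set 'fast' on purpose. The function name classify_channel is made up for illustration.

```shell
#!/bin/sh
# Classify a ClusterVersion spec.channel value (heuristic only).
# Assumption from this report: "fast" is the channel the CVO's default
# ClusterVersion carries, so seeing it right after install suggests the
# CVO's default object won the race against the installer's object.
classify_channel() {
  case "$1" in
    fast) echo "cvo-default" ;;
    "")   echo "unset" ;;
    *)    echo "installer-or-admin" ;;
  esac
}

# Usage against a live cluster (needs a logged-in oc):
#   classify_channel "$(oc get clusterversion version -o jsonpath='{.spec.channel}')"
```

The live-cluster call is left as a comment so the function can be tried without a cluster.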

Comment 3 W. Trevor King 2019-08-21 22:22:59 UTC
PR landed in master (and I think we're still fast-forwarding release-4.2 to match?), so moving Target Release back to 4.2.0.

I'm not sure what verification looks like for this, since it's a 3.5% flake.  But maybe you could force the race by putting the installer-provided ClusterVersion where the CVO won't find it:

$ openshift-install create manifests
$ sed -i 's/name: version/name: get-lost/' manifests/cvo-overrides.yaml
$ openshift-install create cluster

Bootstrapping should fail, and the gathered logs should contain the new log line [1].

[1]: https://github.com/openshift/cluster-version-operator/pull/238/files#diff-85c31beb4341b4c52f892c0581bbb5d6R368
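
Before running 'create cluster' it may be worth confirming that the sed rename actually took effect. A minimal sketch, assuming the manifest contains the literal strings used in the sed command above; the helper name rename_applied is hypothetical.

```shell
#!/bin/sh
# Return 0 only if the manifest uses the throwaway name "get-lost"
# and no longer contains "name: version".
# $1: path to the manifest (manifests/cvo-overrides.yaml in the steps above).
rename_applied() {
  grep -q 'name: get-lost' "$1" && ! grep -q 'name: version' "$1"
}

# Usage:
#   rename_applied manifests/cvo-overrides.yaml || echo "rename did not apply"
```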

Comment 5 liujia 2019-09-10 08:43:46 UTC
Cannot reproduce on either v4.1 (the version in https://bugzilla.redhat.com/show_bug.cgi?id=1708697#c14) or an older v4.2.

# oc get clusterversion -o json|jq ".items[0].spec"
{
  "channel": "stable-4.1",
  "clusterID": "30f898c3-95d5-424c-9752-04de4e3eb9b2",
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}
# oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-rc.3   True        False         33s     Cluster version is 4.1.0-rc.3

# oc get clusterversion -o json|jq ".items[0].spec"
{
  "channel": "stable-4.2",
  "clusterID": "da1ffa7c-c0b5-4aa9-9cc2-ae6d6ee3fa0a",
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-12-041904   True        False         2m40s   Cluster version is 4.2.0-0.nightly-2019-07-12-041904

So I will verify the fix using the approach from comment 3.

Comment 6 liujia 2019-09-11 07:57:53 UTC
Version: 4.2.0-0.nightly-2019-09-10-074025

Before creating the ignition files, edit manifests/cvo-overrides.yaml so that the CVO cannot find the installer-provided ClusterVersion:
$ openshift-install create manifests
$ sed -i 's/name: version/name: get-lost/' manifests/cvo-overrides.yaml

Bootstrap failed as expected.
INFO Waiting up to 30m0s for the Kubernetes API at https://api.jliu-6122.qe.devcluster.openshift.com:6443...
INFO API v1.14.6+35c093a up
INFO Waiting up to 30m0s for bootstrapping to complete...
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help

Checked the CVO log; it contains the new log line as expected.
...
I0911 01:21:02.541080       1 cvo.go:352] Started syncing cluster version "openshift-cluster-version/version" (2019-09-11 01:21:02.541075391 +0000 UTC m=+45.751008793)
I0911 01:21:02.541106       1 cvo.go:368] No ClusterVersion object and defaulting not enabled, waiting for one
...
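
To repeat this check on gathered logs, something like the sketch below could be used. It searches a directory for the message text from the cvo.go:368 line above; where the CVO log sits inside the gathered bundle is an assumption, so the directory is passed in. The helper name find_cvo_wait_line is hypothetical.

```shell
#!/bin/sh
# Print every line under a directory that contains the CVO's
# "no ClusterVersion, defaulting disabled" message quoted above.
# Exit status is grep's: 0 if found, non-zero otherwise.
# $1: directory holding the extracted bootstrap logs (layout is an assumption).
find_cvo_wait_line() {
  grep -rF 'No ClusterVersion object and defaulting not enabled, waiting for one' "$1"
}

# Usage:
#   find_cvo_wait_line ./extracted-log-bundle
```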

A normal installation on vSphere also works with the above version.
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-10-074025   True        False         2m19s   Cluster version is 4.2.0-0.nightly-2019-09-10-074025

Comment 7 errata-xmlrpc 2019-10-16 06:36:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

