Bug 1814332 - Cluster needs to backfill infrastructure.status.infraPlatform
Summary: Cluster needs to backfill infrastructure.status.infraPlatform
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Abhinav Dahiya
QA Contact: Etienne Simard
URL:
Whiteboard:
Duplicates: 1788437
Depends On:
Blocks:
Reported: 2020-03-17 16:48 UTC by Clayton Coleman
Modified: 2020-08-04 18:05 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A new field was added for clusters installed after 4.1.
Consequence: Operators had to check and fall back to the old fields on clusters that were installed as 4.1, which caused errors during upgrades.
Fix: A migration controller sets the new fields on all clusters during upgrade, using information already available in the cluster.
Result: All clients can depend on the new fields, which reduces bugs.
Clone Of:
Environment:
Last Closed: 2020-08-04 18:05:55 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-config-operator pull 127 | 0 | None | closed | Bug 1814332: add migration_aws_status controller | 2020-12-09 16:36:41 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-08-04 18:05:56 UTC

Description Clayton Coleman 2020-03-17 16:48:02 UTC
We have now had our 8th or 9th panic bug related to status.infraPlatform not being filled on clusters that started before 4.1.  This has to stop.

We need to fill infrastructure.status.infraPlatform on all clusters so that this bug stops happening and breaking customers.

We need to identify which component should do the backfill, how we do that with the info we have, and which operator deploys it (this bug can be moved).

This is required to be fixed before 4.5.0 GA so we can stop hitting this bug.
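
For illustration only, the backfill asked for here could look roughly like the sketch below. It works on the Infrastructure object as a plain parsed dict and copies the legacy status.platform value into status.platformStatus when the latter is missing. The function name and the aws_region parameter are hypothetical; this bug does not say where the region comes from, so the sketch leaves that to the caller. It is not the code from the cluster-config-operator pull request.

```python
import copy


def backfill_platform_status(infra, aws_region=None):
    """Return a copy of an Infrastructure dict with status.platformStatus set.

    `infra` is the parsed object as a plain dict. `aws_region` is a
    hypothetical parameter: the caller has to supply the region from
    whatever source it trusts.
    """
    result = copy.deepcopy(infra)
    status = result.setdefault("status", {})

    # Already populated (installed on a newer release or migrated earlier).
    if status.get("platformStatus"):
        return result

    platform = status.get("platform")
    if not platform:
        # Nothing to derive the new field from; leave the object alone.
        return result

    platform_status = {"type": platform}
    if platform == "AWS" and aws_region:
        platform_status["aws"] = {"region": aws_region}
    status["platformStatus"] = platform_status
    return result


# Legacy shape, as seen on a cluster installed with 4.1.
legacy = {"status": {"infrastructureName": "cluster-azj38", "platform": "AWS"}}
migrated = backfill_platform_status(legacy, aws_region="us-east-2")
print(migrated["status"]["platformStatus"])
```

The sketch returns a copy and is idempotent, so running it again on an already migrated object changes nothing.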

Comment 6 Yang Yang 2020-04-22 11:38:09 UTC
I am trying to reproduce it. The cluster installed with 4.1 does not have platformStatus filled. After cluster upgrade to 4.4, it still does not have platformStatus filled.

Install an AWS cluster with 4.1 and check the infrastructure:
# oc get clusterversions
NAME      VERSION
version   4.1.38

# oc get infrastructure.config.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-04-22T08:10:38Z"
    generation: 1
    name: cluster
    resourceVersion: "404"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: c0279e9d-8470-11ea-a8ed-02eb38e8de4a
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.yanyang-4-1.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.yanyang-4-1.qe.devcluster.openshift.com:6443
    etcdDiscoveryDomain: yanyang-4-1.qe.devcluster.openshift.com
    infrastructureName: yanyang-4-1-m7wtn
    platform: AWS  <------ there is no platformStatus
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


Upgrade the cluster to 4.4 and check the infrastructure.

# oc get clusterversion
NAME      VERSION
version   4.4.0-rc.9

# oc get infrastructures.config.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-04-22T08:10:38Z"
    generation: 1
    name: cluster
    resourceVersion: "404"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: c0279e9d-8470-11ea-a8ed-02eb38e8de4a
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.yanyang-4-1.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.yanyang-4-1.qe.devcluster.openshift.com:6443
    etcdDiscoveryDomain: yanyang-4-1.qe.devcluster.openshift.com
    infrastructureName: yanyang-4-1-m7wtn
    platform: AWS  <------ There is still no platformStatus configuration
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 7 W. Trevor King 2020-04-22 23:13:44 UTC
> After cluster upgrade to 4.4...

This bug targets 4.5, so you have to update all the way to a 4.5 nightly that includes the patched config operator.

Comment 11 Etienne Simard 2020-05-05 05:02:49 UTC
Verified with: 4.5.0-0.nightly-2020-05-04-113741

Upgrade path: 4.1.38 -> 4.2.29 -> 4.3.18 -> 4.4.3 -> 4.5.0-0.nightly-2020-05-04-113741

Before upgrade to 4.5 (platformStatus is not present)

~~~
$ oc get infrastructure.config.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-05-04T23:26:09Z"
    generation: 1
    name: cluster
    resourceVersion: "402"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: a2bd0777-8e5e-11ea-945f-026a593e29d4
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.cluster.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.cluster.qe.openshift.com:6443
    etcdDiscoveryDomain: cluster.qe.openshift.com
    infrastructureName: cluster-azj38
    platform: AWS
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

~~~

After upgrade to 4.5 (platformStatus is present)

~~~
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-04-113741   True        False         77s     Cluster version is 4.5.0-0.nightly-2020-05-04-113741
$ oc get infrastructure.config.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-05-04T23:26:09Z"
    generation: 1
    name: cluster
    resourceVersion: "140951"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: a2bd0777-8e5e-11ea-945f-026a593e29d4
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.cluster.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.cluster.qe.openshift.com:6443
    etcdDiscoveryDomain: cluster.qe.openshift.com
    infrastructureName: cluster-azj38
    platform: AWS
    platformStatus:
      aws:
        region: us-east-2
      type: AWS
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
~~~
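
As a rough sketch of how a client could read the platform type from either of the two outputs above without failing on the missing field, assuming the object has been parsed into a plain dict (the helper name platform_type is hypothetical and not part of any OpenShift library):

```python
def platform_type(infra):
    """Return the platform type, preferring status.platformStatus.type.

    Falls back to the legacy status.platform field when platformStatus
    is absent, and returns None when neither field is set.
    """
    status = infra.get("status") or {}
    platform_status = status.get("platformStatus") or {}
    return platform_status.get("type") or status.get("platform")


# Trimmed versions of the two objects shown in this comment.
before_upgrade = {"status": {"platform": "AWS"}}
after_upgrade = {
    "status": {
        "platform": "AWS",
        "platformStatus": {"aws": {"region": "us-east-2"}, "type": "AWS"},
    }
}

print(platform_type(before_upgrade))
print(platform_type(after_upgrade))
```

Once every cluster carries platformStatus, the fallback branch becomes dead code and clients can read the new field directly.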

Comment 12 Abhinav Dahiya 2020-05-22 15:32:28 UTC
*** Bug 1788437 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2020-08-04 18:05:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

