We have now had our 8th or 9th panic bug related to status.platformStatus not being filled on clusters that started before 4.1. This has to stop. We need to fill infrastructure.status.platformStatus on all clusters so that this bug stops happening and breaking customers. We need to identify which component should do the backfill, how to do it with the information we have, and which operator deploys it (this bug can be moved). This must be fixed before 4.5.0 GA so we can stop hitting this bug.
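To make the backfill question concrete, here is a minimal sketch of what filling the field from existing information could look like. It is written in Python against a plain dict (for example, the object as returned by `oc get infrastructure cluster -o json`) and is not the operator's actual implementation. The function name `backfill_platform_status` and the `aws_region` parameter are hypothetical. The sketch assumes the legacy `status.platform` value can be copied into `platformStatus.type`, and that platform-specific details such as the AWS region are not available in the legacy status and must be supplied from some other source.

```python
import copy
from typing import Any, Dict, Optional


def backfill_platform_status(
    infra: Dict[str, Any], aws_region: Optional[str] = None
) -> Dict[str, Any]:
    """Return a copy of infra with status.platformStatus filled in if it was missing.

    Hypothetical helper: it only illustrates the idea of deriving the new
    field from the legacy status.platform value.
    """
    result = copy.deepcopy(infra)
    status = result.setdefault("status", {})

    # Already populated: leave it alone so the operation is idempotent.
    if status.get("platformStatus"):
        return result

    platform = status.get("platform")
    if not platform:
        # Nothing to derive the type from; do not guess.
        return result

    platform_status: Dict[str, Any] = {"type": platform}
    # Assumption: the region is not present in the legacy status, so the
    # caller has to obtain it elsewhere and pass it in.
    if platform == "AWS" and aws_region:
        platform_status["aws"] = {"region": aws_region}

    status["platformStatus"] = platform_status
    return result
```

The helper returns a new object rather than mutating its input, and it does nothing when `platformStatus` is already set, so running it repeatedly is safe.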
I am trying to reproduce it. The cluster installed with 4.1 does not have platformStatus filled. After cluster upgrade to 4.4, it still does not have platformStatus filled.

Install an AWS cluster with 4.1 and check the infrastructure:

# oc get clusterversions
NAME      VERSION
version   4.1.38

# oc get infrastructure.config.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-04-22T08:10:38Z"
    generation: 1
    name: cluster
    resourceVersion: "404"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: c0279e9d-8470-11ea-a8ed-02eb38e8de4a
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.yanyang-4-1.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.yanyang-4-1.qe.devcluster.openshift.com:6443
    etcdDiscoveryDomain: yanyang-4-1.qe.devcluster.openshift.com
    infrastructureName: yanyang-4-1-m7wtn
    platform: AWS    <------ there is no platformStatus
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Upgrade the cluster to 4.4 and check the infrastructure.

# oc get clusterversion
NAME      VERSION
version   4.4.0-rc.9

# oc get infrastructures.config.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-04-22T08:10:38Z"
    generation: 1
    name: cluster
    resourceVersion: "404"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: c0279e9d-8470-11ea-a8ed-02eb38e8de4a
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.yanyang-4-1.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.yanyang-4-1.qe.devcluster.openshift.com:6443
    etcdDiscoveryDomain: yanyang-4-1.qe.devcluster.openshift.com
    infrastructureName: yanyang-4-1-m7wtn
    platform: AWS    <------ There is still no platformStatus configuration
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
> After cluster upgrade to 4.4...

This bug targets 4.5, so you have to update all the way to a 4.5 nightly that includes the patched config operator.
Verified with: 4.5.0-0.nightly-2020-05-04-113741

Upgrade path: 4.1.38 -> 4.2.29 -> 4.3.18 -> 4.4.3 -> 4.5.0-0.nightly-2020-05-04-113741

Before upgrade to 4.5 (platformStatus is not present)
~~~
$ oc get infrastructure.config.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-05-04T23:26:09Z"
    generation: 1
    name: cluster
    resourceVersion: "402"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: a2bd0777-8e5e-11ea-945f-026a593e29d4
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.cluster.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.cluster.qe.openshift.com:6443
    etcdDiscoveryDomain: cluster.qe.openshift.com
    infrastructureName: cluster-azj38
    platform: AWS
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
~~~

After upgrade to 4.5 (platformStatus is present)
~~~
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-04-113741   True        False         77s     Cluster version is 4.5.0-0.nightly-2020-05-04-113741

$ oc get infrastructure.config.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2020-05-04T23:26:09Z"
    generation: 1
    name: cluster
    resourceVersion: "140951"
    selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
    uid: a2bd0777-8e5e-11ea-945f-026a593e29d4
  spec:
    cloudConfig:
      name: ""
  status:
    apiServerInternalURI: https://api-int.cluster.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.cluster.qe.openshift.com:6443
    etcdDiscoveryDomain: cluster.qe.openshift.com
    infrastructureName: cluster-azj38
    platform: AWS
    platformStatus:
      aws:
        region: us-east-2
      type: AWS
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
~~~
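The before/after comparison above can also be checked mechanically rather than by reading the YAML. The following is a rough Python sketch, assuming a single Infrastructure object has been loaded as a dict (for example from `oc get infrastructure.config.openshift.io cluster -o json`, or one entry of `items` when listing). The function name `platform_status_ok` is hypothetical, and the two sample objects are trimmed copies of the status blocks from this comment.

```python
import json
from typing import Any, Mapping


def platform_status_ok(infra: Mapping[str, Any]) -> bool:
    """True when status.platformStatus.type is set and agrees with status.platform."""
    status = infra.get("status") or {}
    platform_status = status.get("platformStatus") or {}
    reported = platform_status.get("type")
    return bool(reported) and reported == status.get("platform")


# Trimmed copies of the status blocks from the verification output, as JSON.
BEFORE = json.loads('{"status": {"platform": "AWS"}}')
AFTER = json.loads(
    '{"status": {"platform": "AWS",'
    ' "platformStatus": {"type": "AWS", "aws": {"region": "us-east-2"}}}}'
)

if __name__ == "__main__":
    print("before upgrade:", platform_status_ok(BEFORE))
    print("after upgrade:", platform_status_ok(AFTER))
```

The check treats a missing or empty `platformStatus` the same way, which matches the situation on the pre-4.5 cluster where the field is simply absent.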
*** Bug 1788437 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409