Bug 1890038

Summary: Infrastructure status.platform not migrated to status.platformStatus causes warnings
Product: OpenShift Container Platform
Reporter: Pablo Alonso Rodriguez <palonsor>
Component: config-operator
Assignee: Matthew Staebler <mstaeble>
Status: CLOSED ERRATA
QA Contact: Ke Wang <kewang>
Severity: high
Docs Contact:
Priority: high
Version: 4.5
CC: aos-bugs, apjagtap, hgomes, kewang, mstaeble, shsaxena, wking, ychoukse
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The .status.platformStatus field of the Infrastructure resource is not populated when migrating from earlier OpenShift versions.
Consequence: The cluster-config-operator emits warnings about the unpopulated field.
Fix: Update the migration controller in the cluster-config-operator to populate the .status.platformStatus field when it is unpopulated.
Result: The .status.platformStatus field is populated for all platforms regardless of the OpenShift version originally installed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-24 15:27:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1936543    

Description Pablo Alonso Rodriguez 2020-10-21 09:18:48 UTC
Description of problem:

In 4.1, Infrastructure objects used the status.platform field[1]. Since 4.2, it has been replaced by the status.platformStatus field, which contains a type subfield, and this is what is currently used in 4.5[2].

As a result, if a cluster was installed at 4.1 and upgraded all the way to 4.5 (i.e. 4.1-->4.2-->4.3-->4.4-->4.5), the following warning is constantly emitted:

"Falling back to deprecated status.platform because infrastructures.config.openshift.io/cluster status.platformStatus.type is empty"

The warning seems to come from here[3].
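
The fallback behind this warning can be sketched roughly as follows. This is only an illustration in Python, not the operator's actual Go code: the function name platform_type and the plain-dict representation of the Infrastructure object are assumptions made for the example. Only the field names and the warning text come from this report.

```python
import logging

logger = logging.getLogger("infrastructure")


def platform_type(infra: dict) -> str:
    """Return the platform type, preferring status.platformStatus.type.

    `infra` is a hypothetical dict view of the Infrastructure object.
    """
    status = infra.get("status") or {}
    platform_status = status.get("platformStatus") or {}
    current = platform_status.get("type") or ""
    if current:
        return current
    # New field is empty: warn and fall back to the deprecated field.
    logger.warning(
        "Falling back to deprecated status.platform because "
        "infrastructures.config.openshift.io/cluster "
        "status.platformStatus.type is empty"
    )
    return status.get("platform") or ""
```

On a cluster installed at 4.1, only status.platform is set, so every call takes the fallback branch and logs the warning again.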

Emitting this warning and working with the older field is fine, but it would be best if the field were migrated properly, and if compatibility with the old field were not removed without such a migration, as that would impact clusters installed at 4.1 and updated all the way to the latest version. A proper migration would also prevent the warning spam.

Version-Release number of selected component (if applicable):

4.5

How reproducible:

Always, but only if the cluster was installed at 4.1 and updated all the way to 4.5

Steps to Reproduce:
1. Install a 4.1 cluster
2. Upgrade it all the way to 4.5: 4.1-->4.2-->4.3-->4.4-->4.5
3. Observe that the warning is emitted

Actual results:

The Infrastructure object still uses the deprecated field and the warning is raised.

Expected results:

The Infrastructure object has its fields migrated and no warning is emitted.
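
A minimal sketch of the expected migration, assuming the Infrastructure object is handled as a plain dict. The helper name migrate_platform_status is hypothetical, and this is not the cluster-config-operator implementation. It copies only the platform type; any platform-specific details under platformStatus are outside the scope of this sketch.

```python
import copy


def migrate_platform_status(infra: dict) -> dict:
    """Return a copy of `infra` with status.platformStatus.type populated.

    The deprecated status.platform value is copied over only when the
    new field is empty, so objects that are already migrated stay as-is.
    """
    migrated = copy.deepcopy(infra)
    status = migrated.get("status") or {}
    platform_status = status.get("platformStatus") or {}
    legacy = status.get("platform") or ""
    if legacy and not platform_status.get("type"):
        platform_status["type"] = legacy
        status["platformStatus"] = platform_status
        migrated["status"] = status
    return migrated
```

The deprecated field is left in place, so components that still read status.platform keep working after the migration.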

References:

[1] - https://github.com/openshift/installer/blob/release-4.1/pkg/asset/manifests/infrastructure.go#L74
[2] - https://github.com/openshift/installer/blob/release-4.5/pkg/asset/manifests/infrastructure.go#L65
[3] - https://github.com/openshift/cluster-config-operator/blob/release-4.5/pkg/operator/kube_cloud_config/controller.go#L88

Comment 3 Stefan Schimanski 2020-10-22 08:26:36 UTC
infrastructure.status.platformStatus is owned by installer:

$ git blame config/v1/types_infrastructure.go
dedfb47b1 (W. Trevor King             2019-04-26 11:49:00 -0700  59)    // platformStatus holds status information specific to the underlying
dedfb47b1 (W. Trevor King             2019-04-26 11:49:00 -0700  60)    // infrastructure provider.
dedfb47b1 (W. Trevor King             2019-04-26 11:49:00 -0700  61)    // +optional
dedfb47b1 (W. Trevor King             2019-04-26 11:49:00 -0700  62)    PlatformStatus *PlatformStatus `json:"platformStatus,omitempty"`
dedfb47b1 (W. Trevor King             2019-04-26 11:49:00 -0700  63)

Changing component.

Comment 4 Scott Dodson 2020-10-22 12:35:09 UTC
The installer is not a runtime component, it can't resolve items like this. This needs to be reconciled by the config-operator.

Moving back to config-operator and assigning to aos-install.

Comment 5 Pablo Alonso Rodriguez 2020-11-30 15:18:14 UTC
Increasing sev and prio of this bug.

It seems that machine-api does not properly fall back to the deprecated status.platform field the way the config operator does, so a cluster installed at 4.1 and updated all the way up to 4.6 can end up with machine-api degraded and misbehaving.

I am still working on opening a separate bug against the machine-api component, so that it properly falls back to the deprecated status.platform field.

However, there is a risk that other components make the same mistake, potentially causing equal or worse issues, and I understand that, at some point, status.platform will be completely removed. So we do need a proper and automatic migration to happen.

Comment 7 Scott Dodson 2020-12-09 16:15:22 UTC
I'm setting the blocker+ flag; we need to close this gap no later than 4.7.

Comment 17 Joel Speed 2021-01-04 10:48:04 UTC
*** Bug 1911467 has been marked as a duplicate of this bug. ***

Comment 23 Ke Wang 2021-01-28 10:38:06 UTC
One successful upgrade path is shown below:
$ oc get clusterversion -o json|jq ".items[0].status.history"
[
  {
    "completionTime": "2021-01-28T09:28:04Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:465d130601325059554b57dfc9553b826918f356b1362972ef21b5112a4e1e71",
    "startedTime": "2021-01-28T08:20:35Z",
    "state": "Completed",
    "verified": false,
    "version": "4.7.0-fc.4"
  },
  {
    "completionTime": "2021-01-28T07:44:35Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:5c3618ab914eb66267b7c552a9b51c3018c3a8f8acf08ce1ff7ae4bfdd3a82bd",
    "startedTime": "2021-01-28T06:42:17Z",
    "state": "Completed",
    "verified": false,
    "version": "4.6.12"
  },
  {
    "completionTime": "2021-01-28T05:54:17Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:412276155bfe186c35322a788321ebf110130a272e18f55a1a2510f15ee0bb04",
    "startedTime": "2021-01-28T04:56:06Z",
    "state": "Completed",
    "verified": true,
    "version": "4.5.27"
  },
  {
    "completionTime": "2021-01-28T04:46:02Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:fcffc8b6c05f9cadd1ab96b134fcc4de28bcd8e11dd8aadb3a040baf54a0a072",
    "startedTime": "2021-01-28T03:58:00Z",
    "state": "Completed",
    "verified": true,
    "version": "4.4.32"
  },
  {
    "completionTime": "2021-01-28T03:34:30Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:9ff90174a170379e90a9ead6e0d8cf6f439004191f80762764a5ca3dbaab01dc",
    "startedTime": "2021-01-28T02:51:45Z",
    "state": "Completed",
    "verified": true,
    "version": "4.3.40"
  },
  {
    "completionTime": "2021-01-28T02:36:45Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:f097ce3fb313ec1613a146b6f1dec64dbb1e85b1b1c8d01bd95ef29525a32b65",
    "startedTime": "2021-01-28T01:58:34Z",
    "state": "Completed",
    "verified": true,
    "version": "4.2.34"
  },
  {
    "completionTime": "2021-01-28T01:49:49Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:a8f706d139c8e77d884ccedbf67d69eefd67b66dcf69ee1032b507fe3acbf8c8",
    "startedTime": "2021-01-28T01:36:41Z",
    "state": "Completed",
    "verified": false,
    "version": "4.1.41"
  }
]

After upgrading from 4.4 to 4.5, a new platformStatus field was added:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.27    True        False         46m     Cluster version is 4.5.27

$ oc get infrastructures.config.openshift.io/cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
...
status:
...
  platform: AWS
  platformStatus:
    aws:
      region: us-east-2
    type: AWS

From the above results, the fix works as expected. Moving the bug to VERIFIED.
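
The manual check above could be scripted along these lines. This is a sketch only: check_migrated is a hypothetical helper, and it assumes the object is fetched as JSON (for example with -o json instead of -oyaml) and passed in as a string.

```python
import json


def check_migrated(infra_json: str) -> list:
    """Return a list of problems found in an Infrastructure JSON document.

    An empty list means status.platformStatus.type is populated and,
    if the deprecated status.platform is present, the two values agree.
    """
    infra = json.loads(infra_json)
    status = infra.get("status") or {}
    platform_status = status.get("platformStatus") or {}
    problems = []
    if not platform_status.get("type"):
        problems.append("status.platformStatus.type is empty")
    elif status.get("platform") and status["platform"] != platform_status["type"]:
        problems.append("status.platform and status.platformStatus.type differ")
    return problems
```

For the object shown above, with platform AWS and platformStatus.type AWS, the returned list is empty.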

Comment 26 errata-xmlrpc 2021-02-24 15:27:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633