Bug 1779299

Summary: CVO doesn't populate infrastructure/cluster field introduced with OCP 4.2.
Product: OpenShift Container Platform
Reporter: Chet Hosey <ChetRHosey>
Component: Cluster Version Operator
Assignee: Abhinav Dahiya <adahiya>
Status: CLOSED DUPLICATE
QA Contact: liujia <jiajliu>
Severity: unspecified
Docs Contact:
Priority: low
Version: 4.2.z
CC: aos-bugs, bzhang, dhansen, jokerman
Target Milestone: ---
Keywords: Reopened
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-13 18:29:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chet Hosey 2019-12-03 17:20:27 UTC
Description of problem:

CVO doesn't populate infrastructure/cluster's status.platformStatus field. This was introduced with OCP 4.2, and the original status.platform field is deprecated.

The missing field causes the network operator to crash when applying proxy settings on upgraded clusters (see https://bugzilla.redhat.com/show_bug.cgi?id=1773870). I'm guessing that Red Hat doesn't test configurations that omit status.platformStatus, or doesn't test them as extensively, so it seems safest to have the CVO populate the new field on upgrade. (Or, if there's an operator responsible for this specific CR, it should populate its status appropriately. I couldn't find evidence that the `status` of infrastructure/cluster is written at any time other than install.)

After a fresh OCP 4.2 install, "oc get infrastructure cluster -o yaml" includes content like the following:

    status:
      apiServerInternalURI: https://api-int.example.com:6443
      apiServerURL: https://api.jmalde.ocp.example.com:6443
      etcdDiscoveryDomain: jmalde.ocp.example.com
      infrastructureName: jmalde-qnkxr
      platform: VSphere
      platformStatus:
        type: VSphere

However, after an upgrade from 4.1 to 4.2, status.platformStatus isn't populated.

How reproducible:

Verified multiple times on vSphere UPI.

Steps to Reproduce:
1. Install OCP 4.1
2. Upgrade to 4.2
3. oc get infrastructure cluster -o yaml
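
Step 3 can also be checked mechanically. The following is a minimal Python sketch, assuming the resource has been exported as JSON (for example with `oc get infrastructure cluster -o json`). The function name `has_platform_status` and the trimmed sample documents are illustrative and not part of any OpenShift tooling.

```python
import json


def has_platform_status(infrastructure: dict) -> bool:
    """Return True if status.platformStatus.type is set to a non-empty value."""
    status = infrastructure.get("status") or {}
    platform_status = status.get("platformStatus") or {}
    return bool(platform_status.get("type"))


# Trimmed sample documents shaped like the status stanzas in this report (assumed).
upgraded = json.loads('{"status": {"platform": "VSphere"}}')
fresh = json.loads(
    '{"status": {"platform": "VSphere", "platformStatus": {"type": "VSphere"}}}'
)

print(has_platform_status(upgraded))  # False
print(has_platform_status(fresh))     # True
```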

Actual results:

`oc get infrastructure cluster -o yaml` produces the following status from an upgraded cluster:

    status:
      apiServerInternalURI: https://api-int.example.com:6443
      apiServerURL: https://api.jmalde.ocp.example.com:6443
      etcdDiscoveryDomain: jmalde.ocp.example.com
      infrastructureName: jmalde-qnkxr
      platform: VSphere

Expected results:

    status:
      apiServerInternalURI: https://api-int.example.com:6443
      apiServerURL: https://api.jmalde.ocp.example.com:6443
      etcdDiscoveryDomain: jmalde.ocp.example.com
      infrastructureName: jmalde-qnkxr
      platform: VSphere
      platformStatus:
        type: VSphere
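
The gap between the actual and expected status is a single derived field. As a rough sketch of the backfill being requested (not how the CVO, the installer, or any operator actually does it), the following Python function copies the deprecated `status.platform` value into `status.platformStatus.type` when the latter is absent. The function name `backfill_platform_status` is hypothetical, and the input is assumed to be the `status` mapping as plain dictionaries.

```python
import copy


def backfill_platform_status(status: dict) -> dict:
    """Return a copy of status with platformStatus.type derived from platform if unset."""
    result = copy.deepcopy(status)  # never mutate the caller's object
    platform = result.get("platform")
    platform_status = result.get("platformStatus") or {}
    if platform and not platform_status.get("type"):
        platform_status["type"] = platform
        result["platformStatus"] = platform_status
    return result


# Status as seen on an upgraded cluster (trimmed, assumed shape).
upgraded = {
    "infrastructureName": "jmalde-qnkxr",
    "platform": "VSphere",
}
print(backfill_platform_status(upgraded)["platformStatus"])  # {'type': 'VSphere'}
```

The function leaves an already populated `platformStatus.type` untouched, so running it against a freshly installed cluster's status would be a no-op.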

Additional info:

The network operator is being updated to handle status.platformStatus being an optional field. However, this may not be a configuration Red Hat is testing broadly. Unless there's a reason not to, it would be ideal to have upgrades populate the new field to better match freshly-installed clusters.
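
Treating status.platformStatus as optional means that a consumer falls back to the deprecated field when the new one is missing. A minimal Python sketch of that lookup follows. It illustrates the idea only and is not the network operator's code; the name `resolve_platform_type` is hypothetical, and the input is assumed to be the `status` mapping as a plain dictionary.

```python
from typing import Optional


def resolve_platform_type(status: dict) -> Optional[str]:
    """Prefer status.platformStatus.type; fall back to the deprecated status.platform."""
    platform_status = status.get("platformStatus") or {}
    return platform_status.get("type") or status.get("platform") or None


print(resolve_platform_type({"platform": "VSphere"}))  # VSphere
print(resolve_platform_type(
    {"platform": "VSphere", "platformStatus": {"type": "VSphere"}}
))  # VSphere
print(resolve_platform_type({}))  # None
```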

Comment 1 Abhinav Dahiya 2019-12-03 17:55:20 UTC
> https://github.com/openshift/api/blob/d4a64ec2cbd86f11ea74dfdcf6520d5833d0c6cd/config/v1/types_infrastructure.go#L41-L49

The status.platformStatus field is optional, so there is no expectation that the new field needs to be populated until we have plans to deprecate the previous field.

Comment 2 Chet Hosey 2019-12-03 17:58:38 UTC
From a testing perspective, the expectation seems to be there. Otherwise, the fact that a cluster upgraded from 4.1 to 4.2 can't take advantage of new features might have been caught.

Comment 3 Chet Hosey 2019-12-03 20:45:00 UTC
What do you mean about plans to deprecate the previous field? The source code says of the previous field: "Deprecated: Use platformStatus.type instead."

There is no reason to keep the variation between freshly installed clusters and upgraded ones.

Comment 4 Abhinav Dahiya 2019-12-03 20:54:18 UTC
The APIs need to be backwards compatible, and therefore we will fix it if OpenShift operators can't handle this safely. We introduced this in 4.2, and for one release cycle we haven't seen the need; in 4.4 there are no operators requesting this migration.

So if this is not fixed in 4.4 and is moved to 4.5, the team should look at closing it as WONTFIX.

Comment 5 Abhinav Dahiya 2019-12-03 20:55:17 UTC
Also, this is not a CVO bug, as infrastructure is global configuration set up at install time by openshift-install.

Comment 6 Chet Hosey 2019-12-03 20:57:13 UTC
Is openshift-install used during upgrades? The value is correct for new clusters. It's just missing on upgrade (4.1 -> 4.2).

Comment 7 Daneyon Hansen 2020-01-13 18:29:07 UTC
@Clayton Coleman created a bug for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1787765

*** This bug has been marked as a duplicate of bug 1787765 ***