Bug 1877984 - using OpenshiftSDN in install-config causes install failure post bootstrap
Summary: using OpenshiftSDN in install-config causes install failure post bootstrap
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Dan Winship
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1877481
Depends On:
Blocks:
Reported: 2020-09-11 00:56 UTC by Greg Sheremeta
Modified: 2020-10-27 16:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:40:07 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4207 | 0 | None | closed | Bug 1877984: Fix "OpenShiftSDN" to proper case when generating network config | 2021-02-01 08:12:34 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:40:23 UTC

Description Greg Sheremeta 2020-09-11 00:56:42 UTC
Description of problem:

Follow-up to BZ 1877481.

In 4.5, `networkType: OpenshiftSDN` worked.
In 4.6, it results in an installation failure with hard-to-understand messages.
Using `networkType: OpenShiftSDN` works in 4.6. (The only difference is the capital "S" in "Shift" versus a lowercase "s".)

The install log usually looks like this:
time="2020-09-10T20:55:25Z" level=info msg="Cluster operator machine-config Progressing is True with : Working towards 4.6.0-0.nightly-2020-09-10-145837"
time="2020-09-10T20:55:25Z" level=error msg="Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Unable to apply 4.6.0-0.nightly-2020-09-10-145837: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with \"3 nodes are reporting degraded status on sync\": \"Node ip-10-0-161-165.ec2.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-64cb83bf095afac90544003fc5b9f2b6\\\\\\\" not found\\\", Node ip-10-0-244-171.ec2.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-64cb83bf095afac90544003fc5b9f2b6\\\\\\\" not found\\\", Node ip-10-0-230-197.ec2.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-64cb83bf095afac90544003fc5b9f2b6\\\\\\\" not found\\\"\", retrying"
time="2020-09-10T20:55:25Z" level=info msg="Cluster operator machine-config Available is False with : Cluster not available for 4.6.0-0.nightly-2020-09-10-145837"
time="2020-09-10T20:55:25Z" level=fatal msg="failed to initialize the cluster: Cluster operator machine-config is still updating"
time="2020-09-10T20:55:26Z" level=error msg="error after waiting for command completion" error="exit status 1" installID=hgbr6ffn

There should instead be some upfront validation. Or perhaps the casing shouldn't break the install, just as it didn't in 4.5.
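
A minimal sketch of what case-insensitive canonicalization could look like, for illustration only. The helper name `canonical_network_type` and the `KNOWN_NETWORK_TYPES` list are hypothetical and are not taken from the installer or any operator. Unrecognized values are returned unchanged rather than rejected, on the assumption that other plugin names may be legitimate.

```python
# Hypothetical helper, not the installer's or any operator's actual code.
# The list of known types is illustrative and may be incomplete.
KNOWN_NETWORK_TYPES = ("OpenShiftSDN", "OVNKubernetes")


def canonical_network_type(value: str) -> str:
    """Return the canonical spelling of a known network type, ignoring case.

    Values that do not match a known type are returned unchanged.
    """
    by_folded = {name.casefold(): name for name in KNOWN_NETWORK_TYPES}
    return by_folded.get(value.strip().casefold(), value)


# The 4.5-style spelling maps to the canonical one.
fixed = canonical_network_type("OpenshiftSDN")
# An unknown plugin name is passed through as-is.
other = canonical_network_type("SomeOtherPlugin")
```

With this approach, `networkType: OpenshiftSDN` and `networkType: OpenShiftSDN` would end up as the same value before anything downstream reads them.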

Version-Release number of selected component (if applicable):
4.6 nightly

How reproducible:
always


Steps to Reproduce:
1. Set `networkType: OpenshiftSDN` in the install-config YAML.
2. Run the install.

Actual results:
failure

Expected results:
successful install

Additional info:
follow up to BZ 1877481

Comment 1 Eric Paris 2020-09-17 16:56:55 UTC
This is a user-facing API that changed, regressed, and broke real customers. This is a 4.6 blocker.

Comment 2 Scott Dodson 2020-09-17 17:05:28 UTC
*** Bug 1877481 has been marked as a duplicate of this bug. ***

Comment 3 Dan Winship 2020-09-17 17:16:10 UTC
CNO fixes it to be the canonical form in the network config Status, but MCO now has code that looks at the network config Spec (createDiscoveredControllerConfigSpec() in machine-config-operator/pkg/operator/render.go) because it wants to set up system OVS correctly from the get-go. So I guess we need to be case-insensitive there too.
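
To illustrate the mismatch described above, here is a rough sketch with made-up data; it is not the MCO code. The dictionary layout is simplified, and `same_network_type` is a hypothetical helper. It shows why an exact string comparison against the Spec value fails for the user-typed spelling while a case-insensitive comparison succeeds.

```python
# Illustrative data only; the real network config object has more fields.
network_config = {
    "spec": {"networkType": "OpenshiftSDN"},    # spelling as typed by the user
    "status": {"networkType": "OpenShiftSDN"},  # canonical form, as described for Status
}


def same_network_type(a: str, b: str) -> bool:
    """Compare two network type names without regard to case (hypothetical helper)."""
    return a.casefold() == b.casefold()


spec_type = network_config["spec"]["networkType"]

# An exact comparison against the canonical name misses the user's spelling.
exact_match = spec_type == "OpenShiftSDN"
# A case-insensitive comparison treats both spellings as the same type.
folded_match = same_network_type(spec_type, "OpenShiftSDN")
```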

Comment 4 Ricardo Carrillo Cruz 2020-09-21 15:11:26 UTC
We agreed on a draft PR to fix this at the installer layer, so I am changing the component and resetting the status from POST to NEW:

https://github.com/openshift/machine-config-operator/pull/2101

Comment 5 Scott Dodson 2020-09-22 15:50:26 UTC
The install-config API for this field is an opaque string. Therefore the canonicalization should not happen in the installer; moving this back to the Networking component.

$ openshift-install explain installconfig.networking.networkType
KIND:     InstallConfig
VERSION:  v1

RESOURCE: <string>
  NetworkType is the type of network to install. The default is OpenShiftSDN

Comment 7 zhaozhanqi 2020-09-25 05:12:24 UTC
Verified this bug on 4.6.0-0.nightly-2020-09-24-095222;
installing with 'OpenshiftSDN' also works.

Comment 10 errata-xmlrpc 2020-10-27 16:40:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

