Bug 1508445

Summary: [3.6] failed to start SDN plugin controller when Network CIDRS are invalid.
Product: OpenShift Container Platform
Reporter: Ben Bennett <bbennett>
Component: Networking
Assignee: Ben Bennett <bbennett>
Status: CLOSED ERRATA
QA Contact: Meng Bo <bmeng>
Severity: high
Docs Contact:
Priority: high
Version: 3.6.0
CC: aloughla, aos-bugs, bbennett, bmeng, danw, jtanenba, rhowe, yadu
Target Milestone: ---   
Target Release: 3.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: 3.6 rejected certain invalid master-config.yaml values that 3.5 silently accepted.
Consequence: When upgrading from 3.5 to 3.6, the master would fail to start if the clusterNetworkCIDR or serviceNetworkCIDR value in master-config.yaml was "invalid" (e.g., "172.30.1.1/16" instead of "172.30.0.0/16").
Fix: 3.6 now accepts the same invalid values that 3.5 accepted, but logs a warning about them.
Result: Upgrades now work, and the admin is notified about the incorrect config values.
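The accept-and-warn behavior described in the Doc Text can be sketched roughly as follows. This is an illustrative Python sketch, not the actual OpenShift (Go) code: the function name `normalize_cidr` and the logger name are hypothetical, and only the wording of the warning is taken from the log lines quoted in this report.

```python
import ipaddress
import logging

log = logging.getLogger("cidr-check")  # hypothetical logger name


def normalize_cidr(field, value):
    """Return the canonical network for value, warning if host bits were set."""
    # ip_interface keeps the address as written and derives the masked network.
    iface = ipaddress.ip_interface(value)
    network = iface.network
    if iface.ip != network.network_address:
        # Accept the value anyway, but tell the admin what it is treated as.
        log.warning('Configured %s value "%s" is invalid; treating it as "%s"',
                    field, value, network)
    return network


print(normalize_cidr("clusterNetworkCIDR", "10.1.0.0/13"))
print(normalize_cidr("serviceNetworkCIDR", "172.30.1.1/16"))
```

A value whose host bits are already zero, such as "172.30.0.0/16", passes through without a warning.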
Story Points: ---
Clone Of: 1506017
Environment:
Last Closed: 2017-12-14 21:02:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1506017
Bug Blocks:    

Description Ben Bennett 2017-11-01 12:42:15 UTC
+++ This bug was initially created as a clone of Bug #1506017 +++

Description of problem:

# In your master-config file:
clusterNetworkCIDR: 10.1.0.0/13
serviceNetworkCIDR: 172.30.0.0/13

# clusternetwork object is created with this:
network: 10.0.0.0/13
ServiceNetwork: 172.24.0.0/13
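
The mapping from the configured values to the stored ones is ordinary netmask arithmetic: a /13 prefix keeps the top 13 bits of the address and clears the rest. A minimal Python sketch of that arithmetic, using the /13 values from the reproduction steps (the helper name `mask_network` is hypothetical and is not part of OpenShift):

```python
import ipaddress


def mask_network(cidr):
    """Clear the host bits of an IPv4 CIDR string by hand."""
    address, prefix_len = cidr.split("/")
    prefix_len = int(prefix_len)
    addr_int = int(ipaddress.IPv4Address(address))
    # A /13 netmask has the top 13 of 32 bits set, i.e. 255.248.0.0.
    netmask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return f"{ipaddress.IPv4Address(addr_int & netmask)}/{prefix_len}"


print(mask_network("10.1.0.0/13"))    # 10.0.0.0/13
print(mask_network("172.30.0.0/13"))  # 172.24.0.0/13
```

In the second octet, 1 (00000001) and 30 (00011110) keep only their top five bits under a /13 mask, giving 0 and 24 respectively.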


Version-Release number of selected component (if applicable):
3.6 

How reproducible:
100% 

Steps to Reproduce:
1. Install 3.5 cluster with ansible host values of: 

osm_cluster_network_cidr=10.1.0.0/13
openshift_portal_net=172.30.0.0/13

2. After install network gets set to 

# clusternetwork object is created with this:
network: 10.0.0.0/13
ServiceNetwork: 172.24.0.0/13

3. Upgrade to 3.6 with same ansible host values. 

osm_cluster_network_cidr=10.1.0.0/13
openshift_portal_net=172.30.0.0/13


Actual results:

Controller fails to start due to values set in master-config.yaml 

atomic-openshift-master-controllers[111528]: E1019 12:17:26.599325  111528 common.go:46] Configured clusterNetworkCIDR value "10.1.0.0/13" is invalid; treating it as "10.0.0.0/13"

atomic-openshift-master-controllers[111528]: E1019 12:17:26.599336  111528 common.go:54] Configured serviceNetworkCIDR value "172.30.0.0/13" is invalid; treating it as "172.24.0.0/13"

atomic-openshift-master-controllers[111528]: F1019 12:17:26.612560  111528 start_master.go:776] Error starting "openshift.io/sdn" (failed to start SDN plugin controller: cannot change clusterNetworkCIDR to a value that does not include the existing network.)

Expected results:

The controller should start, as the values use the same netmask.
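
To illustrate the expectation: once the configured value is masked, it is the same network as the existing one, so a "does the new network include the existing network" check would be expected to pass. The Python sketch below shows only that comparison. It does not reproduce the actual check in the SDN controller, and the helper name `includes_existing` is hypothetical.

```python
import ipaddress


def includes_existing(configured, existing):
    """True if the (masked) configured CIDR contains the existing network."""
    # strict=False masks host bits, so "10.1.0.0/13" becomes 10.0.0.0/13.
    new_net = ipaddress.ip_network(configured, strict=False)
    old_net = ipaddress.ip_network(existing)
    # subnet_of requires Python 3.7 or later.
    return old_net.subnet_of(new_net)


print(includes_existing("10.1.0.0/13", "10.0.0.0/13"))
print(includes_existing("172.30.0.0/13", "172.24.0.0/13"))
```

A genuinely different range, for example "10.8.0.0/13" against an existing "10.0.0.0/13", would fail this check.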

--- Additional comment from Dan Winship on 2017-10-31 16:02:22 EDT ---

(Note: In 3.7 this is fixed by the combination of https://github.com/openshift/origin/pull/17076 and https://github.com/openshift/origin/pull/17117.)

--- Additional comment from Dan Winship on 2017-10-31 17:35:02 EDT ---

https://github.com/openshift/ose/pull/918

Comment 2 Yan Du 2017-12-05 07:47:30 UTC
After upgrading OCP v3.5 to v3.6.173.0.83 with the parameters:
osm_cluster_network_cidr=10.1.0.0/13
openshift_portal_net=172.30.0.0/13

Both atomic-openshift-master and node work well after the upgrade finishes, and we get the warning when an invalid network CIDR is used.
Dec 05 01:16:12 host-8-241-24.host.centralci.eng.rdu2.redhat.com atomic-openshift-master[26892]: I1205 01:16:12.213132   26892 subnets.go:97] Created HostSubnet host-8-241-24.host.centralci.eng.rdu2.redhat.com (host: "host-8-241-24.host.centralci.eng.rdu2.redhat.com", ip: "10.8.241.24", subnet: "10.1.0.0/23")
Dec 05 01:49:42 host-8-241-24.host.centralci.eng.rdu2.redhat.com atomic-openshift-master[1612]: E1205 01:49:42.561393    1612 common.go:46] Configured clusterNetworkCIDR value "10.1.0.0/13" is invalid; treating it as "10.0.0.0/13"

Comment 5 errata-xmlrpc 2017-12-14 21:02:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3438