Bug 1858342 - Installer: Incorrectly validates for a Docker Bridge network that doesn't exist
Summary: Installer: Incorrectly validates for a Docker Bridge network that doesn't exist
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: All
OS: All
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Abhinav Dahiya
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-17 15:43 UTC by Will Gordon
Modified: 2021-02-12 08:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:15:58 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 3980 0 None closed Bug 1858342: types: allow docker bridge network range except on libvirt 2021-02-18 03:37:12 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:16:18 UTC

Description Will Gordon 2020-07-17 15:43:39 UTC
Description of problem:

The installer is currently validating and warning for a CIDR overlap that doesn't exist: https://github.com/openshift/installer/blob/master/pkg/validate/validate.go#L138

I confirmed with SDN that this Docker Bridge subnet does not exist and is not used in OCP v4 on CRI-O.


How reproducible: always


Steps to Reproduce:

1. Provide a Machine CIDR of "172.17.0.0/16" in the installer
2. Try to install your cluster


Actual results:

"time="2020-04-21T10:45:21Z" level=debug msg="installer console log: level=fatal msg=\"failed to fetch Master Machines: failed to load asset \\\"Install Config\\\": invalid \\\"install-config.yaml\\\" file: networking.clusterNetwork[0].cidr: Invalid value: \\\"172.16.0.0/15\\\": overlaps with default Docker Bridge subnet (172.16.0.0/15)\"\n" installID=qn46zp8f"


Expected results:

This validation is invalid and should be removed


Additional info:
The validation appears to have been originally introduced in https://github.com/openshift/installer/pull/342/files#diff-474a68b8f7d6552a4f35f1d003d86e8bR463. This *was* a valid check in OCP v3, but it is no longer valid in OCP v4.

Comment 2 Dan Williams 2020-07-17 16:35:05 UTC
Docker will not be present on any OCP 4.x clusters, as docker is not installed by RHCOS and should not even be used on BYOH. As such https://github.com/openshift/installer/blob/master/pkg/validate/validate.go#L137 is wrong and should be removed.

The code in question validates whether the ClusterCIDR, ServiceCIDR, and MachineNetworks overlap with "special" subnets. Docker won't be one of those in 4.x.

The only thing I can think of to check is the podman/crio default bridge subnet, but normally nodes should not be running non-host-network containers in an OpenShift cluster outside of kubelet and the CNI plugin. That subnet is 10.88.0.0/16
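
For illustration, the kind of overlap check described above can be sketched with Python's standard `ipaddress` module. This is a minimal sketch, not the installer's Go code: `find_overlaps` is a hypothetical helper, and 172.17.0.0/16 is assumed here to be the stock Docker bridge range (it is the value used in the reproduction steps). The 10.88.0.0/16 subnet is the podman/CRI-O default mentioned in this comment.

```python
import ipaddress

# Assumption: Docker's stock docker0 bridge range; not read from the installer.
DOCKER_BRIDGE = ipaddress.ip_network("172.17.0.0/16")
# Default podman/CRI-O bridge subnet, as stated in comment 2.
CRIO_BRIDGE = ipaddress.ip_network("10.88.0.0/16")


def find_overlaps(networks, reserved):
    """Return (field, cidr) pairs whose CIDR overlaps the reserved subnet."""
    hits = []
    for field, cidr in networks.items():
        if ipaddress.ip_network(cidr).overlaps(reserved):
            hits.append((field, cidr))
    return hits


# Hypothetical install-config values, flattened to one CIDR per field.
networks = {
    "clusterNetwork": "10.128.0.0/14",
    "machineNetwork": "172.17.0.0/16",
    "serviceNetwork": "172.30.0.0/16",
}

print(find_overlaps(networks, DOCKER_BRIDGE))  # [('machineNetwork', '172.17.0.0/16')]
print(find_overlaps(networks, CRIO_BRIDGE))    # []
```

Only the machine network is flagged against the assumed Docker range, and nothing is flagged against the CRI-O range.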

Comment 5 Abhinav Dahiya 2020-07-17 18:52:01 UTC
> "time="2020-04-21T10:45:21Z" level=debug msg="installer console log: level=fatal msg=\"failed to fetch Master Machines: failed to load asset \\\"Install Config\\\": invalid \\\"install-config.yaml\\\" file: networking.clusterNetwork[0].cidr: Invalid value: \\\"172.16.0.0/15\\\": overlaps with default Docker Bridge subnet (172.16.0.0/15)\"\n" installID=qn46zp8f"

The validation message clearly states that the installer does not support networks that overlap with the Docker bridge subnet (172.16.0.0/15). This is not a bug. If the user requirement is to remove this restriction, please track the feature work in JIRA by creating a story in
- https://issues.redhat.com/projects/RFE or,
- https://issues.redhat.com/projects/CORS

Comment 7 Dan Williams 2020-07-18 01:23:54 UTC
@abhinav it is a bug, because docker has not been present on *any* OpenShift 4.x cluster ever. It is a bug that a user of OpenShift 4.x cannot assign a ClusterCIDR, ServiceCIDR, or MachineNetwork that overlaps with a subnet that is not, and will never be, used by a docker bridge in OpenShift or RHEL.

Comment 15 Yang Yang 2020-08-03 04:24:56 UTC
Reproduced with openshift-install 4.5.0-rc.5.
Installed a GCP cluster with the networking set as below:
networking:
  clusterNetwork:
  - cidr: 172.16.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.211.66.0/23 
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

# openshift-install create cluster --dir bz
FATAL failed to fetch Metadata: failed to load asset "Install Config": invalid "install-config.yaml" file: networking.clusterNetwork[0].cidr: Invalid value: "172.16.0.0/14": overlaps with default Docker Bridge subnet (172.16.0.0/14) 

Verifying it with openshift-install 4.6.0-0.nightly-2020-08-02-091622 on GCP
# openshift-install create cluster --dir bz
WARNING networking.clusterNetwork[0]: 172.16.0.0/14 overlaps with default Docker Bridge subnet 
INFO Credentials loaded from file "/root/.gcp/osServiceAccount.json" 
INFO Consuming Install Config from target directory 
INFO Creating infrastructure resources...      
...
INFO Waiting up to 10m0s for the openshift-console route to be created... 
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/build/bz/auth/kubeconfig' 
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.yybz.qe.gcp.devcluster.openshift.com 
INFO Login to the console with user: "kubeadmin", and password: "4SZZN-zSrQz-Fwza4-uoAiN" 
INFO Time elapsed: 35m6s         

A warning about the CIDR is shown, and the installation succeeds.

Comment 16 Yang Yang 2020-08-03 05:56:07 UTC
Verifying it on libvirt with the settings below:

networking:
  clusterNetwork:
  - cidr: 172.16.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.211.66.0/23 
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  libvirt:
    URI: qemu+tcp://192.168.122.1/system
    network:
      if: mybridge0


# openshift-install create cluster --dir bz1
FATAL failed to fetch Metadata: failed to load asset "Install Config": invalid "install-config.yaml" file: [networking.clusterNetwork[0]: Invalid value: "172.16.0.0/14": overlaps with default Docker Bridge subnet, platform: Invalid value: "libvirt": must specify one of the platforms (aws, azure, baremetal, gcp, none, openstack, ovirt, vsphere)] 

An error about the CIDR is reported.

Comment 17 Yang Yang 2020-08-04 06:11:54 UTC
Verified on AWS with the networking below:
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 172.17.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

+ ./openshift-install create manifests --dir '/home/jenkins/workspace/Launch Environment Flexy/workdir/install-dir'
level=warning msg="networking.machineNetwork[0]: 172.17.0.0/16 overlaps with default Docker Bridge subnet"

A warning about the CIDR is shown, and the installation succeeds.

Comment 18 Yang Yang 2020-08-04 06:24:34 UTC
Verified on vSphere with the networking below:
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.17.0.0/14

+ ./openshift-install create manifests --dir '/home/installer4/workspace/Launch Environment Flexy/workdir/install-dir'
level=warning msg="networking.serviceNetwork[0]: 172.17.0.0/14 overlaps with default Docker Bridge subnet"

A warning about the serviceNetwork is shown.

In short, on the libvirt platform an overlap with the Docker bridge is forbidden; on the other platforms an overlap is allowed and a warning is shown to the user. Moving it to the verified state.
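
The verified behavior summarized above can be sketched as follows. This is a hedged illustration only: `check_docker_overlap` is a hypothetical function, the 172.17.0.0/16 range is an assumed Docker default, and the message text merely mimics the log lines in comments 15 to 18. `strict=False` is used so that a value with host bits set, such as the 172.17.0.0/14 from the vSphere test, is accepted and normalized.

```python
import ipaddress

# Assumption: Docker's stock bridge range; the installer's constant is not shown in this bug.
DOCKER_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def check_docker_overlap(field, cidr, platform):
    """Return (level, message); level is None, "warning", or "error"."""
    network = ipaddress.ip_network(cidr, strict=False)
    if not network.overlaps(DOCKER_BRIDGE):
        return None, ""
    message = f"{field}: {cidr} overlaps with default Docker Bridge subnet"
    # Per the verification: a hard error on libvirt, only a warning elsewhere.
    level = "error" if platform == "libvirt" else "warning"
    return level, message


for platform in ("gcp", "libvirt"):
    level, message = check_docker_overlap(
        "networking.clusterNetwork[0]", "172.16.0.0/14", platform
    )
    print(platform, level, message)
```

With the same cluster network, the sketch reports a warning for `gcp` and an error for `libvirt`, matching the outcomes in comments 15 and 16.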

Comment 20 Choo Pui kun 2020-10-27 06:50:49 UTC
(In reply to Yang Yang from comment #18)
I'm testing with a vSphere install-config and this is what I get:

❯ openshift-install create manifests --dir=dev-ocp/ --log-level=debug
DEBUG OpenShift Installer 4.5.15
DEBUG Built from commit 9893a482f310ee72089872f1a4caea3dbec34f28
DEBUG Fetching Master Machines...
DEBUG Loading Master Machines...
DEBUG   Loading Cluster ID...
DEBUG     Loading Install Config...
DEBUG       Loading SSH Key...
DEBUG       Loading Base Domain...
DEBUG         Loading Platform...
DEBUG       Loading Cluster Name...
DEBUG         Loading Base Domain...
DEBUG         Loading Platform...
DEBUG       Loading Pull Secret...
DEBUG       Loading Platform...
FATAL failed to fetch Master Machines: failed to load asset "Install Config": invalid "install-config.yaml" file: networking.machineNetwork[0]: Invalid value: "172.16.0.0/12": overlaps with default Docker Bridge subnet (172.16.0.0/12)

The install-config.yml is as below:
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 172.16.0.0/12
  networkType: OpenShiftSDN
  serviceNetwork:
  - 10.132.0.0/16
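
As a worked check of why this machine network trips the validation, the range arithmetic can be reproduced with Python's `ipaddress` module. This sketch assumes 172.17.0.0/16 is the Docker bridge range being compared against; the variable names are illustrative.

```python
import ipaddress

machine = ipaddress.ip_network("172.16.0.0/12")
docker = ipaddress.ip_network("172.17.0.0/16")  # assumed Docker default bridge

# A /12 leaves 20 host bits, so it spans 2**20 addresses.
print(machine.num_addresses)      # 1048576
print(machine[0], machine[-1])    # 172.16.0.0 172.31.255.255

# The assumed Docker range sits entirely inside the machine network.
print(docker.subnet_of(machine))  # True
```

Because 172.16.0.0/12 spans 172.16.0.0 through 172.31.255.255, it contains the whole assumed Docker range, so any overlap-based check flags it.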

Comment 21 Choo Pui kun 2020-10-27 07:14:13 UTC
Adding to comment 20: I'm running openshift-install on a Mac.

Comment 22 errata-xmlrpc 2020-10-27 16:15:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 23 Yang Yang 2020-11-10 01:59:04 UTC
(In reply to Choo Pui kun from comment #20)

The issue is fixed in 4.6, so please try again with a 4.6 payload.

