Bug 1871048 - Cluster with small block of IP addresses - range of machineCIDR
Summary: Cluster with small block of IP addresses - range of machineCIDR
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.5
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Martin André
QA Contact: David Sanz
URL:
Whiteboard:
Depends On:
Blocks: 1872629
Reported: 2020-08-21 08:38 UTC by David Hernández Fernández
Modified: 2023-12-15 18:57 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A value of 16000 nodes was hardcoded when calculating the end of the DHCP allocation pool for the nodes subnet. Consequence: Deploying OpenShift on OpenStack with a machine CIDR smaller than /18 resulted in an error. Fix: The installer no longer hardcodes the number of nodes and instead dynamically calculates the end of the DHCP allocation pool. Result: It is now possible to deploy OpenShift on OpenStack with a machine CIDR of any length, provided it is large enough for all needed nodes.
Clone Of:
Environment:
Last Closed: 2020-10-27 16:30:14 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 4077 0 None closed Bug 1871048: OpenStack: dynamically set end of DHCP allocation pool 2021-01-01 15:46:07 UTC
Red Hat Knowledge Base (Solution) 4767541 0 None None None 2020-10-21 05:43:50 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:30:28 UTC

Description David Hernández Fernández 2020-08-21 08:38:42 UTC
Description of problem: The installer assumes that the machine network is at least a /18, but a /24 network needs to be used. The intended use case is to allow smaller host networks, i.e. a smaller subnet.

The issue appeared during a deployment on OpenStack:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DEBUG module.bootstrap.data.openstack_images_image_v2.bootstrap_image: Refreshing state... 
ERROR                                              
ERROR Error: Error in function call                
ERROR                                              
ERROR   on ../../tmp/openshift-install-273350470/topology/private-network.tf line 31, in resource "openstack_networking_subnet_v2" "nodes": 
ERROR   31:     end   = cidrhost(local.nodes_cidr_block, 16000) 
ERROR     |----------------                        
ERROR     | local.nodes_cidr_block is "10.0.0.0/24" 
ERROR                                              
ERROR Call to function "cidrhost" failed: prefix of 24 does not accommodate a host 
ERROR numbered 16000.                              
ERROR                                              
ERROR Failed to read tfstate: open /tmp/openshift-install-273350470/terraform.tfstate: no such file or directory 
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The error comes from the Terraform definitions for OpenStack, which contain:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cidr_block = var.machine_cidr
The customer set var.machine_cidr to 10.0.0.0/24.
The file named in the error, ./topology/private-network.tf, then contains:
nodes_cidr_block = var.cidr_block
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Later, the allocation_pool ends at host number 16000, which needs at least 14 host bits per network (16382 usable hosts). That is why the CIDR prefix has to be /18 or shorter, i.e. a /18 or larger network.
A comment in the code snippet also mentions the /18 assumption:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # We reserve some space at the beginning of the CIDR to use for the VIPs
  # It would be good to make this more dynamic by calculating the number of
  # addresses in the provided CIDR. This currently assumes at least a /18.
  # FIXME(mandre) if we let the ports pick up VIPs automatically, we don't have
  # to do any of this.
  allocation_pool {
    start = cidrhost(local.nodes_cidr_block, 10)
    end   = cidrhost(local.nodes_cidr_block, 16000)
  }
https://github.com/openshift/installer/blob/release-4.5/data/data/openstack/topology/private-network.tf#L25-L34
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
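To make the /18 requirement concrete, here is a minimal sketch of the arithmetic using Python's standard `ipaddress` module. It is only an illustration, not installer code; the helper names `longest_prefix_for_host` and `fits` are hypothetical, and 16000 is the hardcoded host number from the snippet above.

```python
import ipaddress

# Hardcoded end of the allocation pool in release-4.5 private-network.tf.
POOL_END_HOST = 16000


def longest_prefix_for_host(host_number: int, address_bits: int = 32) -> int:
    """Return the longest prefix length whose network still contains host_number."""
    # Number of host bits needed to represent the offset (16000 needs 14 bits).
    host_bits = host_number.bit_length()
    return address_bits - host_bits


def fits(cidr: str, host_number: int) -> bool:
    """Return True if host_number is a valid offset inside the given CIDR."""
    network = ipaddress.ip_network(cidr)
    return host_number < network.num_addresses


if __name__ == "__main__":
    print(longest_prefix_for_host(POOL_END_HOST))
    for cidr in ("10.0.0.0/18", "10.0.0.0/19", "10.0.0.0/24"):
        print(cidr, fits(cidr, POOL_END_HOST))
```

With the reporter's machineCIDR of 10.0.0.0/24 there are only 256 addresses, so offset 16000 cannot exist, which matches the Terraform error.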
For confirmation, we can look at the error message once more:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"ERROR Call to function "cidrhost" failed: prefix of 24 does not accommodate a host numbered 16000. "
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is triggered by the OCP installer code: it simply asks for the 16000th IP address, and the call fails if that address does not exist in the given CIDR.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Host takes a parent CIDR range and turns it into a host IP address with
// the given host number.
//
// For example, 10.3.0.0/16 with a host number of 2 gives 10.3.0.2.
func Host(base *net.IPNet, num int) (net.IP, error) {
...
	if numUint64.Cmp(maxHostNum) == 1 {
		return nil, fmt.Errorf("prefix of %d does not accommodate a host numbered %d", parentLen, num)
	}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
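The bounds check quoted above, and one possible way to derive the pool end from the CIDR instead of hardcoding it, can be sketched in Python as follows. This is an assumption-laden illustration: `cidrhost` here only mimics the positive-offset check of the Go function, `allocation_pool` is a hypothetical helper, and choosing "broadcast address minus one" as the pool end is an assumption, not necessarily the expression used in the actual fix.

```python
import ipaddress
from typing import Tuple


def cidrhost(cidr: str, host_number: int) -> str:
    """Mimic the quoted check: fail if the offset does not fit in the network."""
    network = ipaddress.ip_network(cidr)
    max_host_number = network.num_addresses - 1
    if host_number > max_host_number:
        raise ValueError(
            f"prefix of {network.prefixlen} does not accommodate "
            f"a host numbered {host_number}"
        )
    return str(network.network_address + host_number)


def allocation_pool(cidr: str, reserved_for_vips: int = 10) -> Tuple[str, str]:
    """Compute a DHCP pool whose end is derived from the CIDR size."""
    network = ipaddress.ip_network(cidr)
    # Assumption: end the pool at the last usable address (broadcast - 1).
    end_host = network.num_addresses - 2
    return cidrhost(cidr, reserved_for_vips), cidrhost(cidr, end_host)


if __name__ == "__main__":
    print(allocation_pool("10.0.0.0/24"))
    try:
        cidrhost("10.0.0.0/24", 16000)
    except ValueError as err:
        print(err)
```

Deriving the end from the network size keeps the reserved space for VIPs at the start of the range while letting a /24 (or any other prefix that is large enough for the nodes) pass the check.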

Steps to Reproduce:
1. Config part:
~~~~~~~~~~~~~~~~~~
  clusterNetwork:
  - cidr: 192.168.0.0/17
    hostPrefix: 24
  machineCIDR: 10.0.0.0/24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 192.168.128.0/17
~~~~~~~~~~~~~~~~~~~~~~
2. Install OpenShift.

Actual results: Unable to proceed with the installation.

Expected results: To be able to use a smaller subnet.

Additional info:
Initially treated as RFE532, but now reconsidered as a bug. Please treat it as a bug.

Comment 3 David Sanz 2020-08-27 14:36:49 UTC
Verified on 4.6.0-0.nightly-2020-08-27-005538

Comment 5 errata-xmlrpc 2020-10-27 16:30:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

