Bug 1871048

Summary: Cluster with small block of IP addresses - range of machineCIDR
Product: OpenShift Container Platform Reporter: David Hernández Fernández <dahernan>
Component: Installer Assignee: Martin André <m.andre>
Installer sub component: OpenShift on OpenStack QA Contact: David Sanz <dsanzmor>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: elavarde, m.andre, racedoro, ssadhale
Version: 4.5   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: A value of 16000 nodes was hardcoded when calculating the end of the DHCP allocation pool for the nodes subnet. Consequence: Deploying OpenShift on OpenStack with a machine CIDR smaller than a /18 (for example, a /24) resulted in an error. Fix: The installer no longer hardcodes the number of nodes and instead dynamically calculates the end of the DHCP allocation pool. Result: It is now possible to deploy OpenShift on OpenStack with a machine CIDR of any length, provided it is large enough for all needed nodes.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:30:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1872629    

Description David Hernández Fernández 2020-08-21 08:38:42 UTC
Description of problem: The installer assumes that the machine network is at least a /18, but we need to use a /24 network. The intended use case is to be able to use smaller host networks, i.e. a smaller subnet.

The issue appeared during a deployment on OpenStack:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DEBUG module.bootstrap.data.openstack_images_image_v2.bootstrap_image: Refreshing state... 
ERROR                                              
ERROR Error: Error in function call                
ERROR                                              
ERROR   on ../../tmp/openshift-install-273350470/topology/private-network.tf line 31, in resource "openstack_networking_subnet_v2" "nodes": 
ERROR   31:     end   = cidrhost(local.nodes_cidr_block, 16000) 
ERROR     |----------------                        
ERROR     | local.nodes_cidr_block is "10.0.0.0/24" 
ERROR                                              
ERROR Call to function "cidrhost" failed: prefix of 24 does not accommodate a host 
ERROR numbered 16000.                              
ERROR                                              
ERROR Failed to read tfstate: open /tmp/openshift-install-273350470/terraform.tfstate: no such file or directory 
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The error comes from the Terraform definitions for OpenStack, which contain:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cidr_block = var.machine_cidr
The customer had var.machine_cidr set to 10.0.0.0/24.
The file named in the error, ./topology/private-network.tf, then contains:
nodes_cidr_block = var.cidr_block
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Later, the allocation_pool ends at host number 16000, which needs at least 14 host bits per network (16382 hosts). That is why the CIDR prefix has to be /18 or shorter, i.e. the network must be at least as large as a /18.
A comment in the code snippet also mentions the /18 assumption:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # We reserve some space at the beginning of the CIDR to use for the VIPs
  # It would be good to make this more dynamic by calculating the number of
  # addresses in the provided CIDR. This currently assumes at least a /18.
  # FIXME(mandre) if we let the ports pick up VIPs automatically, we don't have
  # to do any of this.
  allocation_pool {
    start = cidrhost(local.nodes_cidr_block, 10)
    end   = cidrhost(local.nodes_cidr_block, 16000)
  }
https://github.com/openshift/installer/blob/release-4.5/data/data/openstack/topology/private-network.tf#L25-L34
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
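The /18 assumption can be checked with a little arithmetic. The following is a minimal sketch, not installer code: the helper names maxHostNum and longestPrefixFor are hypothetical, and it assumes IPv4 and the same "host number must not exceed all host bits set" rule that the error message implies.

```go
package main

import "fmt"

// maxHostNum returns the largest host number that fits in the host part of an
// IPv4 network with the given prefix length (all host bits set).
// Hypothetical helper for illustration only.
func maxHostNum(prefixLen int) uint32 {
	hostBits := 32 - prefixLen
	if hostBits <= 0 {
		return 0
	}
	if hostBits >= 32 {
		return ^uint32(0)
	}
	return uint32(1)<<uint(hostBits) - 1
}

// longestPrefixFor returns the longest IPv4 prefix length whose host part can
// still hold hostNum.
func longestPrefixFor(hostNum uint32) int {
	for p := 32; p >= 0; p-- {
		if maxHostNum(p) >= hostNum {
			return p
		}
	}
	return 0
}

func main() {
	const hardcodedEnd = 16000 // the value used in private-network.tf
	for _, p := range []int{18, 19, 24} {
		fmt.Printf("/%d: max host number %d, fits %d: %t\n",
			p, maxHostNum(p), hardcodedEnd, maxHostNum(p) >= hardcodedEnd)
	}
	fmt.Printf("longest prefix that fits: /%d\n", longestPrefixFor(hardcodedEnd))
}
```

With a /24 the largest host number is 255, so host 16000 cannot fit. A /18 is the longest prefix for which it does.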
This is confirmed by the error message itself:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ERROR Call to function "cidrhost" failed: prefix of 24 does not accommodate a host numbered 16000.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This check is invoked by the OCP code: it simply takes the 16000th IP address and fails if that address does not fit in the given CIDR.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Host takes a parent CIDR range and turns it into a host IP address with
// the given host number.
//
// For example, 10.3.0.0/16 with a host number of 2 gives 10.3.0.2.
func Host(base *net.IPNet, num int) (net.IP, error) {
...
  if numUint64.Cmp(maxHostNum) == 1 {
    return nil, fmt.Errorf("prefix of %d does not accommodate a host numbered %d", parentLen, num)
  }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Steps to Reproduce:
1. Use the following networking configuration:
~~~~~~~~~~~~~~~~~~
  clusterNetwork:
  - cidr: 192.168.0.0/17
    hostPrefix: 24
  machineCIDR: 10.0.0.0/24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 192.168.128.0/17
~~~~~~~~~~~~~~~~~~~~~~
2. Install OpenShift.

Actual results: Unable to proceed with the installation.

Expected results: To be able to use a smaller subnet.
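One way to meet this expectation is to derive the end of the allocation pool from the size of the CIDR instead of a fixed host number. The sketch below illustrates that idea only and is not necessarily how the installer implements it. The function allocationPool and its reserved parameter are hypothetical, it assumes IPv4, and it assumes the pool should end at the last address before the broadcast address.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// toIP converts a 32-bit number into a 4-byte net.IP.
func toIP(n uint32) net.IP {
	ip := make(net.IP, net.IPv4len)
	binary.BigEndian.PutUint32(ip, n)
	return ip
}

// allocationPool returns the first and last address of a DHCP allocation pool
// for an IPv4 CIDR. It skips `reserved` addresses at the start of the range
// (the template in this report uses 10) and ends at the last address before
// the broadcast address. Hypothetical helper for illustration only.
func allocationPool(cidr string, reserved uint32) (net.IP, net.IP, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, nil, err
	}
	base := network.IP.To4()
	ones, bits := network.Mask.Size()
	if base == nil || bits != 32 {
		return nil, nil, fmt.Errorf("%s is not an IPv4 CIDR", cidr)
	}
	hostBits := uint(bits - ones)
	if hostBits < 2 {
		return nil, nil, fmt.Errorf("%s has no usable host range", cidr)
	}
	lastUsable := uint64(1)<<hostBits - 2 // host number just below broadcast
	if uint64(reserved) > lastUsable {
		return nil, nil, fmt.Errorf("%s is too small to reserve %d addresses", cidr, reserved)
	}
	baseNum := binary.BigEndian.Uint32(base)
	return toIP(baseNum + reserved), toIP(baseNum + uint32(lastUsable)), nil
}

func main() {
	for _, cidr := range []string{"10.0.0.0/18", "10.0.0.0/24"} {
		start, end, err := allocationPool(cidr, 10)
		if err != nil {
			fmt.Printf("%s: %v\n", cidr, err)
			continue
		}
		fmt.Printf("%s: pool %s - %s\n", cidr, start, end)
	}
}
```

With this approach a /24 machine CIDR yields a valid pool, and a CIDR that is too small for the reserved range still fails with an explicit error.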

Additional info:
This was initially treated as RFE532 but is now reconsidered as a bug. Please consider it a bug.

Comment 3 David Sanz 2020-08-27 14:36:49 UTC
Verified on 4.6.0-0.nightly-2020-08-27-005538

Comment 5 errata-xmlrpc 2020-10-27 16:30:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196