Bug 1813422 - vague error message when machineCIDR and provisioningNetworkCIDR values are the same in install-config.yaml
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Stephen Benjamin
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-13 18:46 UTC by Alexander Chuzhoy
Modified: 2020-07-13 17:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: No explicit check for an overlap between machineCIDR and provisioningNetworkCIDR.
Consequence: Unclear error message when the networks overlap.
Fix: Introduce an explicit check for an overlap.
Result: Users now get a clear error message about what went wrong.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:20:05 UTC
Target Upstream Version:
Embargoed:


Attachments
Overlapping message between provisioningNetworkCIDR and machineCIDR (36.31 KB, image/png)
2020-04-22 10:37 UTC, Shelly Miron


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 3358 | 0 | None | closed | Bug 1813422: baremetal: validate no overlap between provisioning and machine nets | 2020-11-24 22:30:48 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:20:31 UTC

Description Alexander Chuzhoy 2020-03-13 18:46:26 UTC
Version: 4.4.0-0.nightly-2020-03-13-073111

Attempted to deploy with the same machineCIDR and provisioningNetworkCIDR values in install-config.yaml (below):
apiVersion: v1
baseDomain: 2qe.lab.redhat.com
networking:
  networkType: OpenShiftSDN
  machineCIDR: 192.168.124.0/24
metadata:
  name: ocp-edge-cluster
compute:
- name: worker
  replicas: 2
controlPlane:
  name: master
  replicas: 3
  platform:
    baremetal: {}
platform:
  baremetal:
    libvirtURI: qemu+ssh://kni.qe.lab.redhat.com/system
    provisioningNetworkInterface: enp4s0
    provisioningNetworkCIDR: 192.168.124.0/24



The deployment failed without a clear indication of the cause:



time="2020-03-13T18:09:56Z" level=info msg="all files found, ready to proceed" installID=x48l7fts
time="2020-03-13T18:09:56Z" level=debug msg="OpenShift Installer v4.4.0"
time="2020-03-13T18:09:56Z" level=debug msg="Built from commit e1b323fd7bbb57cabcbded74dad08483390f9a6c"
time="2020-03-13T18:09:56Z" level=debug msg="Fetching Master Machines..."
time="2020-03-13T18:09:56Z" level=debug msg="Loading Master Machines..."
time="2020-03-13T18:09:56Z" level=debug msg="  Loading Cluster ID..."
time="2020-03-13T18:09:56Z" level=debug msg="    Loading Install Config..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading SSH Key..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading Base Domain..."
time="2020-03-13T18:09:56Z" level=debug msg="        Loading Platform..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading Cluster Name..."
time="2020-03-13T18:09:56Z" level=debug msg="        Loading Base Domain..."
time="2020-03-13T18:09:56Z" level=debug msg="        Loading Platform..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading Pull Secret..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading Platform..."
time="2020-03-13T18:09:57Z" level=fatal msg="failed to fetch Master Machines: failed to load asset \"Install Config\": invalid \"install-config.yaml\" file: [platform.baremetal.provisioningHostIP: Invalid value: \"192.168.124.3\": the IP must not be in one of the machine networks, platform.baremetal.bootstrapHostIP: Invalid value: \"192.168.124.2\": the IP must not be in one of the machine networks]"
time="2020-03-13T18:09:58Z" level=error msg="error after waiting for command completion" error="exit status 1" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=error msg="error generating installer assets" error="exit status 1" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=info msg="reading installer log" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=info msg="saving installer output" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=debug msg="installer console log: level=fatal msg=\"failed to fetch Master Machines: failed to load asset \\\"Install Config\\\": invalid \\\"install-config.yaml\\\" file: [platform.baremetal.provisioningHostIP: Invalid value: \\\"192.168.124.3\\\": the IP must not be in one of the machine networks, platform.baremetal.bootstrapHostIP: Invalid value: \\\"192.168.124.2\\\": the IP must not be in one of the machine networks]\"\n" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=info msg="updating clusterprovision" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=fatal msg="runtime error" error="exit status 1"
[kni@provisionhost-0 ~]$


We need to better handle this kind of error.

Comment 1 Stephen Benjamin 2020-03-23 17:26:17 UTC
Indeed, we should error on the provisioningNetworkCIDR field itself. Thanks for the report.

The two networks indeed can't be the same or overlap at all.
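
For illustration, the rule above can be sketched with Python's standard ipaddress module. This is a minimal sketch under stated assumptions, not the installer's actual validation code (the installer is written in Go); the helper name networks_overlap and the second example range are hypothetical.

```python
import ipaddress


def networks_overlap(machine_cidr: str, provisioning_cidr: str) -> bool:
    """Return True if the two CIDR ranges share any address.

    Hypothetical helper for illustration only; not the installer's code.
    """
    machine_net = ipaddress.ip_network(machine_cidr)
    provisioning_net = ipaddress.ip_network(provisioning_cidr)
    # overlaps() is True for identical networks and when one contains the other.
    return machine_net.overlaps(provisioning_net)


# Values from the report: both fields set to the same /24.
print(networks_overlap("192.168.124.0/24", "192.168.124.0/24"))
# A distinct provisioning range (example value, not taken from the report).
print(networks_overlap("192.168.124.0/24", "172.22.0.0/24"))
```

The first call reports an overlap and the second does not, which is the distinction the validation needs to surface on the provisioningNetworkCIDR field.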

Comment 3 Scott Dodson 2020-04-07 00:45:59 UTC
PR is against master, setting TR 4.5.0 and medium/medium.

Comment 7 Shelly Miron 2020-04-22 10:33:55 UTC
Verified.
The bug was tested on OCP version 4.5 with IPv4 using 2 workers.

Steps:

    1. edit the install-config.yaml file and set:
             
               machineCIDR: 192.168.123.0/24
           
               ....
               
               provisioningNetworkInterface: enp4s0
               provisioningNetworkCIDR: 192.168.123.0/24


    2. run command: cp install-config.yaml ~/ocp
    3. run command: ./openshift-baremetal-install --dir ~/ocp create manifests


    Result: Received a message that indicates the overlap between provisioningNetworkCIDR and machineCIDR:


             FATAL failed to fetch Master Machines: failed to load asset "Install Config": invalid "install-config.yaml" file: [platform.baremetal.provisioningNetworkCIDR: Invalid value: "192.168.123.0/24": 
             cannot overlap with machine network: 192.168.123.0/24 overlaps with 192.168.123.0/24, platform.baremetal.provisioningHostIP: Invalid value: "192.168.123.3": the IP must not be in one of the 
             machine networks, platform.baremetal.bootstrapHostIP: Invalid value: "192.168.123.2": the IP must not be in one of the machine networks]

Note: additional errors appear as a consequence of the overlap. It may be best for validation to stop at the overlap error rather than also reporting the errors that follow from it.
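
The behaviour suggested in this note could look roughly like the sketch below. It is only an illustration of the idea, assuming a hypothetical validate_networks helper. The field names mirror install-config.yaml and the messages are modelled on the output above, but this is not the installer's validation logic.

```python
import ipaddress


def validate_networks(machine_cidr, provisioning_cidr, host_ips):
    """Collect validation errors, skipping follow-on checks after an overlap.

    Hypothetical sketch; not the installer's actual validation code.
    """
    machine_net = ipaddress.ip_network(machine_cidr)
    provisioning_net = ipaddress.ip_network(provisioning_cidr)

    if machine_net.overlaps(provisioning_net):
        # Root cause found: return it alone, so errors that are merely
        # consequences of the overlap are not reported.
        return [
            f'provisioningNetworkCIDR: Invalid value: "{provisioning_cidr}": '
            f"cannot overlap with machine network {machine_cidr}"
        ]

    errors = []
    for field, ip in host_ips.items():
        # Only reached when the networks are disjoint.
        if ipaddress.ip_address(ip) in machine_net:
            errors.append(
                f'{field}: Invalid value: "{ip}": '
                "the IP must not be in one of the machine networks"
            )
    return errors


# Values from the verification steps above.
errors = validate_networks(
    "192.168.123.0/24",
    "192.168.123.0/24",
    {"provisioningHostIP": "192.168.123.3", "bootstrapHostIP": "192.168.123.2"},
)
for message in errors:
    print(message)
```

With the overlapping values from the steps, only the provisioningNetworkCIDR error is returned; the host IP checks run only when the two networks are disjoint.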

(Image attached to this message)

Comment 8 Shelly Miron 2020-04-22 10:37:12 UTC
Created attachment 1680803
Overlapping message between provisioningNetworkCIDR and machineCIDR

Comment 10 errata-xmlrpc 2020-07-13 17:20:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

