Bug 1813422

Summary: vague error message when machineCIDR and provisioningNetworkCIDR values are the same in install-config.yaml
Product: OpenShift Container Platform Reporter: Alexander Chuzhoy <sasha>
Component: Installer Assignee: Stephen Benjamin <stbenjam>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Amit Ugol <augol>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: rbartal, smiron
Version: 4.4 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: No explicit check for an overlap between machineCIDR and provisioningNetworkCIDR. Consequence: Unclear error message when the networks overlap. Fix: Introduce an explicit check for an overlap. Result: Users now get a clear error message about what went wrong.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-13 17:20:05 UTC Type: Bug
Attachments:
Description Flags
Overlapping message between provisioningNetworkCIDR and machineCIDR none

Description Alexander Chuzhoy 2020-03-13 18:46:26 UTC
Version: 4.4.0-0.nightly-2020-03-13-073111

Attempted to deploy with the same value for machineCIDR and provisioningNetworkCIDR in install-config.yaml (below):
apiVersion: v1
baseDomain: 2qe.lab.redhat.com
networking:
  networkType: OpenShiftSDN
  machineCIDR: 192.168.124.0/24
metadata:
  name: ocp-edge-cluster
compute:
- name: worker
  replicas: 2
controlPlane:
  name: master
  replicas: 3
  platform:
    baremetal: {}
platform:
  baremetal:
    libvirtURI: qemu+ssh://kni.qe.lab.redhat.com/system
    provisioningNetworkInterface: enp4s0
    provisioningNetworkCIDR: 192.168.124.0/24



The deployment failed without clearly indicating the cause:



time="2020-03-13T18:09:56Z" level=info msg="all files found, ready to proceed" installID=x48l7fts
time="2020-03-13T18:09:56Z" level=debug msg="OpenShift Installer v4.4.0"
time="2020-03-13T18:09:56Z" level=debug msg="Built from commit e1b323fd7bbb57cabcbded74dad08483390f9a6c"
time="2020-03-13T18:09:56Z" level=debug msg="Fetching Master Machines..."
time="2020-03-13T18:09:56Z" level=debug msg="Loading Master Machines..."
time="2020-03-13T18:09:56Z" level=debug msg="  Loading Cluster ID..."
time="2020-03-13T18:09:56Z" level=debug msg="    Loading Install Config..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading SSH Key..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading Base Domain..."
time="2020-03-13T18:09:56Z" level=debug msg="        Loading Platform..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading Cluster Name..."
time="2020-03-13T18:09:56Z" level=debug msg="        Loading Base Domain..."
time="2020-03-13T18:09:56Z" level=debug msg="        Loading Platform..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading Pull Secret..."
time="2020-03-13T18:09:56Z" level=debug msg="      Loading Platform..."
time="2020-03-13T18:09:57Z" level=fatal msg="failed to fetch Master Machines: failed to load asset \"Install Config\": invalid \"install-config.yaml\" file: [platform.baremetal.provisioningHostIP: Invalid value: \"192.168.124.3\": the IP must not be in one of the machine networks, platform.baremetal.bootstrapHostIP: Invalid value: \"192.168.124.2\": the IP must not be in one of the machine networks]"
time="2020-03-13T18:09:58Z" level=error msg="error after waiting for command completion" error="exit status 1" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=error msg="error generating installer assets" error="exit status 1" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=info msg="reading installer log" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=info msg="saving installer output" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=debug msg="installer console log: level=fatal msg=\"failed to fetch Master Machines: failed to load asset \\\"Install Config\\\": invalid \\\"install-config.yaml\\\" file: [platform.baremetal.provisioningHostIP: Invalid value: \\\"192.168.124.3\\\": the IP must not be in one of the machine networks, platform.baremetal.bootstrapHostIP: Invalid value: \\\"192.168.124.2\\\": the IP must not be in one of the machine networks]\"\n" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=info msg="updating clusterprovision" installID=x48l7fts
time="2020-03-13T18:09:58Z" level=fatal msg="runtime error" error="exit status 1"
[kni@provisionhost-0 ~]$


We need to better handle this kind of error.

Comment 1 Stephen Benjamin 2020-03-23 17:26:17 UTC
Indeed, we should error on the provisioningNetworkCIDR field itself. Thanks for the report.

The two networks can't be the same, or overlap at all.
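
For illustration, the kind of overlap check discussed here can be sketched in a few lines. This is only a sketch in Python using the standard `ipaddress` module; it is not the installer's code, and the function name `provisioning_overlap_error` and the wording of the returned message are assumptions made up for this example.

```python
import ipaddress


def provisioning_overlap_error(machine_cidr, provisioning_cidr):
    """Return an error string if the two CIDRs overlap, else None.

    Hypothetical helper for illustration only; not the installer's code.
    """
    machine_net = ipaddress.ip_network(machine_cidr)
    provisioning_net = ipaddress.ip_network(provisioning_cidr)
    # overlaps() is true when the networks are identical or when one
    # of them contains the other.
    if provisioning_net.overlaps(machine_net):
        return (
            "platform.baremetal.provisioningNetworkCIDR: "
            f'Invalid value: "{provisioning_cidr}": '
            f"overlaps with machine network {machine_cidr}"
        )
    return None


if __name__ == "__main__":
    # Values taken from the install-config.yaml in the description.
    print(provisioning_overlap_error("192.168.124.0/24", "192.168.124.0/24"))
```

Reporting the problem against the provisioningNetworkCIDR field, rather than against the host IPs derived from it, points the user at the value they actually need to change.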

Comment 3 Scott Dodson 2020-04-07 00:45:59 UTC
PR is against master, setting TR 4.5.0 and medium/medium.

Comment 7 Shelly Miron 2020-04-22 10:33:55 UTC
Verified.
The bug was tested on OCP version 4.5 with IPv4, using 2 workers.

Steps:

    1. Edit the install-config.yaml file and set:
             
               machineCIDR: 192.168.123.0/24
           
               ....
               
               provisioningNetworkInterface: enp4s0
               provisioningNetworkCIDR: 192.168.123.0/24


    2. Run the command: cp install-config.yaml ~/ocp
    3. Run the command: ./openshift-baremetal-install --dir ~/ocp create manifests


    Result: Received a message that indicates the overlap between provisioningNetworkCIDR and machineCIDR:


             FATAL failed to fetch Master Machines: failed to load asset "Install Config": invalid "install-config.yaml" file: [platform.baremetal.provisioningNetworkCIDR: Invalid value: "192.168.123.0/24": 
             cannot overlap with machine network: 192.168.123.0/24 overlaps with 192.168.123.0/24, platform.baremetal.provisioningHostIP: Invalid value: "192.168.123.3": the IP must not be in one of the 
             machine networks, platform.baremetal.bootstrapHostIP: Invalid value: "192.168.123.2": the IP must not be in one of the machine networks]

Note: additional messages appear as a consequence of the overlap. It may be best for validation to stop once the overlap is detected, rather than dragging in the follow-on errors.
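
One way to picture the suggestion in the note is a validation routine that returns early on an overlap and runs the per-IP checks only when the networks are disjoint. The following is a rough Python sketch under that assumption; it does not reflect how the installer is implemented, and `validate_networks`, its parameters, and the message wording are all hypothetical.

```python
import ipaddress


def validate_networks(machine_cidr, provisioning_cidr, host_ips):
    """Return a list of error strings for a bare metal network config.

    Hypothetical sketch: host_ips maps a field name (for example
    "provisioningHostIP") to an IP address string.
    """
    machine_net = ipaddress.ip_network(machine_cidr)
    provisioning_net = ipaddress.ip_network(provisioning_cidr)

    # Report the root cause alone: when the networks overlap, the host
    # IP errors are only a consequence and would add noise.
    if provisioning_net.overlaps(machine_net):
        return [
            f"provisioningNetworkCIDR: {provisioning_cidr} "
            f"overlaps with machine network {machine_cidr}"
        ]

    # Networks are disjoint, so any host IP inside the machine network
    # is a separate, genuine error.
    errors = []
    for field, ip in host_ips.items():
        if ipaddress.ip_address(ip) in machine_net:
            errors.append(f"{field}: {ip} must not be in the machine network")
    return errors


if __name__ == "__main__":
    # Values taken from the verification steps above.
    for err in validate_networks(
        "192.168.123.0/24",
        "192.168.123.0/24",
        {"provisioningHostIP": "192.168.123.3", "bootstrapHostIP": "192.168.123.2"},
    ):
        print(err)
```

The trade-off is that an early return hides any unrelated host IP mistakes until the overlap is fixed, so the user may need a second validation pass.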

(Image attached to this message)

Comment 8 Shelly Miron 2020-04-22 10:37:12 UTC
Created attachment 1680803
Overlapping message between provisioningNetworkCIDR and machineCIDR

Comment 10 errata-xmlrpc 2020-07-13 17:20:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409