Bug 1823359 - Openshift 4.4 Baremetal IPI install fails using external DHCP server on provisioning network
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Stephen Benjamin
QA Contact: Chad Crum
Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks: 1826922
Reported: 2020-04-13 13:08 UTC by Chad Crum
Modified: 2020-07-13 17:27 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1826922 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:27:22 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 3496 | 0 | None | closed | Bug 1823359: baremetal: update provisioning CR to quote strings | 2020-09-08 10:35:55 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:27:45 UTC

Internal Links: 1826983

Comment 1 Stephen Benjamin 2020-04-13 18:45:31 UTC
Can you attach the dnsmasq configuration? If the control plane succeeds but workers fail, are you sure the next-server is set up correctly for workers? It uses a different IP.

You can also use virtualmedia-based installs, which reduces the complexity of needing PXE configuration in the external DHCP server.

Comment 2 Chad Crum 2020-04-13 19:15:10 UTC
Below is the dnsmasq config I'm using. I'm trying to separate master / worker next-server using tags matched by the node MAC address.


interface=provisioning-0

except-interface=lo
bind-dynamic
#enable-tftp
#tftp-root=/shared/tftpboot

# Disable listening for DNS
port=0
log-dhcp
dhcp-range=172.22.0.10,172.22.0.100

dhcp-option-force=tag:master,66,172.22.0.2
dhcp-option-force=tag:worker,66,172.22.0.3

# Disable default router(s) and DNS over provisioning network
dhcp-option=3
dhcp-option=6

dhcp-host=52:54:00:35:17:a2,set:master
dhcp-host=52:54:00:ad:06:8e,set:master
dhcp-host=52:54:00:87:b2:66,set:master
dhcp-host=52:54:00:06:4a:8e,set:worker
dhcp-host=52:54:00:4d:c0:0d,set:worker

# IPv4 Configuration:
dhcp-match=ipxe,175

# Client is already running iPXE; move to next stage of chainloading
dhcp-boot=tag:master,tag:ipxe,http://172.22.0.2:80/dualboot.ipxe
dhcp-boot=tag:worker,tag:ipxe,http://172.22.0.3:80/dualboot.ipxe

# Note: Need to test EFI booting
dhcp-match=set:efi,option:client-arch,7
dhcp-match=set:efi,option:client-arch,9
dhcp-match=set:efi,option:client-arch,11

# Client is PXE booting over EFI without iPXE ROM; send EFI version of iPXE chainloader
dhcp-boot=tag:master,tag:efi,tag:!ipxe,ipxe.efi,172.22.0.2
dhcp-boot=tag:worker,tag:efi,tag:!ipxe,ipxe.efi,172.22.0.3

# Client is running PXE over BIOS; send BIOS version of iPXE chainloader
dhcp-boot=tag:master,/undionly.kpxe,172.22.0.2
dhcp-boot=tag:worker,/undionly.kpxe,172.22.0.3
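
One way to sanity-check which TFTP server (DHCP option 66) a given node would be handed is to model the tag matching offline. The sketch below is a simplified illustration, not dnsmasq's actual matching logic: it only understands the `dhcp-host=MAC,set:TAG` and `dhcp-option-force=tag:TAG,66,IP` forms used above, and the `parse` and `tftp_server_for` names are hypothetical helpers introduced for this example.

```python
# Simplified model of the MAC -> tag -> option 66 mapping in the config above.
# Assumption: only the two directive forms shown here are handled.
CONFIG = """\
dhcp-option-force=tag:master,66,172.22.0.2
dhcp-option-force=tag:worker,66,172.22.0.3
dhcp-host=52:54:00:35:17:a2,set:master
dhcp-host=52:54:00:06:4a:8e,set:worker
"""


def parse(config: str):
    """Collect MAC -> tag and tag -> option 66 (TFTP server name) mappings."""
    mac_to_tag, tag_to_server = {}, {}
    for line in config.splitlines():
        key, _, value = line.partition("=")
        fields = value.split(",")
        if key == "dhcp-host" and len(fields) == 2 and fields[1].startswith("set:"):
            mac_to_tag[fields[0].lower()] = fields[1][len("set:"):]
        elif key == "dhcp-option-force" and len(fields) == 3 and fields[1] == "66":
            if fields[0].startswith("tag:"):
                tag_to_server[fields[0][len("tag:"):]] = fields[2]
    return mac_to_tag, tag_to_server


def tftp_server_for(mac: str, config: str = CONFIG):
    """Return the option 66 value for a MAC, or None if the MAC has no tag."""
    mac_to_tag, tag_to_server = parse(config)
    return tag_to_server.get(mac_to_tag.get(mac.lower()))


print(tftp_server_for("52:54:00:35:17:a2"))  # a master MAC from the config
print(tftp_server_for("52:54:00:06:4a:8e"))  # a worker MAC from the config
```

A MAC that has no `dhcp-host` entry gets no tag in this model, so it would receive neither next-server value.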

Comment 22 Stephen Benjamin 2020-04-22 18:37:57 UTC
Ok so we figured out what was wrong. On the bootstrap server we see this error:


```
Apr 21 19:05:38 localhost bootkube.sh[14955]: "99_baremetal-provisioning-config.yaml": failed to create provisionings.v1alpha1.metal3.io/provisioning-configuration -n : Provisioning.metal3.io "provisioning-configuration" is invalid: spec.provisioningDHCPRange: Invalid value: "null": spec.provisioningDHCPRange in body must be of type string: "null"
```

Here's the Provisioning CR the installer is creating:

```
$ cat provisioning.yaml
apiVersion: metal3.io/v1alpha1
kind: Provisioning
metadata:
  name: provisioning-configuration
spec:
  provisioningInterface: enp4s0
  provisioningIP: 172.22.0.3
  provisioningNetworkCIDR: 172.22.0.0/24
  provisioningDHCPExternal: true
  provisioningDHCPRange: 
  provisioningOSDownloadURL: https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.4/44.81.202003110027-0/x86_64/rhcos-44.81.202003110027-0-openstack.x86_64.qcow2.gz?sha256=237b9e0af475bf318abbe8d83d5508c2c3d4cca96fdcdb16edace2cc062216d1
```

We need to set provisioningDHCPRange to "" not null in the installer.
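
The difference between the two renderings can be sketched as follows. This is an illustration only, not the installer's code: the `render_field` helper and the use of `json.dumps` for quoting are assumptions made for the example. The point is that a YAML key with no value is read as null, while `""` is read as an empty string.

```python
import json


def render_field(key: str, value: str, quote: bool) -> str:
    """Render one `key: value` line of a YAML mapping (hypothetical helper)."""
    # json.dumps yields a double-quoted scalar, which YAML also accepts.
    rendered = json.dumps(value) if quote else value
    return f"  {key}: {rendered}".rstrip()


# With an external DHCP server there is no range, so the value is empty.
unquoted = render_field("provisioningDHCPRange", "", quote=False)
quoted = render_field("provisioningDHCPRange", "", quote=True)

print(unquoted)  # no value after the colon: a YAML parser reads this as null
print(quoted)    # explicit "": a YAML parser reads this as an empty string
```

Quoting also leaves non-empty values such as `172.22.0.10,172.22.0.100` as strings, so the same rendering works with or without an external DHCP server.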

A temporary workaround would be to fix the Provisioning CR (set provisioningDHCPRange to an empty string), apply it to the cluster, and make the metal3 pod restart:

   oc scale deployment --replicas=0 metal3 -n openshift-machine-api 


We'll need to get this fixed in the installer.

Comment 29 Stephen Benjamin 2020-05-17 20:46:58 UTC
The two 4.4 BZs are:
  - https://bugzilla.redhat.com/show_bug.cgi?id=1826922
  - https://bugzilla.redhat.com/show_bug.cgi?id=1829938

The 4.5 BZs (this one and https://bugzilla.redhat.com/show_bug.cgi?id=1826983) need to be verified before they can get cherry-picked. Did you intend to set yourself as the QA contact on this bug?

Comment 32 errata-xmlrpc 2020-07-13 17:27:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
