Can you attach the dnsmasq configuration? If the control plane succeeds but the workers fail, are you sure the next-server is set up correctly for workers? It uses a different IP. You can also use virtual-media-based installs, which avoid the complexity of needing PXE configuration in the external DHCP server.
Below is the dnsmasq config I'm using. I'm trying to separate the master/worker next-server using tags matched against each node's MAC address.

```
interface=provisioning-0
except-interface=lo
bind-dynamic
#enable-tftp
#tftp-root=/shared/tftpboot

# Disable listening for DNS
port=0

log-dhcp
dhcp-range=172.22.0.10,172.22.0.100
dhcp-option-force=tag:master,66,172.22.0.2
dhcp-option-force=tag:worker,66,172.22.0.3

# Disable default router(s) and DNS over provisioning network
dhcp-option=3
dhcp-option=6

dhcp-host=52:54:00:35:17:a2,set:master
dhcp-host=52:54:00:ad:06:8e,set:master
dhcp-host=52:54:00:87:b2:66,set:master
dhcp-host=52:54:00:06:4a:8e,set:worker
dhcp-host=52:54:00:4d:c0:0d,set:worker

# IPv4 Configuration:
dhcp-match=ipxe,175

# Client is already running iPXE; move to next stage of chainloading
dhcp-boot=tag:master,tag:ipxe,http://172.22.0.2:80/dualboot.ipxe
dhcp-boot=tag:worker,tag:ipxe,http://172.22.0.3:80/dualboot.ipxe

# Note: Need to test EFI booting
dhcp-match=set:efi,option:client-arch,7
dhcp-match=set:efi,option:client-arch,9
dhcp-match=set:efi,option:client-arch,11

# Client is PXE booting over EFI without iPXE ROM; send EFI version of iPXE chainloader
dhcp-boot=tag:master,tag:efi,tag:!ipxe,ipxe.efi,172.22.0.2
dhcp-boot=tag:worker,tag:efi,tag:!ipxe,ipxe.efi,172.22.0.3

# Client is running PXE over BIOS; send BIOS version of iPXE chainloader
dhcp-boot=tag:master,/undionly.kpxe,172.22.0.2
dhcp-boot=tag:worker,/undionly.kpxe,172.22.0.3
```
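To check the tag matching without rebooting nodes, here's a hedged sketch (assuming the config above is saved as /etc/dnsmasq.d/provisioning.conf and dnsmasq runs as a systemd unit; adjust paths/unit names to your setup):

```shell
# Syntax-check the config before (re)starting the service:
dnsmasq --test --conf-file=/etc/dnsmasq.d/provisioning.conf

# With log-dhcp enabled, dnsmasq logs which tags each client matched
# and which bootfile/next-server it was handed; watch for a worker MAC
# picking up the master tag (or vice versa):
journalctl -u dnsmasq -f | grep -E 'tags:|bootfile|next server'
```

The log lines should show each DHCP request annotated with the matched tag set (e.g. master plus ipxe/efi), which makes it easy to see whether the per-role next-server split is actually taking effect.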
Ok, so we figured out what was wrong. On the bootstrap server we see this error:

```
Apr 21 19:05:38 localhost bootkube.sh[14955]: "99_baremetal-provisioning-config.yaml": failed to create provisionings.v1alpha1.metal3.io/provisioning-configuration -n : Provisioning.metal3.io "provisioning-configuration" is invalid: spec.provisioningDHCPRange: Invalid value: "null": spec.provisioningDHCPRange in body must be of type string: "null"
```

Here's the Provisioning CR the installer is creating:

```
$ cat provisioning.yaml
apiVersion: metal3.io/v1alpha1
kind: Provisioning
metadata:
  name: provisioning-configuration
spec:
  provisioningInterface: enp4s0
  provisioningIP: 172.22.0.3
  provisioningNetworkCIDR: 172.22.0.0/24
  provisioningDHCPExternal: true
  provisioningDHCPRange:
  provisioningOSDownloadURL: https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.4/44.81.202003110027-0/x86_64/rhcos-44.81.202003110027-0-openstack.x86_64.qcow2.gz?sha256=237b9e0af475bf318abbe8d83d5508c2c3d4cca96fdcdb16edace2cc062216d1
```

When provisioningDHCPRange is empty, the installer needs to set it to "" rather than null. A temporary workaround would be to fix the Provisioning CRD, apply it to the cluster, and make the metal3 pod restart:

```
oc scale deployment --replicas=0 metal3 -n openshift-machine-api
```

We'll need to get this fixed in the installer.
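To illustrate why the CRD validation rejects this CR, here's a minimal sketch using Python's stdlib json module (the YAML deserializer behaves the same way: a field left empty round-trips as null, not as an empty string; the field name matches the CR above, the rest is omitted for brevity):

```python
import json

# The range field as the installer currently emits it (null) vs. the fix (""):
cr_null = json.loads('{"spec": {"provisioningDHCPRange": null}}')
cr_empty = json.loads('{"spec": {"provisioningDHCPRange": ""}}')

# The CRD schema requires spec.provisioningDHCPRange to be of type string.
# null is not a string, so the first CR fails admission; "" passes.
print(isinstance(cr_null["spec"]["provisioningDHCPRange"], str))   # False -> rejected
print(isinstance(cr_empty["spec"]["provisioningDHCPRange"], str))  # True  -> accepted
```

This is exactly the distinction in the bootkube error: the serialized value "null" fails the `must be of type string` check, while an explicit empty string satisfies it.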
The two 4.4 BZs are:

- https://bugzilla.redhat.com/show_bug.cgi?id=1826922
- https://bugzilla.redhat.com/show_bug.cgi?id=1829938

The 4.5 BZs (this one and https://bugzilla.redhat.com/show_bug.cgi?id=1826983) need to be verified before they can be cherry-picked. Did you intend to set yourself as the QA contact on this bug?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409