Created attachment 1030778
pxe boot error

Description of problem:
My instackenv.json consists of 3 baremetal servers. After I run 'instack-deploy-overcloud --tuskar' one of the nodes gets provisioned and another one gets into 'wait call-back' provision state. Console shows a TFTP file not found error for that node. After some time the 3rd node is used for provisioning and overcloud deployment can continue.

Version-Release number of selected component (if applicable):
openstack-tripleo-common-0.0.0.post4-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.4-2.el7ost.noarch
openstack-tripleo-image-elements-0.9.3-1.el7ost.noarch
openstack-tripleo-0.0.5-999.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1.dev55-1.el7ost.noarch
openstack-ironic-conductor-2015.1.0-2.el7ost.noarch
python-ironicclient-0.5.1-5.el7ost.noarch
openstack-ironic-discoverd-1.1.0-1.el7ost.noarch
openstack-ironic-common-2015.1.0-2.el7ost.noarch
python-ironic-discoverd-1.1.0-1.el7ost.noarch
openstack-ironic-api-2015.1.0-2.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Install undercloud
2. Register nodes
3. Discover nodes
5. Run instack-deploy-overcloud --tuskar

Actual results:
Provision fails for one of the nodes.

Expected results:
Node gets provisioned.

Additional info:
I deleted the overcloud heat stack / ironic nodes multiple times and always get the same result for the same node. I am attaching the console error that's output when the node is trying to boot.
Created attachment 1031084
ironic.conf

Attaching the ironic.conf file.
This happens because the order in which Neutron writes the DHCP options to the opts file may vary, e.g.:

$ cat /var/lib/neutron/dhcp/8e6c5607-fc9a-4479-a616-cdbfb49019ba/opts
tag:f59d3d7e-5cf3-49b3-9a38-6dc3b9887e7c,option:server-ip-address,10.3.58.1
tag:f59d3d7e-5cf3-49b3-9a38-6dc3b9887e7c,option:bootfile-name,http://10.3.58.1:8088/boot.ipxe
tag:f59d3d7e-5cf3-49b3-9a38-6dc3b9887e7c,option:tftp-server,10.3.58.1
tag:f59d3d7e-5cf3-49b3-9a38-6dc3b9887e7c,tag:!ipxe,option:bootfile-name,undionly.kpxe
tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,option:bootfile-name,http://10.3.58.1:8088/boot.ipxe
tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,option:server-ip-address,10.3.58.1
tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,option:tftp-server,10.3.58.1
tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,tag:!ipxe,option:bootfile-name,undionly.kpxe

You can see that we have 2 rules for sending the bootfile in response to a PXE request:

1) tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,tag:!ipxe,option:bootfile-name,undionly.kpxe

This rule has an "!ipxe" tag, which basically means: if the request doesn't come from iPXE, ACK the DHCP request with the undionly.kpxe file (the "!" in the tag is a negation). PXE will then chainload into iPXE and send a fresh DHCP request, which now comes from iPXE, and the DHCP server should then send the iPXE URL (http://10.3.58.1:8088/boot.ipxe).

2) tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,option:bootfile-name,http://10.3.58.1:8088/boot.ipxe

But 2) doesn't explicitly check whether the request actually comes from iPXE (no "ipxe" tag), so depending on the order in which Neutron lays down this configuration, a plain PXE request can be answered with 2). The PXE firmware would then most likely try to fetch that URL as a TFTP file name, which would explain the TFTP file not found error on the console.

This patch[1] fixes the problem by telling the DHCP server to only ACK with the iPXE URL if the request is coming from an iPXE image (by adding a tag).
So it should look like:

tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,tag:ipxe,option:bootfile-name,http://10.3.58.1:8088/boot.ipxe

The patch [1] has been applied to rdo-manager (branches: mgt-master and mgt-kilo). Let me know if it's now fixed for you.

[1] https://github.com/rdo-management/ironic/commit/445132c9152e5ae528c907887b2b943424a9fa55
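To make the ambiguity concrete, here is a minimal Python sketch that checks which bootfile-name rules apply to a given client. It is not dnsmasq or Ironic code. The helper names (parse_opts, matching_bootfiles) and the sample constants are hypothetical, and the only matching semantics it assumes are the ones described above: a rule applies when every plain tag is set on the request and no "!"-negated tag is. If more than one rule applies to a plain PXE client, the answer depends on something else, such as file order.

```python
PORT = "ee56e5a7-9a80-4e1e-82a8-30701aa06b56"

# Rules as written before the patch (taken from the opts file in this bug).
BEFORE = f"""
tag:{PORT},option:bootfile-name,http://10.3.58.1:8088/boot.ipxe
tag:{PORT},option:server-ip-address,10.3.58.1
tag:{PORT},option:tftp-server,10.3.58.1
tag:{PORT},tag:!ipxe,option:bootfile-name,undionly.kpxe
"""

# Same rules, with the iPXE URL additionally guarded by tag:ipxe.
AFTER = BEFORE.replace(
    f"tag:{PORT},option:bootfile-name,http",
    f"tag:{PORT},tag:ipxe,option:bootfile-name,http",
)


def parse_opts(text):
    """Parse opts lines into (required_tags, forbidden_tags, option, value)."""
    rules = []
    for line in text.splitlines():
        fields = line.strip().split(",")
        required, forbidden = set(), set()
        i = 0
        # Leading fields are tags; "tag:!x" means "x must NOT be set".
        while i < len(fields) and fields[i].startswith("tag:"):
            tag = fields[i][len("tag:"):]
            if tag.startswith("!"):
                forbidden.add(tag[1:])
            else:
                required.add(tag)
            i += 1
        if i >= len(fields) or not fields[i].startswith("option:"):
            continue  # blank or unrecognised line
        option = fields[i][len("option:"):]
        value = ",".join(fields[i + 1:])
        rules.append((required, forbidden, option, value))
    return rules


def matching_bootfiles(rules, client_tags):
    """Return every bootfile-name value whose tag conditions the client meets."""
    return [
        value
        for required, forbidden, option, value in rules
        if option == "bootfile-name"
        and required <= client_tags
        and not (forbidden & client_tags)
    ]


if __name__ == "__main__":
    plain_pxe = {PORT}        # firmware PXE: only the port tag is set
    ipxe = {PORT, "ipxe"}     # chainloaded iPXE: the "ipxe" tag is set too
    print("before, PXE :", matching_bootfiles(parse_opts(BEFORE), plain_pxe))
    print("after,  PXE :", matching_bootfiles(parse_opts(AFTER), plain_pxe))
    print("after,  iPXE:", matching_bootfiles(parse_opts(AFTER), ipxe))
```

With the pre-patch rules, a plain PXE client matches both the iPXE URL and undionly.kpxe. With the extra tag:ipxe, it matches only undionly.kpxe, and the URL is served only after chainloading.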
Deployment went fine multiple times after applying the provided patch. Thanks!
*** Bug 1220933 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2015:1549