Bug 1225621
| Summary: | Provisioning fails for one of the overcloud nodes on baremetal env | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
| Component: | rhosp-director | Assignee: | Lucas Alvares Gomes <lmartins> |
| Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.0 (Kilo) | CC: | calfonso, dmacpher, lmartins, mburns, rhel-osp-director-maint, sasha, sclewis |
| Target Milestone: | ga | Keywords: | Triaged |
| Target Release: | Director | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-ironic-2015.1.0-3.el7ost | Doc Type: | Bug Fix |
| Doc Text: | Incorrect ordering of DHCP options in the configuration file caused machines to fail on boot. This fix adds tags to the DHCP options so that the correct option is sent regardless of ordering. Machines now chainload into the iPXE ROM and then fetch the HTTP URL to continue the boot. This results in a successful boot. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-08-05 13:52:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Created attachment 1031084 [details]
ironic.conf
Attaching the ironic.conf file.
This happens because when Neutron lays down the DHCP options, the order in which they are written to the file may vary, e.g.:

    $ cat /var/lib/neutron/dhcp/8e6c5607-fc9a-4479-a616-cdbfb49019ba/opts
    tag:f59d3d7e-5cf3-49b3-9a38-6dc3b9887e7c,option:server-ip-address,10.3.58.1
    tag:f59d3d7e-5cf3-49b3-9a38-6dc3b9887e7c,option:bootfile-name,http://10.3.58.1:8088/boot.ipxe
    tag:f59d3d7e-5cf3-49b3-9a38-6dc3b9887e7c,option:tftp-server,10.3.58.1
    tag:f59d3d7e-5cf3-49b3-9a38-6dc3b9887e7c,tag:!ipxe,option:bootfile-name,undionly.kpxe
    tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,option:bootfile-name,http://10.3.58.1:8088/boot.ipxe
    tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,option:server-ip-address,10.3.58.1
    tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,option:tftp-server,10.3.58.1
    tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,tag:!ipxe,option:bootfile-name,undionly.kpxe

You can see that we have 2 rules for sending the bootfile in response to the PXE request:

1) tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,tag:!ipxe,option:bootfile-name,undionly.kpxe

You can see that we have an "!ipxe" tag there, which basically means: if the request doesn't come from iPXE, ACK the DHCP request with the undionly.kpxe file (the "!" in the tag is a negation). PXE will then chainload into iPXE and send a fresh DHCP request, which will now come from iPXE, and the DHCP server should then send the iPXE URL (http://10.3.58.1:8088/boot.ipxe).

2) tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,option:bootfile-name,http://10.3.58.1:8088/boot.ipxe

But you can see that 2) doesn't explicitly check whether the request actually comes from iPXE (no "ipxe" tag), so depending on the order in which Neutron lays down this configuration, a plain PXE request can be answered with 2).

This patch [1] fixes the problem by telling the DHCP server to ACK with the iPXE URL only if the request comes from an iPXE image (by adding a tag).
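
The ambiguity described above can be illustrated with a small Python sketch. It is a deliberately simplified model of tag matching, not the DHCP server's actual implementation: a rule is assumed to apply when every plain tag is present on the request and no negated ("!") tag is present. The helper names (`parse_rule`, `matching_values`) are hypothetical, and it is assumed that a request from iPXE carries the "ipxe" tag.

```python
# Simplified model of tag matching on opts-file lines (illustration only).
PORT_TAG = "ee56e5a7-9a80-4e1e-82a8-30701aa06b56"
IPXE_URL = "http://10.3.58.1:8088/boot.ipxe"

# Rules as written before the patch: the URL rule has no "ipxe" tag.
BEFORE = [
    f"tag:{PORT_TAG},option:bootfile-name,{IPXE_URL}",
    f"tag:{PORT_TAG},tag:!ipxe,option:bootfile-name,undionly.kpxe",
]

# Rules as described for the patch: the URL rule additionally requires "ipxe".
AFTER = [
    f"tag:{PORT_TAG},tag:ipxe,option:bootfile-name,{IPXE_URL}",
    f"tag:{PORT_TAG},tag:!ipxe,option:bootfile-name,undionly.kpxe",
]


def parse_rule(line):
    """Split one opts line into (required_tags, forbidden_tags, option, value)."""
    required, forbidden = set(), set()
    fields = line.strip().split(",")
    i = 0
    # Leading "tag:" fields are conditions; "!" marks a negated tag.
    while i < len(fields) and fields[i].startswith("tag:"):
        tag = fields[i][len("tag:"):]
        if tag.startswith("!"):
            forbidden.add(tag[1:])
        else:
            required.add(tag)
        i += 1
    option = fields[i][len("option:"):]
    value = ",".join(fields[i + 1:])
    return required, forbidden, option, value


def matching_values(lines, request_tags, option):
    """Return the values of every rule for `option` that applies to the request."""
    values = []
    for line in lines:
        required, forbidden, opt, value = parse_rule(line)
        if opt != option:
            continue
        if required <= request_tags and not (forbidden & request_tags):
            values.append(value)
    return values


if __name__ == "__main__":
    plain_pxe = {PORT_TAG}        # first request, from the PXE ROM
    ipxe = {PORT_TAG, "ipxe"}     # second request, after chainloading
    print(matching_values(BEFORE, plain_pxe, "bootfile-name"))
    print(matching_values(AFTER, plain_pxe, "bootfile-name"))
    print(matching_values(AFTER, ipxe, "bootfile-name"))
```

In this model, a plain PXE request matches both bootfile-name rules in the unpatched file, so which one wins depends on ordering; with the extra "ipxe" tag, each kind of request matches exactly one rule.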
So with the patch, rule 2) should look like:

    tag:ee56e5a7-9a80-4e1e-82a8-30701aa06b56,tag:ipxe,option:bootfile-name,http://10.3.58.1:8088/boot.ipxe

The patch [1] has been applied to rdo-manager (branches: mgt-master and mgt-kilo). Lemme know if it's now fixed for you.

[1] https://github.com/rdo-management/ironic/commit/445132c9152e5ae528c907887b2b943424a9fa55

Deployment went fine multiple times after applying the provided patch. Thanks!

*** Bug 1220933 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

Created attachment 1030778 [details]
pxe boot error

Description of problem:
My instackenv.json consists of 3 baremetal servers. After I run 'instack-deploy-overcloud --tuskar' one of the nodes gets provisioned and another one gets into 'wait call-back' provision state. Console shows a TFTP file not found error for that node. After some time the 3rd node is used for provisioning and overcloud deployment can continue.

Version-Release number of selected component (if applicable):
openstack-tripleo-common-0.0.0.post4-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.4-2.el7ost.noarch
openstack-tripleo-image-elements-0.9.3-1.el7ost.noarch
openstack-tripleo-0.0.5-999.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1.dev55-1.el7ost.noarch
openstack-ironic-conductor-2015.1.0-2.el7ost.noarch
python-ironicclient-0.5.1-5.el7ost.noarch
openstack-ironic-discoverd-1.1.0-1.el7ost.noarch
openstack-ironic-common-2015.1.0-2.el7ost.noarch
python-ironic-discoverd-1.1.0-1.el7ost.noarch
openstack-ironic-api-2015.1.0-2.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Install undercloud
2. Register nodes
3. Discover nodes
5. Run instack-deploy-overcloud --tuskar

Actual results:
Provision fails for one of the nodes.

Expected results:
Node gets provisioned.

Additional info:
I deleted the overcloud heat stack / ironic nodes multiple times and always get the same result for the same node. I am attaching the console error that's output when the node is trying to boot.