Bug 1301694
Summary: | PXE boot timed out on baremetal nodes during overcloud introspection | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Udi Shkalim <ushkalim>
Component: | ipxe | Assignee: | Dmitry Tantsur <dtantsur>
Status: | CLOSED WONTFIX | QA Contact: | yeylon <yeylon>
Severity: | urgent | Priority: | urgent
Version: | 7.0 (Kilo) | Target Release: | 7.0 (Kilo)
Target Milestone: | y3 | Keywords: | Reopened
Hardware: | All | OS: | All
Doc Type: | Bug Fix | Type: | Bug
Last Closed: | 2016-02-15 15:37:23 UTC | Bug Blocks: | 1308611
Clones: | 1308611 | |
CC: | ahirshbe, alex.williamson, apevec, athomas, fhubik, jcoufal, jen, kbasil, lersek, lhh, mburns, mcornea, mrezanin, oblaut, sasha, srevivo, ushkalim, yeylon | |

Description
Udi Shkalim
2016-01-25 17:37:23 UTC
Created attachment 1118144: var log dir

Reproduced the issue and the workaround (http://boot.ipxe.org/undionly.kpxe) on BM. The issue doesn't reproduce on a virtual setup.

Another workaround: http://etherpad.corp.redhat.com/ironic-ipxe-to-pxe

Hi,

can you please elaborate? Namely, for non-virt purposes, two packages are built from the ipxe SRPM: ipxe-roms and ipxe-bootimgs.

The former (= ipxe-roms) is what actually contains PCI expansion ROMs, which are meant as *replacements* for the PCI expansion ROMs that are already burned into physical NICs. See: http://ipxe.org/howto/romburning

Whereas the latter (= ipxe-bootimgs) contains standalone iPXE images that can be booted / bootstrapped with various *existing* boot mechanisms: USB, CD-ROM, or the preexistent PXE boot capability (= factory installed PCI expansion ROM) of your NIC. See: http://ipxe.org/howto/chainloading

In light of the above, the bug report confuses me:

(a) It references "ipxe-bootimgs", and it states that replacing "undionly.kpxe" on the TFTP server (which file indeed comes from "ipxe-bootimgs") with a fresh upstream binary fixes things. These points consistently imply that there's a problem with "undionly.kpxe" from the ipxe-bootimgs package. Also they imply that there is no intent to reflash physical NICs with ROM files retrieved from "ipxe-roms".

(b) However, the comments also imply that *downloading* "undionly.kpxe" from the TFTP server runs into issues now. I don't understand how that's possible, since in that phase the only relevance the ipxe rebase may have is the changed *size* of the file being downloaded ("undionly.kpxe"). Since the same factory-installed PCI oprom of the physical NIC is used for this download as before, I don't see how the ipxe rebase can have any effect here.

Especially this comment:

"Eliminating networking we found that the iPXE ROM is having trouble"

is hard to understand:

- If you fully eliminate the network, you can't even download "undionly.kpxe" via TFTP.

- If you keep the local subnet alive (so that TFTP works and "undionly.kpxe" is downloaded successfully), but prevent "undionly.kpxe" from loading further stuff (e.g., via HTTP), then the statement "iPXE ROM is having trouble" is hard to interpret:
  - The NIC's oprom obviously managed to load "undionly.kpxe", so it is not having trouble (and that ROM doesn't even originate from iPXE),
  - "undionly.kpxe", which could have trouble, is *not a ROM*.

Anyway, assuming this is a network driver issue in iPXE, and because comment 0 named Intel 82576, and because fresh upstream iPXE works, we can look for upstream commits our latest rebase lacks:

$ git log --oneline --reverse 4e03af8e..master -- src/drivers/net/intel.c
d5f7ee6 [intel] Add PCI IDs for i210/i211 flashless operation
fff9281 [intel] Forcibly skip PHY reset on some models
d694592 [intel] Add INTEL_NO_PHY_RST for I217-LM

My guess is either fff9281 or d694592. (The bug report doesn't contain exact vendor ID / device ID, so it's just a guess.)

In attachment 1118144 I found the "dmesg" file. It says:

[ 1.090871] pci 0000:05:00.1: [8086:10c9] type 00 class 0x020000

Searching the iPXE source for 10c9, it is found in "src/drivers/net/intel.c", but it is not affected by the commits listed in comment 8:

src/drivers/net/intel.c: PCI_ROM ( 0x8086, 0x10c9, "82576", "82576", 0 ),

(It doesn't have the INTEL_NO_PHY_RST flag.)

So I have to think this is not a NIC driver issue in iPXE; probably something more generic.

The 7.3 installation from 29 Jan has the latest ROM from http://boot.ipxe.org/undionly.kpxe

cksum /usr/share/ipxe/undionly.kpxe
3260852374 64047 /usr/share/ipxe/undionly.kpxe

Documentation on falling back to PXE is drafted as a knowledgebase article.

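For reference, the PCI ID check described above (reading the vendor:device pair from the dmesg line, then searching the iPXE driver source for the matching PCI_ROM entry) can be sketched in shell. This is only a minimal sketch, not a command sequence taken from this bug: the helper names pci_ids and pci_rom_entry are hypothetical, and the patterns assume the dmesg and PCI_ROM line formats quoted in the comment.

```shell
#!/bin/sh
# Hypothetical helpers; names and patterns are assumptions for illustration.

# Read dmesg-style lines on stdin and print "vendor device" for each PCI
# enumeration line, e.g.
#   [ 1.090871] pci 0000:05:00.1: [8086:10c9] type 00 class 0x020000
pci_ids() {
    sed -n 's/.*pci [0-9a-f:.]*: \[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*/\1 \2/p'
}

# Print the PCI_ROM entries for a vendor/device pair in a driver source file.
# Arguments: vendor (hex, no 0x), device (hex, no 0x), path to the source,
# e.g. src/drivers/net/intel.c in an iPXE checkout.
# Exit status is non-zero when no entry matches.
pci_rom_entry() {
    grep -E "PCI_ROM[[:space:]]*\([[:space:]]*0x$1,[[:space:]]*0x$2," "$3"
}

# Example, run from the top of an iPXE source tree:
#   pci_ids < dmesg
#   pci_rom_entry 8086 10c9 src/drivers/net/intel.c
```

Piping the output of pci_rom_entry through grep INTEL_NO_PHY_RST would then show whether the matching entry carries that flag.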
(In reply to Udi Shkalim from comment #18)
> The 7.3 installation from 29 Jan has the latest ROM from
> http://boot.ipxe.org/undionly.kpxe
>
> cksum /usr/share/ipxe/undionly.kpxe
> 3260852374 64047 /usr/share/ipxe/undionly.kpxe

Please ignore the above comment. I used a borrowed setup. I'm currently re-testing with the package from brew: https://brewweb.devel.redhat.com/taskinfo?taskID=10401510

Laszlo,

Can you please regenerate the RPM in brew? It's empty and there is no other source.

Thanks.

Hey Miroslav,

I used your repos to update iPXE to:
ipxe-bootimgs.noarch 0:20150821-1.git4e03af8e.el7.test

But the deployment failed. I checked the cksum of undionly.kpxe under my /tftpboot against http://boot.ipxe.org/undionly.kpxe and saw that they are different. After I replaced the files, the deployment passed the ironic phase.

[root@puma33 ~]# cksum undionly.kpxe
1521140302 64074 undionly.kpxe
[root@puma33 ~]# cksum /tftpboot/undionly.kpxe
750298637 63517 /tftpboot/undionly.kpxe

Hi Asaf,

is it possible to have access to your setup to test? If not, can you test with the newer version of ipxe in the batcave repo (should be ipxe-20160127-0.git6366fa7a.el7)?

Mirek

This bug has been addressed by a combination of a new KB article, which describes the process of switching to PXE for users whose hardware doesn't work with iPXE, and the shipping of an updated iPXE ROM, as tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1267030

Hi Angus,

In reply to comment 25: we still have
ipxe-bootimgs.noarch 0:20150821-1.git4e03af8e.el7.test
which fails our installations.

For bug https://bugzilla.redhat.com/show_bug.cgi?id=1267030, "Fixed In Version" is ipxe-20150821-1.git4e03af8e.el7, which still fails the installation from time to time.

Ofer

Workaround for 7.3 is documented; we will take the new iPXE when it is available and fixed (probably in OSP8). Closing.

Cloned for OSP8 for tracking purposes: https://bugzilla.redhat.com/show_bug.cgi?id=1308611

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

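The checksum comparisons reported in this thread (a downloaded undionly.kpxe against the copy served from /tftpboot) could be scripted roughly as follows. This is a sketch under stated assumptions, not something the reporters ran: the function name same_cksum is hypothetical, and it relies only on the POSIX cksum utility already used in the comments.

```shell
#!/bin/sh
# Hypothetical helper; the name same_cksum is an assumption for illustration.

# Exit 0 when both files have the same POSIX cksum CRC and byte count,
# 1 when they differ, 2 when either file cannot be read.
# Reading via stdin makes cksum omit the file name, so the two outputs
# ("<crc> <size>") can be compared directly.
same_cksum() {
    a=$(cksum < "$1") || return 2
    b=$(cksum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Example with the paths named in the thread (download step not shown):
#   same_cksum ./undionly.kpxe /tftpboot/undionly.kpxe || echo "images differ"
```

With the values quoted above (1521140302 64074 versus 750298637 63517), such a check would report that the two images differ.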