Created attachment 1327310 [details] failed boot

Description of problem:
The customer experienced a PXE boot problem after updating from ipxe-bootimgs-20160127-5.git6366fa7a.el7.noarch to ipxe-bootimgs-20170123-1.git4e85b27.el7.noarch and ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch. During RHOSP deployment the servers are unable to PXE boot.

Version-Release number of selected component (if applicable):
ipxe-bootimgs-20170123-1.git4e85b27.el7.noarch and ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch

How reproducible:
Update from ipxe-bootimgs-20160127-5.git6366fa7a.el7.noarch to ipxe-bootimgs-20170123-1.git4e85b27.el7.noarch and ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch.

Actual results:
With the new images, booting results in an infinite loop in which the iPXE file is fetched over and over. The server starts to boot as soon as the new file is overwritten with the old ipxe.efi in /tftpboot.

Expected results:
The boot process with ipxe-bootimgs-20170123-1 proceeds to the stage of fetching the kernel and ramdisk files.

Additional info:
Created attachment 1327312 [details] Environment details
Hi Miroslav, any idea why it was rebased and what could be causing this?
Hi Dmitry, there was a request for iPXE features provided by the rebased version - this includes a rebase of all subpackages. As for the cause, redirecting to nhorman.
I'm not sure what's being asked for here. Are you asking why this broke the reported environment? I have no idea with the information provided. If we have a similar system available, I can try to reproduce it.
Hello Neil, the 2017 releases of ipxe-bootimgs seem to be defective. The customer stated that his server started to boot once he restored the old ipxe.efi in /tftpboot. Could you check whether this is a bug affecting the Intel 82599ES 10-Gigabit Ethernet controller?
The same question from my comment #6 still applies: the provided information tells me nothing about what has gone wrong here. Do we have a tcpdump of the failed exchange and of a successful exchange with the working boot ROM? Is the customer able to break into the iPXE console via Ctrl-B before it tries to configure the network interface? Do we have a system that can re-create this issue?
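If it helps, a capture taken on the director/TFTP host along the following lines should catch both sides of the exchange (br-ctlplane is only my guess at the provisioning interface; note that the TFTP data moves to ephemeral ports after the initial request, so I'm capturing all UDP rather than just port 69):

  # full-length frames, DHCP plus the whole TFTP transfer
  tcpdump -i br-ctlplane -s 0 -w pxe-fail.cap udp

Take one capture against the failing ipxe.efi and one against the working image so the two can be compared frame for frame.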
Created attachment 1337605 [details] the tcpdump of a failed PXE-boot process

The capture of a working PXE boot looks very much like the non-working one, except that there is no cyclic behavior. The working ipxe.efi file is also smaller.
Please answer the questions I asked in full. Looking at what you have sent, I notice a few things immediately:

1) I don't see any cyclic behavior. I understand you might expect it, because chain booting can easily lead to cyclic DHCP/TFTP operations, but this tcpdump shows three separate clients attempting to PXE boot. Frame 13 shows a DHCP Discover from MAC address a8:1e:84:3a:80:b7, frame 1033 shows a Discover from a8:1e:84:3a:8a:1f, and frame 2065 shows a Discover from a8:1e:84:3a:87:8f. Unless the NIC is changing its MAC address, these are three separate clients all behaving the same way, not one client repeating itself. If the NIC is changing its MAC address, that is a completely different problem that the hardware vendor will need to address.

2) The DHCP transaction has some errors in it, namely the malformed option 77 (User Class information). That is not actually fatal: it derives from an aspect of the DHCP specification that is poorly defined, leading different implementations to format it differently, and DHCP servers are smart enough to handle both. But the fact that it is formatted this way suggests something else; see below.

3) The OUI in each DHCP transaction is not Intel's but Quanta Computer's. That in itself is fine, but a quick lookup shows that Quanta, while they use Intel parts, rebadges them and markets them as their own. That also is fine, but it implies they are shipping a non-standard version of the NIC (having reburned the EEPROM with their own MAC addresses). In addition, treating option 77 as a string rather than a TLV tuple, as noted in (2), suggests they ship their own firmware, since direct OEM Intel NICs in our lab do not exhibit that behavior.

4) The systems are using UNDI to drive the NICs, meaning that the downloaded iPXE code does not include the NIC driver; instead iPXE uses the driver embedded in the NIC firmware to initialize the card.

(3) and (4) are the big ones in my mind. My guess would be that something in the new iPXE code is attempting to re-initialize the UNDI driver and the driver is failing.

An additional question: what version of the NIC preboot firmware is the system running? The latest NIC firmware Quanta claims support for was released in late September: https://downloadcenter.intel.com/download/19186/Ethernet-Intel-Ethernet-Connections-Boot-Utility-Preboot-Images-and-EFI-Drivers I would suggest checking that first.
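Also, if you want to double-check the per-client behavior yourselves, something like this against the attached capture (here called failed.cap) will list each client-originated BOOTP/DHCP frame with its source MAC; the bootp.* field name assumes the el7-era Wireshark dissector, newer releases call the protocol dhcp:

  # frame number and client MAC for every DHCP message sent by a client
  tshark -r failed.cap -Y 'bootp.type == 1' -T fields -e frame.number -e eth.src

Three distinct MACs means three machines behaving the same way, not one machine looping.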
Hi Neil,

Thanks for the dump analysis and all your comments. Let me share some additional explanations from the customer.

The NICs on the board do not change their MAC address; the tcpdump captured different servers attempting to PXE boot. A given machine is in a cyclic boot: it does not succeed in booting from any interface and eventually starts all over again with the first NIC.

They reproduced the faulty and the successful iPXE boot with the same firmware in the server's NIC:
- with ipxe.efi-20160127-5.git6366fa7a.el7 it was OK, while
- with ipxe.efi-20170123-1.git4e85b27.el7_4.1 it was not working.

I have attached pxe-bad.cap and pxe-good.cap. Could you take a look at these captures too?

They were not able to enter the iPXE command-line prompt by pressing Ctrl-B. The prompt can be seen, but as soon as they press the key combination, the prompt disappears and the boot continues.

To verify the card initialization and the embedded driver, the customer has also opened a case with Quanta.

Can we share any details on the changes between these two ipxe.efi versions?
Created attachment 1339151 [details] pxe-bad.cap
Created attachment 1339156 [details] pxe-good.cap.partaa
Created attachment 1339159 [details] pxe-good.cap.partab
Created attachment 1339161 [details] pxe-good.cap.partac
Created attachment 1339162 [details] pxe-good.cap.partad
Created attachment 1339164 [details] partae
Created attachment 1339165 [details] partaf
Created attachment 1339189 [details] partag
Created attachment 1339192 [details] partah
Created attachment 1339209 [details] partai
Created attachment 1339210 [details] partaj
Created attachment 1339223 [details] partak
Created attachment 1339224 [details] partal
Created attachment 1339228 [details] partam
Created attachment 1339240 [details] partan
Created attachment 1339242 [details] partao
Created attachment 1339243 [details] partap
Created attachment 1339245 [details] partaq
Created attachment 1339247 [details] partar
Created attachment 1339248 [details] partas
Created attachment 1339249 [details] partat

MD5 sums:
551f8162e96d77ad50f49e00266f8e60  pxe-bad.cap
1a3889bd54f1a4cde0177bdce52aef28  pxe-good.cap
32da570f7586d15222538bcee99b1aa5  pxe-good.cap.partaa
3646223e9275bd2c364c8fbaf98e8fee  pxe-good.cap.partab
40896fb4c1aaba1fb5b0df74167653a9  pxe-good.cap.partac
d536b58c37fb37cfa5bc48cbbc2209bb  pxe-good.cap.partad
53e613c45eb1b68d60162ee0a10b15ea  pxe-good.cap.partae
e6c7271d6a96a2aad8b7c3ffb905843a  pxe-good.cap.partaf
36a78d6edf1a25d844d4bd13dad18b9f  pxe-good.cap.partag
6e95fdaba4b599ef7fea1f51db09d43f  pxe-good.cap.partah
42abc5cc7441ca798e370488db266001  pxe-good.cap.partai
6c659ab477ab30551c3ba74b72e65da8  pxe-good.cap.partaj
b488de6927f808c121500056553390f1  pxe-good.cap.partak
d288dd81d2ef40dcb86d927fd65503a7  pxe-good.cap.partal
f06b4084476b5eb0790259ac56182647  pxe-good.cap.partam
f6b89bd8d4e4450de8a56536ac46e3f2  pxe-good.cap.partan
7300286616f04ebf108f0748fb2e2e49  pxe-good.cap.partao
cec7f9489dc0932a8dfdc7276c5feb86  pxe-good.cap.partap
7e9b0630adb5ef21b43b80a0bbc49b40  pxe-good.cap.partaq
b95e7a80fb9b4ad51a3d62361733ab43  pxe-good.cap.partar
de1fb0a300456da40f1bec85bbb8571d  pxe-good.cap.partas
8841d0da41c2db8552650b9b9684333f  pxe-good.cap.partat
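For anyone pulling these down: the split pieces should reassemble cleanly with something like the following (rename the partae..partat attachments back to pxe-good.cap.partae etc. first if needed), and the result can be checked against the sums above:

  # concatenate the parts in lexical order and verify
  cat pxe-good.cap.parta? > pxe-good.cap
  md5sum pxe-bad.cap pxe-good.cap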
Sure, you can share the source RPMs for both packages, and they are welcome to scan the git logs for the various changes.

That said, the comparative tcpdumps indicate that far more than just a change in the iPXE firmware is going on here. To cite specifics, compare frames 1 through 10 in pxe-good.cap with frames 121 through 136 in pxe-bad.cap. A good vs. bad comparison, frame by frame:

Good frame 1 / bad frame 121:
  Good: requests options 53,57,93,94,60,77 (malformed),55,175,61,97
  Bad:  requests options 53,57,55,97,94,93,60

Good frame 2 / bad frame 125:
  Good: receives an offer of 192.0.2.100/24
  Bad:  receives an offer of 192.0.2.100

Good frame 3 / bad frame 126:
  Good: requests the offer from (2) with options 53,57,93,94,60,77,55,175,61,97,54,50
  Bad:  requests the offer from (2) with options 52,54,50,57,55,97,94,93,60

Good frame 4 / bad frame 127:
  Good: DHCP server ACKs the request with options 53,54,51,67,58,59,1,28,3
  Bad:  same ACK

Good frame 6 / bad frame 128:
  Good: TFTP read request sent to the server with tsize=0 and blksize=1432
  Bad:  TFTP read request sent to the server with tsize=0 and blksize=1468

Good frame 7 / bad frame 131:
  Good: TFTP server responds with an OACK reporting blksize 1432 and tsize 649632 (the ipxe file size)
  Bad:  TFTP server responds with an OACK reporting blksize 1468 and tsize 715584 (the ipxe file size)

Bad frames 132-134 (no good-side equivalent):
  Bad:  the client aborts the transfer citing a negotiation failure and restarts with the same options but without tsize; the server ACKs, also leaving out tsize, and the client accepts

Good frame 9 / bad frame 136:
  Good: first data block
  Bad:  first data block

So there are clearly some large discrepancies here. The DHCP client configuration differs significantly between the two boots, which suggests the client is using two UEFI boot targets that are not configured in the same manner; that should definitely be rectified first if we are to consider that something might be wrong with the iPXE firmware.

More significant, however, is the TFTP configuration. For some reason, during the bad boot the client asks for the ipxe.efi file with a different set of options (most notably the block size). The larger block size makes the resulting UDP data frames in the TFTP transfer 1514 bytes long, which is the maximum Ethernet frame size. That should be fine, but since this is UDP it may lead to a higher likelihood of frame corruption.

Additionally, it is interesting that the client failed the TFTP negotiation when tsize was set to 0 (which tells the server to report the size of the requested file). Only when it dropped the tsize option did the negotiation complete.

Part of me wonders whether the Quanta firmware has been modified to limit the size of the file it can download (due to some other corruption with larger files). That would explain why the larger ipxe file fails to work (if it was only partially downloaded, or bits received past a certain threshold are corrupt). Not including the tsize option may be a programming error that allows the download to start and silently fail during execution, rather than fail before the attempt.

I would suggest that you and the customer:
1) Reconcile the client configurations so that the DHCP phase of boot looks identical between the two attempts.
2) Contact Quanta and investigate why the TFTP operation executes the way it does on the larger file request. I expect that, with sufficient pressure, you will find that the NIC in question has a bug with larger iPXE images.
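If it helps the comparison, the TFTP negotiation frames can be pulled out of each capture with something like this (tftp.opcode 1 is the read request and 6 the option acknowledgement; -O restricts the verbose decode to the TFTP layer; the exact field names again assume the el7-era tshark):

  # show only the RRQ and OACK frames so the blksize/tsize options can be
  # compared side by side between the good and bad boots
  tshark -r pxe-good.cap -Y 'tftp.opcode == 1 or tftp.opcode == 6' -O tftp
  tshark -r pxe-bad.cap  -Y 'tftp.opcode == 1 or tftp.opcode == 6' -O tftp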
Hi Neil,

Here is the customer's theory of how this could happen. The ONLY thing changed between these two DHCP requests was the ipxe.efi file, manually swapped on the TFTP server (Director).

The sequence of events:
1. The /tftpboot/ipxe.efi file is the 2017 version ("bad").
2. The server is powered up and sends a DHCP request with some set of request options. What determines the set of DHCP options found in this request? The customer's theory is that it is determined by the version of the ipxe.efi file that was used the last time.
3. The PXE boot fails, for whatever reason - they have little visibility into this and are relying on Quanta for that part of the investigation.
4. They manually swap /tftpboot/ipxe.efi on the TFTP boot (Director) server to the 2016 ("good") version.
5. While monitoring the server console, they can see that the next PXE boot attempt reflects the new ("good") file size; however, the PXE boot still fails because the NIC was previously "programmed" by the "bad" ipxe.efi file, so the "bad" set of DHCP request options is used this time. However, the "good" ipxe.efi file is consumed by the NIC, and it will try to PXE boot again.
6. The next PXE boot works fine, because at this point the NIC has already been "programmed" by the "good" ipxe.efi file and is using the "good" set of DHCP request options, which leads to a successful PXE boot and the loading of the subsequent ramdisk and other images.

To me this seems logical - but I am not an expert. What do you think of this explanation?
No. What you are describing is a case of seeing hoofprints and thinking there are zebras in the area rather than horses. That theory makes sense only if Quanta has redefined DHCP's specified behavior, which (were it true) would be its own very serious problem.

DHCP is a stateless protocol from the client's standpoint and relies on no outside information to determine which DHCP options to request. The fact that the good and bad DHCP exchanges differ is due solely to discrepancies in the boot target configuration on the client, not because a prior DHCP exchange somehow reconfigured the adapter firmware. Allowing prior DHCP exchanges to reconfigure your clients would be a monumental security risk, as a rogue or malicious DHCP server could corrupt your entire install base, and it would reflect a gross error on the part of the Quanta firmware authors.

This is not bizarre behavior in iPXE; this is a pedestrian problem with a NIC's firmware. Please investigate points 1 and 2 of comment 35 as I requested.
Hello Neil,

The customer was able to introspect the servers by changing a setting in /etc/ironic-inspector/dnsmasq.conf from

  dhcp-boot=tag:efi,ipxe.efi

to

  dhcp-boot=tag:efi,tag:!ipxe,ipxe.efi

The fix was introduced by another BZ #1479386 in puppet-ironic-9.5.0-2.el7ost.noarch. It solved the issue for the Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Adapter.

Ericsson, however, hit the same problem with Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) NICs. The older 2016 ipxe.efi image works, but loading is very slow - 30 minutes at least.

-- After loading the 2016 ipxe.efi image, the X710 NIC sends a DHCP Discover (packet #932 in X710-ipxe-2016.cap) with
   Option 93 Client System Architecture: EFI BC (7)
   Option 60 Vendor Class Identifier: PXEClient:Arch:00007:UNDI:003010
   This causes the Director to offer http://192.0.2.1:8088/inspector.ipxe, and the introspection completes.

-- After loading the 2017 ipxe.efi image, the X710 NIC sends a DHCP Discover (packet #1018 in X710-ipxe-2017.cap) with
   Option 93 Client System Architecture: EFI x86-64 (9)
   Option 60 Vendor Class Identifier: PXEClient:Arch:00009:UNDI:003010
   This causes the Director to offer the ipxe.efi file again, and the PXE boot loops.

Could you examine this case please? I will attach the dumps soon.
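For context, my understanding of how the fix from BZ #1479386 is supposed to break the loop is roughly the dnsmasq tagging sketched below - this is only an illustration, not the exact puppet-ironic template; the only parts taken from this case are the dhcp-boot=tag:efi,tag:!ipxe,ipxe.efi line and the inspector URL seen in the capture:

  # the "efi" tag is assumed to be set elsewhere in the file (e.g. by matching
  # DHCP option 93 / the vendor class), as the existing config implies
  # requests coming from iPXE itself carry DHCP option 175, so tag them
  dhcp-match=set:ipxe,175
  # first pass: the NIC's UEFI PXE firmware (no ipxe tag yet) gets the iPXE binary
  dhcp-boot=tag:efi,tag:!ipxe,ipxe.efi
  # second pass: once ipxe.efi itself asks, it gets the inspector script
  # instead of being offered ipxe.efi again
  dhcp-boot=tag:ipxe,http://192.0.2.1:8088/inspector.ipxe

With the original dhcp-boot=tag:efi,ipxe.efi line, a chainloaded iPXE that reports an unexpected architecture value (9 instead of 7, as the 2017 image does) keeps being matched by the efi tag and is offered ipxe.efi again, which is the loop seen in X710-ipxe-2017.cap.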
Created attachment 1359922 [details] X710-ipxe-2017
Created attachment 1359923 [details] X710-ipxe-2016.cap.partaa
Created attachment 1359925 [details] X710-ipxe-2016.cap.partab
Some technical detail regarding the NICs:

[root@overcloud-controller-0 heat-admin]# lspci | grep X710
83:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
83:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
83:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
83:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

[root@overcloud-controller-0 heat-admin]# ethtool -i ens6f0
driver: i40e
version: 1.6.27-k
firmware-version: 4.53 0x80001fad 0.0.0
expansion-rom-version:
bus-info: 0000:83:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Is there any known issue with the X710 using the 2017 ipxe?
OK, well, I don't know what to tell you then. I'm a bit dumbfounded that they were willing to speculate and investigate so much early on but are now unable to work on this issue; still, I can't pretend to know their business needs. As such, please re-open if they are able to pick this back up later.