Bug 1492620 - 2017 releases of ipxe-bootimgs are broken
Summary: 2017 releases of ipxe-bootimgs are broken
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipxe
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.3
Assignee: Neil Horman
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks: 1396389
 
Reported: 2017-09-18 10:33 UTC by Dariusz Wojewódzki
Modified: 2021-03-11 15:47 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Base Board Information
	Manufacturer: Quanta Computer Inc.
	Product Name: S2BS-MB
	Version: 31S2BMB0090
	Serial Number: QTFMQK65200366
	Asset Tag:
	Features:
		Board is a hosting board
		Board is replaceable
	Location In Chassis: Default string
	Chassis Handle: 0x0003
	Type: Motherboard
	Contained Object Handles: 0

BIOS Information
	Vendor: American Megatrends Inc.
	Version: S2B_3B10.01
	Release Date: 03/21/2017
	Address: 0xF0000
	Runtime Size: 64 kB
	ROM Size: 8192 kB
	Characteristics:
		PCI is supported
		BIOS is upgradeable
		BIOS shadowing is allowed
		Boot from CD is supported
		Selectable boot is supported
		BIOS ROM is socketed
		EDD is supported
		Print screen service is supported (int 5h)
		8042 keyboard services are supported (int 9h)
		Serial services are supported (int 14h)
		Printer services are supported (int 17h)
		ACPI is supported
		USB legacy is supported
		BIOS boot specification is supported
		Targeted content distribution is supported
		UEFI is supported
	BIOS Revision: 5.11
	Firmware Revision: 3.50

# lspci | grep -i eth
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
03:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
03:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
03:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
05:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

[root@vhost ~]# ethtool -i ens3f0
driver: ixgbe
version: 4.4.0-k-rh7.3
firmware-version: 0x800004e0
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
----------------------------------------------------------------
[root@director log]# grep ipxe-bootimg yum.log
Jun 02 11:20:14 Installed: ipxe-bootimgs-20160127-5.git6366fa7a.el7.noarch
Aug 14 14:58:36 Updated: ipxe-bootimgs-20170123-1.git4e85b27.el7.noarch
Sep 06 10:27:21 Updated: ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch
----------------------------------------------------------------
Last Closed: 2018-02-06 14:03:59 UTC
Target Upstream Version:
Embargoed:


Attachments
failed boot (33.80 KB, image/jpeg)
2017-09-18 10:33 UTC, Dariusz Wojewódzki
Environment details (6.69 KB, text/plain)
2017-09-18 10:35 UTC, Dariusz Wojewódzki
the tcpdump of a failed PXE-boot process (2.25 MB, application/octet-stream)
2017-10-12 08:30 UTC, Dariusz Wojewódzki
pxe-bad.cap (5.27 MB, application/octet-stream)
2017-10-16 10:10 UTC, Dariusz Wojewódzki
pxe-good.cap.partaa (19.04 MB, application/octet-stream)
2017-10-16 10:32 UTC, Dariusz Wojewódzki
pxe-good.cap.partab (19.04 MB, application/octet-stream)
2017-10-16 10:38 UTC, Dariusz Wojewódzki
pxe-good.cap.partac (19.04 MB, application/octet-stream)
2017-10-16 10:42 UTC, Dariusz Wojewódzki
pxe-good.cap.partad (19.04 MB, application/octet-stream)
2017-10-16 10:50 UTC, Dariusz Wojewódzki
partae (19.04 MB, application/octet-stream)
2017-10-16 11:17 UTC, Dariusz Wojewódzki
partaf (19.04 MB, application/octet-stream)
2017-10-16 11:22 UTC, Dariusz Wojewódzki
partag (19.04 MB, application/octet-stream)
2017-10-16 11:53 UTC, Dariusz Wojewódzki
partah (19.04 MB, application/octet-stream)
2017-10-16 12:02 UTC, Dariusz Wojewódzki
partai (19.04 MB, application/octet-stream)
2017-10-16 12:50 UTC, Dariusz Wojewódzki
partaj (19.04 MB, application/octet-stream)
2017-10-16 12:58 UTC, Dariusz Wojewódzki
partak (19.04 MB, application/octet-stream)
2017-10-16 13:09 UTC, Dariusz Wojewódzki
partal (19.04 MB, application/octet-stream)
2017-10-16 13:13 UTC, Dariusz Wojewódzki
partam (19.04 MB, application/octet-stream)
2017-10-16 13:25 UTC, Dariusz Wojewódzki
partan (19.04 MB, application/octet-stream)
2017-10-16 13:32 UTC, Dariusz Wojewódzki
partao (19.04 MB, application/octet-stream)
2017-10-16 13:39 UTC, Dariusz Wojewódzki
partap (19.04 MB, application/octet-stream)
2017-10-16 13:46 UTC, Dariusz Wojewódzki
partaq (19.04 MB, application/octet-stream)
2017-10-16 13:56 UTC, Dariusz Wojewódzki
partar (19.04 MB, application/octet-stream)
2017-10-16 14:00 UTC, Dariusz Wojewódzki
partas (19.04 MB, application/octet-stream)
2017-10-16 14:06 UTC, Dariusz Wojewódzki
partat (8.51 MB, application/octet-stream)
2017-10-16 14:11 UTC, Dariusz Wojewódzki
X710-ipxe-2017 (3.76 MB, application/octet-stream)
2017-11-28 14:46 UTC, Dariusz Wojewódzki
X710-ipxe-2016.cap.partaa (19.00 MB, application/octet-stream)
2017-11-28 14:55 UTC, Dariusz Wojewódzki
X710-ipxe-2016.cap.partab (9.81 MB, application/octet-stream)
2017-11-28 14:57 UTC, Dariusz Wojewódzki

Description Dariusz Wojewódzki 2017-09-18 10:33:06 UTC
Created attachment 1327310
failed boot

Description of problem:

The customer experienced a PXE boot problem after updating from

ipxe-bootimgs-20160127-5.git6366fa7a.el7.noarch
to
ipxe-bootimgs-20170123-1.git4e85b27.el7.noarch and then
ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch


During RHOSP deployment, servers are unable to PXE boot.

Version-Release number of selected component (if applicable):
ipxe-bootimgs-20170123-1.git4e85b27.el7.noarch and
ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch

How reproducible:
Update from ipxe-bootimgs-20160127-5.git6366fa7a.el7.noarch
to
ipxe-bootimgs-20170123-1.git4e85b27.el7.noarch or
ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch


Actual results:
With the new images, booting results in an infinite loop, with the iPXE file being fetched over and over.

The server boots as soon as the new file in /tftpboot is overwritten with the old ipxe.efi.


Expected results:
With ipxe-bootimgs-20170123-1, the boot process proceeds to the stage of fetching the kernel and ramdisk files.

Additional info:

Comment 2 Dariusz Wojewódzki 2017-09-18 10:35:34 UTC
Created attachment 1327312
Environment details

Comment 4 Dmitry Tantsur 2017-10-02 10:24:32 UTC
Miroslav, hi, any ideas why it was rebased and what could cause this?

Comment 5 Miroslav Rezanina 2017-10-02 10:31:48 UTC
Hi Dmitry,

there were requests for iPXE features provided by the rebased version; this included a rebase of all subpackages.

As for the cause, redirecting to nhorman.

Comment 6 Neil Horman 2017-10-02 11:49:07 UTC
I'm not sure what's being asked for here. Are you asking why this broke the reported environment? I have no idea with the information provided. If we have a similar system available, I can try to reproduce it.

Comment 7 Dariusz Wojewódzki 2017-10-02 12:46:13 UTC
Hello Neil,
The 2017 releases of ipxe-bootimgs seem to be defective.

The customer stated that his server started to boot when he restored the old ipxe.efi in /tftpboot.

Could you check whether this is a bug involving the Intel 82599ES 10-Gigabit Ethernet controller?

Comment 10 Neil Horman 2017-10-11 15:18:43 UTC
The same question from my comment #6 still applies. The provided information tells me nothing about what has gone wrong here.

Do we have a tcpdump of the failed exchange and of a successful exchange with the working boot ROM?

Is the customer able to break into the iPXE console via Ctrl-B before it tries to configure the network interface?

Do we have a system that can re-create this issue?

Comment 11 Dariusz Wojewódzki 2017-10-12 08:30:45 UTC
Created attachment 1337605
the tcpdump of a failed PXE-boot process

A tcpdump of the working PXE boot looks very much like the non-working one, except that there is no cyclic behavior.
Also, the working ipxe.efi file is smaller.

Comment 12 Neil Horman 2017-10-12 11:56:24 UTC
Please answer the questions I had in full.  

Looking at what you have sent, I notice a few things immediately:

1) I don't see any cyclic behavior. I understand you might expect it, because chain booting can easily lead to cyclic DHCP/TFTP operations, but this tcpdump shows 3 separate clients attempting to PXE boot. Frame 13 shows a DHCP discover frame from MAC address a8:1e:84:3a:80:b7, frame 1033 shows a DHCP discover from MAC a8:1e:84:3a:8a:1f, and frame 2065 shows a discover from a8:1e:84:3a:87:8f. Unless the NIC is changing its MAC address, these are three separate clients behaving the same way, not the same client repeating itself. If the NIC is changing its MAC address, that's a completely different problem that the hardware vendor is going to need to address.

2) The DHCP transaction seems to have some errors in it, namely the malformed option 77 (User Class information). It's actually not fatal, as it derives from an aspect of the DHCP specification that is poorly defined, which leads different implementations to format it differently. DHCP servers are smart enough to handle both implementations, but the fact that it is formatted this way suggests something else, described in (3) below.

3) The OUI of each DHCP transaction isn't Intel's, but rather Quanta Computer's. While that's fine, a quick lookup shows that Quanta, while they use Intel parts, rebadges them as their own. While that is also fine, it implies they are shipping a non-standard version of the NIC (having reburned the EEPROM with their own MAC addresses). Additionally, the treatment of option 77 as a string rather than a TLV tuple in (2) suggests that they have their own firmware, as direct OEM Intel NICs in our lab don't exhibit that behavior.

4) The systems are using UNDI to drive the NICs, meaning that the iPXE code that is downloaded doesn't include the NIC driver; rather, iPXE uses the driver embedded in the NIC firmware to initialize the card.

(3) and (4) are the big ones in my mind. My guess would be that something in the new iPXE code is attempting to re-initialize the UNDI driver and the driver is failing. An additional question: what version of the NIC preboot firmware is the system running? The latest NIC firmware Quanta claims support for was released in late September:
https://downloadcenter.intel.com/download/19186/Ethernet-Intel-Ethernet-Connections-Boot-Utility-Preboot-Images-and-EFI-Drivers

I would suggest checking that first.
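Points (1) to (3) all rest on fields of the DHCPDISCOVER itself: the client hardware address (chaddr, whose first three octets are the OUI that was looked up as Quanta's) and the layout of option 77. RFC 3004 defines option 77 as a sequence of length-prefixed User Class entries, whereas a bare string has no length bytes. The following is a minimal Python sketch, not a tool used in this bug, that decodes a raw BOOTP/DHCP message and checks which layout option 77 uses. It assumes the UDP payload has already been exported from the capture; `parse_dhcp` and `is_rfc3004_user_class` are hypothetical names.

```python
DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"


def parse_dhcp(payload: bytes):
    """Return (client MAC, OUI, {option code: raw bytes}) for a BOOTP/DHCP message."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP message")
    hlen = min(payload[2], 16)          # chaddr is a 16-byte field; hlen says how much is used
    chaddr = payload[28:28 + hlen]      # chaddr starts at fixed offset 28
    mac = ":".join(f"{b:02x}" for b in chaddr)
    oui = mac[:8]                       # first three octets identify the vendor
    options = {}
    i = 240                             # options follow the magic cookie
    while i < len(payload):
        code = payload[i]
        if code == 0:                   # Pad: a single byte with no length
            i += 1
            continue
        if code == 255:                 # End
            break
        if i + 1 >= len(payload):
            raise ValueError("truncated option header")
        length = payload[i + 1]
        # Repeated options overwrite each other here; RFC 3396 concatenation
        # and option overload (52) are not handled in this sketch.
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return mac, oui, options


def is_rfc3004_user_class(value: bytes) -> bool:
    """True if option 77 parses as one or more (length, data) User Class entries."""
    if not value:
        return False
    i = 0
    while i < len(value):
        n = value[i]
        # Zero-length entries are treated as malformed (a conservative choice here).
        if n == 0 or i + 1 + n > len(value):
            return False
        i += 1 + n
    return True
```

For example, a bare value such as b"iPXE" (illustrative, not read from the capture) fails the check because its first byte is read as a length of 105, while b"\x04iPXE" passes.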

Comment 13 Dariusz Wojewódzki 2017-10-16 10:07:14 UTC
Hi Neil,
Thanks for the dump analysis and all your comments.

Let me share additional explanations from the customer.
The NICs on the board do not change their MAC addresses, and the tcpdump captured different servers attempting to PXE boot.
This machine is stuck in a boot cycle and does not succeed in booting from any interface. Eventually it starts all over again with the first NIC.

They have reproduced both the faulty and the successful iPXE boot with the same firmware in the server's NIC:
- with ipxe.efi-20160127-5.git6366fa7a.el7 it was OK, while
- with ipxe.efi-20170123-1.git4e85b27.el7_4.1 it was not working.
I have attached pxe-bad.cap and pxe-good.cap. Could you take a look at these captures too?

They were not able to enter the iPXE command-line prompt by pressing Ctrl-B. The prompt could be seen, but as soon as they pressed the key combination, the prompt disappeared and the boot continued.

To verify the card initialization and the embedded driver, the customer also opened a case with Quanta.
Can we share any details on the changes between these two ipxe.efi versions?

Comment 14 Dariusz Wojewódzki 2017-10-16 10:10:15 UTC
Created attachment 1339151
pxe-bad.cap

Comment 15 Dariusz Wojewódzki 2017-10-16 10:32:59 UTC
Created attachment 1339156
pxe-good.cap.partaa

Comment 16 Dariusz Wojewódzki 2017-10-16 10:38:34 UTC
Created attachment 1339159
pxe-good.cap.partab

Comment 17 Dariusz Wojewódzki 2017-10-16 10:42:23 UTC
Created attachment 1339161
pxe-good.cap.partac

Comment 18 Dariusz Wojewódzki 2017-10-16 10:50:14 UTC
Created attachment 1339162
pxe-good.cap.partad

Comment 19 Dariusz Wojewódzki 2017-10-16 11:17:30 UTC
Created attachment 1339164
partae

Comment 20 Dariusz Wojewódzki 2017-10-16 11:22:02 UTC
Created attachment 1339165
partaf

Comment 21 Dariusz Wojewódzki 2017-10-16 11:53:35 UTC
Created attachment 1339189
partag

Comment 22 Dariusz Wojewódzki 2017-10-16 12:02:08 UTC
Created attachment 1339192
partah

Comment 23 Dariusz Wojewódzki 2017-10-16 12:50:40 UTC
Created attachment 1339209
partai

Comment 24 Dariusz Wojewódzki 2017-10-16 12:58:04 UTC
Created attachment 1339210
partaj

Comment 25 Dariusz Wojewódzki 2017-10-16 13:09:57 UTC
Created attachment 1339223
partak

Comment 26 Dariusz Wojewódzki 2017-10-16 13:13:04 UTC
Created attachment 1339224
partal

Comment 27 Dariusz Wojewódzki 2017-10-16 13:25:12 UTC
Created attachment 1339228
partam

Comment 28 Dariusz Wojewódzki 2017-10-16 13:32:55 UTC
Created attachment 1339240
partan

Comment 29 Dariusz Wojewódzki 2017-10-16 13:39:53 UTC
Created attachment 1339242
partao

Comment 30 Dariusz Wojewódzki 2017-10-16 13:46:56 UTC
Created attachment 1339243
partap

Comment 31 Dariusz Wojewódzki 2017-10-16 13:56:11 UTC
Created attachment 1339245
partaq

Comment 32 Dariusz Wojewódzki 2017-10-16 14:00:51 UTC
Created attachment 1339247
partar

Comment 33 Dariusz Wojewódzki 2017-10-16 14:06:52 UTC
Created attachment 1339248
partas

Comment 34 Dariusz Wojewódzki 2017-10-16 14:11:41 UTC
Created attachment 1339249
partat

MD5 sums

551f8162e96d77ad50f49e00266f8e60  pxe-bad.cap

1a3889bd54f1a4cde0177bdce52aef28  pxe-good.cap

32da570f7586d15222538bcee99b1aa5  pxe-good.cap.partaa
3646223e9275bd2c364c8fbaf98e8fee  pxe-good.cap.partab
40896fb4c1aaba1fb5b0df74167653a9  pxe-good.cap.partac
d536b58c37fb37cfa5bc48cbbc2209bb  pxe-good.cap.partad
53e613c45eb1b68d60162ee0a10b15ea  pxe-good.cap.partae
e6c7271d6a96a2aad8b7c3ffb905843a  pxe-good.cap.partaf
36a78d6edf1a25d844d4bd13dad18b9f  pxe-good.cap.partag
6e95fdaba4b599ef7fea1f51db09d43f  pxe-good.cap.partah
42abc5cc7441ca798e370488db266001  pxe-good.cap.partai
6c659ab477ab30551c3ba74b72e65da8  pxe-good.cap.partaj
b488de6927f808c121500056553390f1  pxe-good.cap.partak
d288dd81d2ef40dcb86d927fd65503a7  pxe-good.cap.partal
f06b4084476b5eb0790259ac56182647  pxe-good.cap.partam
f6b89bd8d4e4450de8a56536ac46e3f2  pxe-good.cap.partan
7300286616f04ebf108f0748fb2e2e49  pxe-good.cap.partao
cec7f9489dc0932a8dfdc7276c5feb86  pxe-good.cap.partap
7e9b0630adb5ef21b43b80a0bbc49b40  pxe-good.cap.partaq
b95e7a80fb9b4ad51a3d62361733ab43  pxe-good.cap.partar
de1fb0a300456da40f1bec85bbb8571d  pxe-good.cap.partas
8841d0da41c2db8552650b9b9684333f  pxe-good.cap.partat
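The .partaa, .partab, ... suffixes look like the output of split(1). Assuming that, concatenating the parts in suffix order should reproduce pxe-good.cap with the MD5 sum listed above (the shell equivalent being `cat pxe-good.cap.part* | md5sum`). Note that attachments partae through partat were uploaded without the pxe-good.cap. prefix, so they would need renaming first. A minimal Python sketch; `md5_of_parts` is a hypothetical helper name:

```python
import glob
import hashlib


def md5_of_parts(prefix: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of prefix.partaa + prefix.partab + ... concatenated in suffix order."""
    # split(1) suffixes (aa, ab, ...) sort lexicographically in creation order.
    parts = sorted(glob.glob(glob.escape(prefix) + ".part*"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {prefix!r}")
    digest = hashlib.md5()
    for path in parts:
        with open(path, "rb") as f:
            # Stream each ~19 MB part in chunks instead of loading it whole.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

If every part arrived intact, `md5_of_parts("pxe-good.cap")` should return 1a3889bd54f1a4cde0177bdce52aef28.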

Comment 35 Neil Horman 2017-10-16 15:50:52 UTC
Sure, you can share the source RPMs for both packages, and they are welcome to scan the git logs for the various changes.

That said, the comparative tcpdumps indicate that far more than just a change in the iPXE firmware is going on here.

To cite specifics: compare frames 1 through 10 in pxe-good.cap with frames 121 through 136 in pxe-bad.cap.

A Good Vs. Bad comparison:

Good Frame | Bad Frame | Good                        | Bad
1          | 121       | Requests options 53,57,93,  | Requests options
           |           | 94,60,77 (malformed),55,    | 53,57,55,97,94,93,60
           |           | 175,61,97                   |
           |           |                             |
2          | 125       | Receives offer of           | Receives offer of
           |           | 192.0.2.100/24              | 192.0.2.100
           |           |                             |
3          | 126       | Requests offer from (2)     | Requests offer from (2)
           |           | with options 53,57,93,94,   | w/ options 52,54,50,57,
           |           | 60,77,55,175,61,97,54,50    | 55,97,94,93,60
           |           |                             |
4          | 127       | DHCP server ACKs request    | Same ACK
           |           | with options 53,54,51,      |
           |           | 67,58,59,1,28,3             |
           |           |                             |
6          | 128       | TFTP read request sent to   | TFTP read request sent
           |           | server with tsize=0         | to server with
           |           | and blksize=1432            | tsize=0 and blksize=1468
           |           |                             |
7          | 131       | TFTP server responds with   | TFTP server responds
           |           | ACK reporting block size    | with ACK reporting
           |           | 1432 and tsize 649632       | blksize 1468 and tsize
           |           | (ipxe file size)            | 715584 (ipxe file size)
           |           |                             |
-          | 132-134   |                             | Client aborts transfer
           |           |                             | citing negotiation failure;
           |           |                             | restarts with same
           |           |                             | options, but leaves
           |           |                             | out tsize. Server ACKs,
           |           |                             | also leaving out tsize,
           |           |                             | and client accepts
           |           |                             |
9          | 136       | First data block            | First data block
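Rows 6/128 through 132-134 of the table are a TFTP option negotiation: the client appends option/value pairs to its read request (RFC 2347, with blksize defined in RFC 2348 and tsize in RFC 2349), and the server answers with an option acknowledgement (OACK) listing the options it accepts. RFC 2347 also defines TFTP error code 8 for terminating a transfer because of option negotiation. A minimal Python sketch of the request and OACK layouts follows; the function names and the "octet" transfer mode are assumptions, and the option values are the ones summarized in the table for the bad capture.

```python
import struct

OP_RRQ = 1   # read request
OP_OACK = 6  # option acknowledgement (RFC 2347)


def build_rrq(filename: str, options: dict, mode: str = "octet") -> bytes:
    """Build a TFTP read request carrying RFC 2347 options such as blksize and tsize."""
    packet = struct.pack("!H", OP_RRQ)
    packet += filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"
    for name, value in options.items():
        # Each option is a NUL-terminated name followed by a NUL-terminated ASCII value.
        packet += name.encode("ascii") + b"\0" + str(value).encode("ascii") + b"\0"
    return packet


def parse_oack(packet: bytes) -> dict:
    """Parse an OACK into {option name (lowercase): value string}."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_OACK:
        raise ValueError(f"not an OACK (opcode {opcode})")
    fields = packet[2:].split(b"\0")[:-1]  # drop the empty string after the final NUL
    return {
        fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
        for i in range(0, len(fields) - 1, 2)
    }
```

With tsize=0 in the request, the server is expected to fill in the real file size in its OACK, which is how the 715584-byte size of the 2017 ipxe.efi shows up in frame 131.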


So, there are clearly some large discrepancies here. The DHCP client configuration differs significantly between the two boots, which suggests that the client is using two UEFI boot targets that are not configured in the same manner. That should definitely be rectified first if we are to consider that something might be wrong with the iPXE firmware.

More significant, however, is the TFTP configuration. For some reason, during the bad request, the client asks for the ipxe.efi file with a different set of options (most notably the block size). The larger block size makes the resulting UDP data frames in the TFTP transfer 1514 bytes long, which is the maximum Ethernet frame size. That should be fine, but given that it's UDP, it may lead to a higher likelihood of frame corruption.

Additionally, it's interesting that the client failed the TFTP negotiation when tsize was set to 0 (which tells the server to report the size of the file being requested). Only when it dropped the tsize option was the negotiation completed. Part of me wonders if perhaps the Quanta firmware has been modified to limit the size of the file it can download (due to some other corruption with larger files). That would explain why the larger ipxe.efi file fails to work (if it was only partially downloaded, or if bits received past a certain threshold are corrupt). Leaving out the tsize option may be a programming error that allows the download to start and then fail silently during execution, rather than fail before the attempt.
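The 1514-byte figure can be checked by stacking the protocol headers onto the TFTP payload. The sketch below assumes untagged Ethernet, an IPv4 header without options, and frame lengths that exclude the 4-byte FCS (as captured frame lengths usually do); the block counts use RFC 1350's rule that a transfer ends with a block shorter than blksize, so a file that is an exact multiple of blksize needs one extra empty block. The function names are hypothetical.

```python
ETH_HEADER = 14       # destination MAC + source MAC + EtherType (no VLAN tag, no FCS)
IPV4_HEADER = 20      # IPv4 header without options
UDP_HEADER = 8
TFTP_DATA_HEADER = 4  # opcode (2 bytes) + block number (2 bytes)


def tftp_frame_size(blksize: int) -> int:
    """Ethernet frame length of a full TFTP DATA packet for a given blksize."""
    return blksize + TFTP_DATA_HEADER + UDP_HEADER + IPV4_HEADER + ETH_HEADER


def tftp_block_count(tsize: int, blksize: int) -> int:
    """Number of DATA packets needed; the last one is always shorter than blksize."""
    return tsize // blksize + 1
```

For the values in the table, this gives 1478-byte frames and 454 blocks for the 2016 file (tsize 649632, blksize 1432), and 1514-byte frames and 488 blocks for the 2017 file (tsize 715584, blksize 1468).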

I would suggest that you and the customer:

1) Reconcile the client configurations so that the DHCP phase of the boot looks identical between the two attempts.

2) Contact Quanta and investigate why the TFTP operation executes the way it does on the larger file request. I expect that, with sufficient pressure, you will find that the NIC in question has a bug with larger iPXE images.

Comment 36 Dariusz Wojewódzki 2017-10-20 08:14:45 UTC
Hi Neil, 

Here is the customer's theory on how this could happen.

The ONLY thing that was changed between these two DHCP requests was the ipxe.efi file, manually swapped on the TFTP server (Director).

The sequence of events:
1. The /tftpboot/ipxe.efi file is the 2017 ("bad") version.

2. The server is powered up and sends a DHCP request with some set of requested options...

What determines the set of DHCP options found in this request? The customer's theory is that it is determined by the version of the ipxe.efi file that was used the previous time.

3. The PXE boot fails, for whatever reason; they have little visibility into this and rely on Quanta for the investigation.

4. They manually swap /tftpboot/ipxe.efi on the TFTP (Director) server to the 2016 ("good") version.

5. While monitoring the server console, they can see that the next PXE boot attempt reflects the new ("good") file size. However, the PXE boot still fails, because the NIC was previously "programmed" by the "bad" ipxe.efi file and the "bad" set of DHCP request options is used this time. The "good" ipxe.efi file is nevertheless consumed by the NIC, and it tries to PXE boot again.

6. The next PXE boot works fine, because at this point the NIC has already been "programmed" by the "good" ipxe.efi file and uses the "good" set of DHCP request options, which leads to a successful PXE boot and loading of the subsequent ramdisk and other images.


To me it seems logical, but I am not an expert.
What do you think of this explanation?

Comment 37 Neil Horman 2017-10-20 11:32:20 UTC
No.  

What you are describing is a case of seeing hoofprints and thinking there are zebras in the area, rather than horses.  That theory makes sense only if Quanta has redefined dhcp's specified behavior, which (were it true) would be its own very serious problem.

DHCP is a stateless protocol from the client standpoint, and relies on no outside information to determine which DHCP options to request. The fact that the good and bad DHCP exchanges differ is solely due to discrepancies in boot target configuration on the client, not because a prior DHCP exchange somehow reconfigured the adapter firmware.

Allowing prior DHCP exchanges to reconfigure your clients would be a monumental security risk, as a rogue or malicious DHCP server could corrupt your entire install base, and it would reflect a gross error on the part of the Quanta firmware authors.

This is not bizarre behavior of iPXE; this is a pedestrian problem with a NIC's firmware. Please investigate points 1 and 2 of comment 35, as I requested.

Comment 38 Dariusz Wojewódzki 2017-11-28 14:43:07 UTC
Hello Neil,

The customer was able to introspect the servers by changing a setting in /etc/ironic-inspector/dnsmasq.conf from

dhcp-boot=tag:efi,ipxe.efi
to
dhcp-boot=tag:efi,tag:!ipxe,ipxe.efi

The fix was introduced into puppet-ironic-9.5.0-2.el7ost.noarch by another BZ, #1479386.

It solved the issue for Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Adapter.
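The effect of the added `tag:!ipxe` can be modelled in a few lines of Python. This sketch assumes (it is not shown in this bug) that the same dnsmasq.conf sets an `efi` tag for UEFI clients and an `ipxe` tag for requests coming from iPXE itself, for example via dnsmasq's `dhcp-userclass=set:ipxe,iPXE`, and that iPXE clients are sent the inspector.ipxe URL that appears later in this bug. The `BootRule` class and both rule lists are hypothetical, and first-match evaluation is a simplification of this sketch, not a statement of how dnsmasq orders overlapping `dhcp-boot` lines.

```python
from dataclasses import dataclass

INSPECTOR_URL = "http://192.0.2.1:8088/inspector.ipxe"


@dataclass(frozen=True)
class BootRule:
    """One dhcp-boot line: a boot file plus tag:X (required) and tag:!X (excluded) filters."""
    filename: str
    required: frozenset = frozenset()
    excluded: frozenset = frozenset()

    def matches(self, tags: set) -> bool:
        return self.required <= tags and not (self.excluded & tags)


def select_boot_file(tags: set, rules: list):
    """Return the boot file of the first matching rule (first-match is this model's assumption)."""
    for rule in rules:
        if rule.matches(tags):
            return rule.filename
    return None


# Hypothetical rule sets modelled on the dnsmasq.conf change above.
RULES_BEFORE = [
    BootRule("ipxe.efi", required=frozenset({"efi"})),
    BootRule(INSPECTOR_URL, required=frozenset({"ipxe"})),
]
RULES_AFTER = [
    BootRule("ipxe.efi", required=frozenset({"efi"}), excluded=frozenset({"ipxe"})),
    BootRule(INSPECTOR_URL, required=frozenset({"ipxe"})),
]
```

In this model, an iPXE client running on UEFI carries both tags, so with the original rule it is handed ipxe.efi again, which matches the loop described in the report. With `tag:!ipxe`, the two rules can no longer both match, so their evaluation order stops mattering.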


Ericsson, however, hit the same problem with Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) NICs.

The older 2016 version of the ipxe.efi image works, but loading is very slow: at least 30 minutes.

-- after loading the 2016 ipxe.efi image, the X710 NIC sends a DHCP Discover (packet #932 in file X710-ipxe-2016.cap) with

	Option 93 Client System Architecture: EFI BC (7)
	Option 60 Vendor Class Identifier: PXEClient:Arch:00007:UNDI:003010

         and this causes the Director to offer http://192.0.2.1:8088/inspector.ipxe for loading;
         then the introspection completes.

-- after loading the 2017 ipxe.efi image, the X710 NIC sends a DHCP Discover (packet #1018 in file X710-ipxe-2017.cap) with

	Option 93 Client System Architecture: EFI x86-64 (9)
	Option 60 Vendor Class Identifier: PXEClient:Arch:00009:UNDI:003010

        and this causes the Director to offer the ipxe.efi file again, so the PXE boot loops.
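The two Discover packets differ in the client architecture, which each packet carries twice: as a 16-bit number in option 93 and as the five-digit Arch field of the option 60 string. A minimal Python sketch for decoding both, so the two captures can be compared field by field. The architecture names follow RFC 4578 section 2.1 (the same labels the capture decoder shows), reading UNDI:003010 as UNDI major 3, minor 10 follows the PXE vendor-class convention, and the function names are hypothetical.

```python
import re

# Client architecture names from RFC 4578, section 2.1 (subset).
ARCH_NAMES = {0: "Intel x86PC", 6: "EFI IA32", 7: "EFI BC", 9: "EFI x86-64"}

# PXE vendor class layout: PXEClient:Arch:xxxxx:UNDI:yyyzzz
VENDOR_CLASS_RE = re.compile(r"PXEClient:Arch:(\d{5}):UNDI:(\d{3})(\d{3})")


def parse_option60(value: str):
    """Return (arch number, arch name, (UNDI major, UNDI minor)) from option 60."""
    m = VENDOR_CLASS_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a PXEClient vendor class: {value!r}")
    arch = int(m.group(1))
    undi = (int(m.group(2)), int(m.group(3)))
    return arch, ARCH_NAMES.get(arch, "unknown"), undi


def parse_option93(raw: bytes):
    """Option 93 is a list of 16-bit big-endian architecture types."""
    if len(raw) % 2:
        raise ValueError("option 93 length must be even")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
```

Applied to the two packets above, the option 60 strings decode to architecture 7 and 9 respectively with the same UNDI version 3.10, so the only difference the Director sees in these two fields is the architecture number.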

Could you examine this case please? I will attach dumps soon.

Comment 39 Dariusz Wojewódzki 2017-11-28 14:46:24 UTC
Created attachment 1359922
X710-ipxe-2017

Comment 40 Dariusz Wojewódzki 2017-11-28 14:55:21 UTC
Created attachment 1359923
X710-ipxe-2016.cap.partaa

Comment 41 Dariusz Wojewódzki 2017-11-28 14:57:45 UTC
Created attachment 1359925
X710-ipxe-2016.cap.partab

Comment 42 Dariusz Wojewódzki 2017-11-28 15:09:42 UTC
Some technical details regarding the NICs:

[root@overcloud-controller-0 heat-admin]# lspci | grep X710
83:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
83:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
83:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
83:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

[root@overcloud-controller-0 heat-admin]# ethtool -i ens6f0
driver: i40e
version: 1.6.27-k
firmware-version: 4.53 0x80001fad 0.0.0
expansion-rom-version: 
bus-info: 0000:83:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Is there any known issue with the X710 using the 2017 iPXE?

Comment 45 Neil Horman 2018-02-06 14:03:59 UTC
OK, well, I don't know what to tell you then. I'm a bit dumbfounded that they were willing to speculate and investigate so much early on but are now unable to work on this issue, but I can't pretend to know their business needs. As such, please reopen if they are able to pick this back up later.

