Bug 1577112 - PXE is not detecting link on Intel I219v-2 NIC on HP Z4 G4 Core Workstation
Summary: PXE is not detecting link on Intel I219v-2 NIC on HP Z4 G4 Core Workstation
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipxe
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Neil Horman
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-11 08:38 UTC by Yaniv Ferszt
Modified: 2019-09-09 11:14 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-31 11:31:12 UTC
Target Upstream Version:



Description Yaniv Ferszt 2018-05-11 08:38:00 UTC
Description of problem:
PXE is not detecting link on Intel I219v-2 NIC on HP Z4 G4 Core Workstation

Version-Release number of selected component (if applicable):
- RHEL 7.5

Tested with:
- ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch (RHEL 7)
- iPXE generic image from Satellite 6.2.11
- iPXE git version as of 22 Sep 2017 (7428)
- latest iPXE git version as of 2 May 2018 (960d1)

How reproducible:
Network installation of RHEL on Intel I219v-2 NIC on HP Z4 G4 Core Workstation using iPXE bootloader


Steps to Reproduce:
1. Boot from an Intel I219v-2 NIC into an iPXE installation


Actual results:
Network installation of RHEL using iPXE bootloader fails because no link is detected


Expected results:
Network installation of RHEL using iPXE bootloader should work

Additional info:
The iPXE driver for the 'Intel I219v-2 NIC' network card is included, so I would expect iPXE boot to work.


lspci -s 00:1f.6 -nn -v
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8]
	Subsystem: Hewlett-Packard Company Device [103c:81c5]
	Flags: bus master, fast devsel, latency 0, IRQ 84
	Memory at 94200000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] PCI Advanced Features
	Kernel driver in use: e1000e
	Kernel modules: e1000e

Comment 2 Neil Horman 2018-05-11 10:28:30 UTC
Where do you see that this NIC has an included driver? The fact that lspci detects the hardware only indicates that it is responsive on the PCI bus. Looking at the vendor and device ID, our version of iPXE doesn't currently support that NIC. I may be able to add it, but a faster path to operation would likely be to use the undionly iPXE image. It's not clearly documented, but it appears that the NIC has a UNDI driver in its ROM, so using the UNDI PXE image should make the NIC work.

Please give that a try, and if it doesn't work, I can backport commits ef1c4b1c9031adcb4aee01bd628d96fc0c676b94 and 546dd51de8459d4d09958891f426fa2c73ff090d for you to test out.

Comment 4 Neil Horman 2018-05-14 11:18:53 UTC
I assume, then, that they are burning the PXE ROM directly into the NIC flash?

The iPXE ROM should also have USB block drivers, so they should be able to chainload from a USB key.

Comment 6 Neil Horman 2018-05-14 15:48:59 UTC
If they are already booting via a physical medium, they should not need iPXE at all.

Unless you mean to imply that they are already downloading a DVD ISO over the network from the PXE firmware embedded in the NIC. If that's the case, all you need to do is break into the PXE command line and use the "chain" command documented here (http://ipxe.org/cmd/chain) to load the undionly image from a USB key attached locally to the system.

There are plenty of cookbooks on the internet for doing various things with iPXE, but we have nothing specific to doing exactly this. In summary, the chain command lets you execute a secondary PXE image, replacing the running one at run time, from a media source of your choosing.

Comment 8 Neil Horman 2018-05-16 12:18:49 UTC
Ok, that helps a bit.  Can you confirm that:

1) The iPXE image on the boot media (CD-ROM or USB key) is a custom image that they have built
2) The iPXE image on the boot media is able to communicate over the provided NIC, perform DHCP and fetch the second iPXE script

If (1) and (2) are true, then there is no need to go on: they have confirmed that their custom image (which presumably contains the upstream Intel PCI ID to support the NIC in question) is functional, and I can just backport that.

If, however, that is not the case, and the USB key or CD-ROM they are booting from cannot communicate on the network, then you need to create a new boot image on that key, replacing whatever PXE image is on it with the undionly.kpxe image. This doesn't have to be an across-the-board change; you only need to do it for testing purposes. If it works, that's your path forward until we update the iPXE source wholesale. Because they are booting from a USB drive here, there is no need to use the chain command, as you can replace the initial boot image directly.

Comment 11 Neil Horman 2018-05-16 15:17:46 UTC
OK, here's a test build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16316829



It should enable the i219v-2 adapter as upstream does. Please replace the PXE image on your USB key with the iPXE image from this build and retest to confirm that it works.

Comment 12 Yaniv Ferszt 2018-05-17 10:00:15 UTC
The customer tested the provided test build.
Boot is still failing with no link being detected.

  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Down (http://ipxe.org/38086101)]
  Waiting for link-up on net0.......................... Down (http://ipxe.org/38086101)
  No more network devices

Comment 13 Neil Horman 2018-05-17 14:47:26 UTC
Ok, so we're going to have to debug this.  Please run this build, which has debug messages enabled and provide the full boot log.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16343876

Comment 15 Neil Horman 2018-05-17 15:34:55 UTC
Yes, I didn't bump the release number, sorry.

Comment 17 Neil Horman 2018-05-18 13:52:28 UTC
Well, that's troubling. You're getting an unexpected interrupt in which the interrupt cause register (ICR) has bit 9 set, which is undefined and reserved according to the E1000 specification (which this NIC is supposed to conform to). If the customer has a functional iPXE image, then someone has done something (likely inadvertently) to force the NIC into compliant behavior. That in turn implies that whoever fixed this didn't really fix it, but just got it to work by chance.

So we can move forward in a few ways:

1) We can bisect the tree to find where the image started working properly and work backwards to a root cause from there

2) We can play guess-and-check by looking through the commit list for commits that might be relevant to this problem.

3) We can contact HP to request assistance with understanding why their NIC is setting reserved bit 9 in its ICR (I say theirs because HP typically takes Intel NICs and puts custom firmware on them)

I would recommend that we pursue options 1 and 2 in parallel. If you can tell me the commit hash of the working image that the customer has, I can start a bisect to find the commit that fixed this (this will require the customer to test several images that I provide). In parallel, you should open a TSAnet case with HP to understand why this bit is getting set when it shouldn't be, so that we can better understand the problem we are dealing with.

Comment 20 Steffen Froemer 2018-06-04 16:39:17 UTC
Neil, I'm not aware of any image that the customer is able to boot properly, so I'm afraid option 1 isn't possible.

But I will ask the customer again whether he has a working image.
In the meantime, if you have any new images, we can provide them to the customer and he will test them when possible.

I will open the TSAnet ticket to get clarification on this.

Please let me know if I missed something.

Comment 21 Neil Horman 2018-06-04 17:31:00 UTC
Comment 10 seems to indicate that they do indeed have a working iPXE image. Please confirm that, and if it is not the case, ask for clarification on that comment.

I'm not going to start building iPXE images until we are sure the customer has a working upstream image (so we know which point in the git tree to start the bisect from).

At this point the TSAnet ticket is likely our fastest route to closure, given the unexpected bit set in the interrupt cause register.

Comment 22 Steffen Froemer 2018-06-05 09:11:58 UTC
My understanding of comment 10 is that the network card works only when installing RHEL from the ISO image (no PXE).

But I've asked the customer for clarification.

Besides this, I tried the iPXE image on my Intel NUC, which has the same NIC, and I'm seeing a similar issue.

Maybe it's not directly an HPE issue at all. What do you think?


iPXE initialising devices... INTEL 0xa8ae0 MAC+PHY reset (ctrl 0018260)
INTEL 0xa8ae0 has autoloaded MAC address 00:1f:c6:9c:62:c9
INTEL 0xa8ae0 link status is 40080680
ok

iPXE 1.0.0+ (4e85b27) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP SRP AoE ELF MBOOT PXE bzImage Menu PXEXT
net0 is a i219lm-2 with MAC 00:1f:c6:9c:62:c9
INTEL 0xa8ae0 ring 03800 is at [165d6e00, 165d6f00)
INTEL 0xa8ae0 ring 02800 is at [165d6f00, 165d7000)
INTEL 0xa8ae0 RX 0 [165d7000, 165d7800)
INTEL 0xa8ae0 RX 1 [165d7800, 165d8000)
INTEL 0xa8ae0 RX 2 [165d8000, 165d8800)
INTEL 0xa8ae0 RX 3 [165d8800, 165d9000)
INTEL 0xa8ae0 RX 4 [165d9000, 165d9800)
INTEL 0xa8ae0 RX 5 [165d9800, 165da000)
INTEL 0xa8ae0 RX 6 [165da000, 165da800)
INTEL 0xa8ae0 RX 7 [165da800, 165db000)
INTEL 0xa8ae0 link status is 40080680
INTEL 0xa8ae0 unexpected ICR 000000102
Waiting for link-up on net0................. Down (http://ipxe.org/38086101)
INTEL 0xa8ae0 MAC+PHY reset (ctrl 0018260)
Failed to chainload from any network interface

Comment 23 Neil Horman 2018-06-05 11:36:47 UTC
I don't know; that's why I asked you to contact HP, because the driver doesn't know what to do when that reserved bit in the ICR is set. If you are able to recreate the issue on an Intel-branded NIC, then yes, you should probably open a TSAnet case with Intel rather than HP. One way or another, we need to contact someone with insight into the hardware to understand what that bit represents, so that we can either properly code around it or otherwise understand what to do to make the hardware behave properly.

Comment 24 Neil Horman 2018-10-31 11:31:12 UTC
Closing for lack of response.

