Bug 1577112
| Summary: | PXE is not detecting link on Intel I219v-2 NIC on HP Z4 G4 Core Workstation | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yaniv Ferszt <yferszt> |
| Component: | ipxe | Assignee: | Neil Horman <nhorman> |
| ipxe sub component: | ipxe-bootimgs | QA Contact: | Raviv Bar-Tal <rbartal> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | unspecified | CC: | mdekan, nhorman, sfroemer, yferszt |
| Version: | 7.5 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-31 11:31:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yaniv Ferszt
2018-05-11 08:38:00 UTC
Where do you see that that NIC has an included driver? The fact that lspci detects the hardware only indicates that it is responsive on the PCI bus. Looking at the vendor and device ID, our version of iPXE doesn't currently support that NIC. I may be able to add it, but a faster path to operation would likely be to use the undionly iPXE image. It's not clearly documented, but it appears that the NIC has a UNDI driver in its ROM, so using the UNDI PXE image should cause the NIC to work. Please give that a try, and if it doesn't work, I can backport commits ef1c4b1c9031adcb4aee01bd628d96fc0c676b94 and 546dd51de8459d4d09958891f426fa2c73ff090d for you to test out.

I assume, then, that they are burning the PXE ROM directly into the NIC flash? The iPXE ROM should also have USB block drivers, so they should be able to chainload from a USB key. If they are already booting via a physical medium, they should not need iPXE at all, unless you mean that they are downloading a DVD ISO over the network from the PXE firmware already embedded in the NIC. If that's the case, all you need to do is break into the PXE command line and use the "chain" command documented here (http://ipxe.org/cmd/chain) to load the undionly image from a USB key attached locally to the system. There are plenty of cookbooks on the internet for doing various things with iPXE, but we have nothing specific to doing exactly this. In summary, the chain command lets you execute a secondary PXE image, replacing the running one at run time, from a media source of your choosing.

Ok, that helps a bit.
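The distinction above is that lspci listing a device only proves it answers on the PCI bus; whether iPXE can drive it depends on whether its vendor:device pair is in the driver's ID table. A minimal Python sketch of that check, assuming `lspci -nn` output, which prints the numeric IDs in `[vvvv:dddd]` form. The sample line and the `supported` table are hypothetical placeholders, not values from this bug or from iPXE's real ID table.

```python
import re
from typing import Optional, Set, Tuple

# `lspci -nn` prints the class code as "[0200]" and the IDs as "[vendor:device]";
# only the latter has the four-hex-digit:four-hex-digit form.
PCI_ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_ids(lspci_line: str) -> Optional[Tuple[int, int]]:
    """Return (vendor_id, device_id) parsed from one `lspci -nn` line, or None."""
    matches = PCI_ID_RE.findall(lspci_line)
    if not matches:
        return None
    vendor, device = matches[-1]  # the vendor:device pair comes last on the line
    return int(vendor, 16), int(device, 16)


def is_supported(lspci_line: str, supported: Set[Tuple[int, int]]) -> bool:
    """True only if the device's vendor:device pair is in the driver's ID table."""
    ids = pci_ids(lspci_line)
    return ids is not None and ids in supported


if __name__ == "__main__":
    # Hypothetical sample line and ID table, for illustration only.
    line = "00:1f.6 Ethernet controller [0200]: Example NIC [8086:abcd] (rev 31)"
    table = {(0x8086, 0x1234)}
    vendor, device = pci_ids(line)
    print(f"{vendor:04x}:{device:04x}")  # 8086:abcd
    print(is_supported(line, table))     # False: on the bus, but no driver match
```

Seen on the bus but absent from the table is exactly the "lspci sees it, iPXE doesn't drive it" case; adding the ID upstream (as the referenced commits presumably do) is what moves a device from one side to the other.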
Can you confirm that:

1) The iPXE image on the boot media (CD-ROM or USB key) is a custom image that they have built.
2) The iPXE image on the boot media is able to communicate over the provided NIC, perform a DHCP exchange, and fetch the second iPXE script.

If (1) and (2) are true, then there is no need to go further: they have confirmed that their custom image (which presumably contains the upstream Intel PCI ID to support the NIC in question) is functional, and I can just backport that.

If, however, that is not the case, and the USB key or CD-ROM they are booting from cannot communicate on the network, then you need to create a new boot image on that key, replacing whatever PXE image is on it with the undionly.kpxe image. This doesn't have to be a blanket change; you only need to do it for testing purposes. If it works, that's your path forward until we update the iPXE source wholesale. Because they are booting from a USB drive here, there is no need to use the chain command, as you can replace the initial boot image directly.

Ok, here's a test build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16316829

It should enable the I219-V2 adapter as upstream does. Please replace the PXE image on your USB key with the iPXE image from this build and retest to confirm that it is working.

The customer tested the test build provided. Boot is still failing, with no link being detected:

    [Link:down, TX:0 TEX:0 RX:0 RXE:0]
    [Link status: Down (http://ipxe.org/38086101)]
    Waiting for link-up on net0.......................... Down (http://ipxe.org/38086101)
    No more network devices

Ok, so we're going to have to debug this. Please run this build, which has debug messages enabled, and provide the full boot log: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16343876

Yes, I didn't bump the release number, sorry.

Well, that's troubling.
You're getting an unexpected interrupt in which the cause register has bit 9 set, which is undefined and reserved according to the E1000 specification (which this NIC is supposed to conform to). If the customer has an iPXE image that is functional, that implies that someone has done something (likely inadvertently) to force the NIC into compliant behavior. That in turn implies that whoever fixed this didn't really fix it, but just got it to work by chance. So we can move forward in a few ways:

1) We can bisect the tree to find where the image started working properly and work backwards to a root cause from there.
2) We can play guess-and-check by looking through the commit list for commits that might be relevant to this problem.
3) We can contact HP to request assistance in understanding why their NIC is setting reserved bit 9 in its ICR. (I say "their" because HP typically takes Intel NICs and puts custom firmware on them.)

I would recommend that we pursue options 1 and 2 in parallel. If you can tell me the commit hash of the working image that the customer has, I can start a bisect to find the commit that fixed this (this will require the customer to test several images that I provide). In parallel, you should open a TSAnet case with HP to understand why this bit is getting set when it shouldn't be, so that we can better understand the problem we are dealing with.

Neil, I'm not aware of any image that the customer is able to boot properly, so I'm hesitant to pursue option 1. But I will ask the customer again whether he has a working image. In the meantime, if you have any new images, we should be able to provide them to the customer, and he will run a test when possible. I will open the TSAnet ticket to get clarification on this. Please let me know if I missed something.

Comment 10 seems to indicate that they do indeed have a working iPXE image. Please confirm that, and if it is not the case, ask for clarification on that comment.
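Option 1 is a standard bisection: given an older commit known to fail and a newer commit known to work, each customer test halves the remaining range, so roughly log2(N) test images cover N candidate commits. A minimal Python sketch of the search, assuming a linear history ordered oldest to newest and a single broken-to-working transition; `image_works` is a hypothetical stand-in for "build an image at this commit and have the customer test it". `git bisect` performs the same search on a real tree, and its `old`/`new` terms suit looking for a fix rather than a regression.

```python
from typing import Callable, List, Sequence


def first_working_commit(commits: Sequence[str],
                         image_works: Callable[[str], bool]) -> str:
    """Return the earliest commit whose image works.

    Assumes commits are ordered oldest -> newest, commits[0] is known broken,
    commits[-1] is known working, and there is one broken->working transition.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] works
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if image_works(commits[mid]):  # one test image per iteration
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(16)]  # hypothetical commit IDs
    tested: List[str] = []

    def fake_test(commit: str) -> bool:
        tested.append(commit)
        return int(commit[1:]) >= 11  # pretend the fix landed in c11

    print(first_working_commit(history, fake_test))  # c11
    print(len(tested))                               # 4
```

This also shows why a known-good endpoint matters: without the hash of a working image, the upper bound of the search is undefined and the bisect cannot start.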
I'm not going to start building iPXE images until we are sure the customer has a working upstream image (so we know what point in the git tree they are at to start the bisect from). At this point, the TSAnet ticket is likely our fastest route to closure, given the unexpected bit set in the status register.

What I get from comment #10 is that the network card works only when installing RHEL from the ISO image (no PXE). But for clarification, I've asked the customer.

Besides this, I tried using an iPXE image on my Intel NUC, which has the same NIC, and I'm getting a similar issue. Maybe it's not directly an HPE issue at all. What do you think?

    iPXE initialising devices...
    INTEL 0xa8ae0 MAC+PHY reset (ctrl 0018260)
    INTEL 0xa8ae0 has autoloaded MAC address 00:1f:c6:9c:62:c9
    INTEL 0xa8ae0 link status is 40080680
    ok
    iPXE 1.0.0+ (4e85b27) -- Open Source Network Boot Firmware -- http://ipxe.org
    Features: DNS HTTP iSCI TFTP SRP AoE ELF MBOOT PXE bzImage Menu PXEXT
    net0 is a i219lm-2 with MAC 00:1f:c6:9c:62:c9
    INTEL 0xa8ae0 ring 03800 is at [165d6e00, 165d6f00)
    INTEL 0xa8ae0 ring 02800 is at [165d6f00, 165d7000)
    INTEL 0xa8ae0 RX 0 [165d7000, 165d7800)
    INTEL 0xa8ae0 RX 1 [165d7800, 165d8000)
    INTEL 0xa8ae0 RX 2 [165d8000, 165d8800)
    INTEL 0xa8ae0 RX 3 [165d8800, 165d9000)
    INTEL 0xa8ae0 RX 4 [165d9000, 165d9800)
    INTEL 0xa8ae0 RX 5 [165d9800, 165da000)
    INTEL 0xa8ae0 RX 6 [165da000, 165da800)
    INTEL 0xa8ae0 RX 7 [165da800, 165db000)
    INTEL 0xa8ae0 link status is 40080680
    INTEL 0xa8ae0 unexpected ICR 000000102
    Waiting for link-up on net0................. Down (http://ipxe.org/38086101)
    INTEL 0xa8ae0 MAC+PHY reset (ctrl 0018260)
    Failed to chainload from any network interface

I don't know; that's why I asked you to contact HP, because the driver doesn't know what to do when that reserved bit in the ICR is set.
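When reading a debug log like the one above, it helps to decode the logged register values into individual bits. A minimal Python sketch: `set_bits` lists the zero-based positions of the set bits, and `link_up` tests the status value under the assumption of the E1000-family STATUS layout, in which bit 1 (mask 0x2) is Link Up (LU). Bit positions here are counted from zero; whether a given bit is called "bit 8" or "bit 9" depends on whether one counts from zero or from one.

```python
from typing import List

# Assumption: E1000-family STATUS register, bit 1 (mask 0x2) = Link Up (LU).
STATUS_LU = 1 << 1


def set_bits(value: int) -> List[int]:
    """Zero-based positions of the bits set in `value`, lowest first."""
    return [i for i in range(value.bit_length()) if value >> i & 1]


def link_up(status: int) -> bool:
    """True if the (assumed) LU bit is set in a STATUS register value."""
    return bool(status & STATUS_LU)


if __name__ == "__main__":
    icr = int("000000102", 16)    # from "unexpected ICR 000000102"
    status = int("40080680", 16)  # from "link status is 40080680"
    print(set_bits(icr))     # [1, 8]
    print(set_bits(status))  # [7, 9, 10, 19, 30]
    print(link_up(status))   # False
```

Under that assumption, the logged status value has LU clear, which is consistent with the "Waiting for link-up ... Down" line that follows it; what the extra ICR bit means is the open question for the hardware vendor.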
If you are able to recreate the issue on an Intel-branded NIC, then yes, you should probably open a TSAnet case with Intel rather than HP. One way or another, we need to contact someone with insight into the hardware to understand what that bit represents, so that we can either properly code around it or understand what to do to make the hardware behave properly.

Closing for lack of response.