Red Hat Bugzilla – Bug 1267030
ipxe timeout when performing introspection through Intel i350 NIC
Last modified: 2016-11-03 20:36:34 EDT
Description of problem:
On a few Dell R420 servers with both Broadcom and Intel NICs, ipxe works fine when netbooting from the Broadcom NIC but times out when netbooting from the Intel NIC.
Version-Release number of selected component (if applicable):
$ rpm -qf /usr/share/instack-undercloud/ipxe/post-install.d/88-setup-ipxe
$ rpm -qf /usr/share/ipxe/undionly.kpxe
always (we tried flashing the firmwares, to no avail)
Steps to Reproduce:
1. set the MAC to that of the Intel NIC in instackenv.json
2. start introspection
ipxe times out on the Intel NIC but works on the Broadcom NIC (inside the same VLAN and on the same switch).
Introspection should finish fine.
- From the node itself, on a pre-installed RHEL7.0, 'dhclient' takes only a few secs on Broadcom and close to 30 seconds on the Intel NICs.
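The dhclient comparison above could be reproduced with a small timing helper. This is only a sketch: the function name `time_dhcp`, the `DHCLIENT` override variable, and the interface names in the usage note are illustrative assumptions, not something taken from this report.

```shell
#!/bin/bash
# Hypothetical helper: time how long dhclient needs to obtain a lease on one
# interface. DHCLIENT can be overridden (e.g. for a dry run); by default the
# real dhclient is used and root privileges are required.
time_dhcp() {
    local iface="$1" start end
    start=$(date +%s)
    # -1: try once and exit non-zero on failure; -v: verbose log on stderr
    if ! ${DHCLIENT:-dhclient} -1 -v "$iface"; then
        echo "$iface: no lease obtained" >&2
        return 1
    fi
    end=$(date +%s)
    echo "$iface: $((end - start))s"
    # Release the lease so the next run starts from scratch
    ${DHCLIENT:-dhclient} -r "$iface"
}
```

Usage (interface names are placeholders for the Broadcom and Intel ports): `time_dhcp em1; time_dhcp p2p1`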
- We also found a workaround by updating the iPXE payload (updating the undionly.kpxe binary from the latest builds available on ipxe.org):
on the instack machine:
# curl -o /tftpboot/undionly.kpxe http://boot.ipxe.org/undionly.kpxe
# chmod 744 /tftpboot/undionly.kpxe
# chown ironic:ironic /tftpboot/undionly.kpxe
# chcon system_u:object_r:tftpdir_t:s0 /tftpboot/undionly.kpxe
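To confirm that the replaced payload ended up with the ownership, mode and SELinux type set by the commands above, something like the following could be run afterwards. The function name `check_payload` is a hypothetical helper, and GNU `stat` and `ls` are assumed.

```shell
#!/bin/bash
# Hypothetical helper: print owner:group, octal mode and SELinux context of
# the iPXE payload. Per the workaround steps, the expected values are
# ironic:ironic, 744 and a tftpdir_t type.
check_payload() {
    local file="${1:-/tftpboot/undionly.kpxe}"
    # owner:group, octal mode, file name (GNU stat format string)
    stat -c '%U:%G %a %n' "$file"
    # SELinux security context of the file
    ls -Z "$file"
}
```

Usage: `check_payload` (defaults to /tftpboot/undionly.kpxe) or `check_payload /path/to/file`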
Created attachment 1078072 [details]
Hi! So, if you can confirm that newer iPXE firmware works for you, then updating ipxe-bootimgs to something newer than May 2013 (which is what we have, judging by the RPM version) is probably the only thing we can do. Mike, do you think we could retarget this bug to the ipxe-bootimgs package?
In this case, we're limited to what is shipped in RHEL. Adding Miroslav who seems to own ipxe in RHEL
We can try to rebase ipxe in 7.3 if no proper patch is found.
Great, moving this to RHEL, then.
I don't think this issue is related to OOO. The ipxe payload update is merely a workaround for the issue we ran into. We discovered that it works better (it does not time out) if we use the more recent ipxe payload.
In any case:
1) we're still looking into the base issue (DHCP timeout with Intel NICs and Nortel switches)
2) the ipxe payloads in RHEL7.x need an update (IMHO).
For the curious, here is a small screencast captured on my desktop, showing:
1) tcpdump for the client's MAC on the hypervisor hosting the instack VM.
2) the client machine's console. Notice the delay in obtaining the first lease through PXE and the timeout with the default iPXE payload (the newer payload worked around that issue and allowed us to successfully introspect and deploy).
Created attachment 1080064 [details]
Screencast showing tcpdump of the client's MAC on the hypervisor and the client console.
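A capture similar to the one in the screencast could be taken with a filter along these lines. This is a sketch, not the exact command used for the attachment: the function name `capture_dhcp`, the `TCPDUMP` override variable, the bridge name and the MAC address in the usage note are all placeholders.

```shell
#!/bin/bash
# Hypothetical helper: capture only DHCP/BOOTP traffic to or from one client
# MAC on a given interface. TCPDUMP can be overridden for a dry run.
capture_dhcp() {
    local iface="$1" mac="$2"
    # -n: no name resolution; -e: print link-level (MAC) headers;
    # -tttt: full date and time on every line, handy for measuring delays
    ${TCPDUMP:-tcpdump} -i "$iface" -n -e -tttt \
        "ether host $mac and (udp port 67 or udp port 68)"
}
```

Usage (run as root on the hypervisor hosting the instack VM): `capture_dhcp br0 52:54:00:aa:bb:cc`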
Satellite 6 customers hit this as well, please rebase.
Enabling PortFast (STP) on the switch fixes the issue.
*** Bug 1290569 has been marked as a duplicate of this bug. ***
FYI, at Dell, we are not seeing timeout issues when PXE booting from Intel NICs.
I can verify that the Dell R630 and R730xd systems with Intel X520 i350 NICs are booting properly using the following ROMs:
NB: the git hash should match the ipxe version hash displayed when chainloading.
Would you please verify this bug as it is ON_QA now? Thanks!
The problem is solved by the new ROMs, and there are no new failure reports related to it. This was verified with Udi, the owner of bug https://bugzilla.redhat.com/show_bug.cgi?id=1301694,
and as Dan wrote in comment #29.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.