Bug 1267030 - ipxe timeout when performing introspection through Intel i350 NIC
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipxe
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Lucas Alvares Gomes
QA Contact: Raviv Bar-Tal
Keywords: Rebase
Duplicates: 1290569
Depends On: 1298313
Blocks: 1290569 1300702 1300704
Reported: 2015-09-28 17:02 EDT by Vincent S. Cojot
Modified: 2016-11-03 20:36 EDT (History)
CC List: 42 users

See Also:
Fixed In Version: ipxe-20150821-1.git4e03af8e.el7
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Story Points: ---
Clone Of:
: 1290569 1300702 (view as bug list)
Environment:
Last Closed: 2016-11-03 20:36:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
ipxe timeout (59.29 KB, image/jpeg)
2015-09-28 17:08 EDT, Vincent S. Cojot
Screencast showing tcpdump of client's MAC on hypervisor and client console (1.20 MB, application/octet-stream)
2015-10-05 16:04 EDT, Vincent S. Cojot

Description Vincent S. Cojot 2015-09-28 17:02:55 EDT
Description of problem:

On a few Dell R420 servers with both Broadcom and Intel NICs, ipxe works fine when netbooting from the Broadcom NIC but times out when netbooting from the Intel NIC.


Version-Release number of selected component (if applicable):

$ rpm -qf /usr/share/instack-undercloud/ipxe/post-install.d/88-setup-ipxe
instack-undercloud-2.1.2-23.el7ost.noarch

$ rpm -qf /usr/share/ipxe/undionly.kpxe
ipxe-bootimgs-20130517-6.gitc4bce43.el7.noarch


How reproducible:

Always (we tried reflashing the firmware, to no avail)

Steps to Reproduce:
1. set the MAC to that of the Intel NIC in instackenv.json
2. start introspection


Actual results:

ipxe times out on the Intel NIC but works on the Broadcom NIC (inside the same VLAN and on the same switch).

Expected results:

Introspection should finish fine.

Additional info:

- From the node itself, on a pre-installed RHEL7.0, 'dhclient' takes only a few secs on Broadcom and close to 30 seconds on the Intel NICs.

- We also found a workaround by updating the iPXE payload (updating the undionly.kpxe binary from the latest builds available on ipxe.org):
on the instack machine:
# curl -o /tftpboot/undionly.kpxe http://boot.ipxe.org/undionly.kpxe
# chmod 744 /tftpboot/undionly.kpxe
# chown ironic:ironic /tftpboot/undionly.kpxe
# chcon system_u:object_r:tftpdir_t:s0 /tftpboot/undionly.kpxe
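
The replacement step above can be made a little safer by keeping the original payload for rollback. The following is only a minimal sketch: the function name stage_payload and the .orig backup suffix are hypothetical and not part of instack-undercloud or ipxe-bootimgs.

```shell
#!/bin/sh
# Sketch: stage a replacement iPXE payload and keep the old one for rollback.
# stage_payload and the ".orig" suffix are hypothetical names for illustration.
stage_payload() {
    src=$1
    dest=$2
    # Refuse an empty or missing download rather than overwrite a working file.
    [ -s "$src" ] || { echo "empty or missing: $src" >&2; return 1; }
    # Back up only once, so repeated runs do not clobber the first original.
    if [ -e "$dest" ] && [ ! -e "$dest.orig" ]; then
        cp -p "$dest" "$dest.orig" || return 1
    fi
    # install(1) copies the file and sets its mode in one step.
    install -m 744 "$src" "$dest"
}
```

Assuming the new binary was downloaded to the current directory, it would be called as `stage_payload ./undionly.kpxe /tftpboot/undionly.kpxe`, followed by the chown and chcon commands listed above.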
Comment 2 Vincent S. Cojot 2015-09-28 17:08 EDT
Created attachment 1078072
ipxe timeout
Comment 3 Dmitry Tantsur 2015-10-01 08:36:09 EDT
Hi! So, if you can confirm that newer iPXE firmware works for you, then updating ipxe-bootimgs to something newer than May 2013 (which is what we have, judging by the RPM version) is probably the only thing we can do. Mike, do you think we could retarget this bug to the ipxe-bootimgs package?
Comment 4 Mike Burns 2015-10-01 09:04:55 EDT
In this case, we're limited to what is shipped in RHEL. Adding Miroslav, who seems to own ipxe in RHEL.
Comment 5 Miroslav Rezanina 2015-10-02 03:55:29 EDT
Hi Mike,
we can try to rebase ipxe in 7.3 if no proper patch is found.
Comment 6 Mike Burns 2015-10-02 07:23:07 EDT
Great, moving this to RHEL, then.
Comment 8 Vincent S. Cojot 2015-10-05 16:03:22 EDT
Hi everyone,
I don't think this issue is related to OOO. The ipxe payload update is merely a workaround for the issue we ran into. We discovered that it works better (it does not time out) if we use the more recent ipxe payload.
In any case:
1) we're still looking into the underlying issue (DHCP timeout with Intel NICs and Nortel switches)
2) the ipxe payloads in RHEL7.x need an update (IMHO).

For the curious, here is a small screencast, captured on my desktop, showing:

1) tcpdump for the client's MAC on the hypervisor hosting the instack VM.
2) the client machine's console. Note the delay in obtaining the first lease through PXE and the timeout with the default iPXE payload (the newer payload worked around that issue and allowed us to successfully introspect and deploy).
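
A capture like the one in item 1 can be narrowed to a single client's DHCP exchange with a pcap filter. This is a minimal sketch, not the exact command used for the screencast: the helper name dhcp_filter_for_mac, the interface br0 and the MAC address are all placeholders.

```shell
#!/bin/sh
# Sketch: build a pcap filter matching DHCP traffic to or from one MAC address.
# dhcp_filter_for_mac is a hypothetical helper; ports 67/68 are the DHCP ports.
dhcp_filter_for_mac() {
    printf 'ether host %s and (port 67 or port 68)\n' "$1"
}

# Example use (needs root; br0 and the MAC below are placeholders):
#   tcpdump -n -e -i br0 "$(dhcp_filter_for_mac 52:54:00:12:34:56)"
```

Comparing packet timestamps in such a capture against the client console shows how long the first lease takes to arrive.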

Kind regards,

Vincent
Comment 9 Vincent S. Cojot 2015-10-05 16:04 EDT
Created attachment 1080064
Screencast showing tcpdump of client's MAC on hypervisor and client console
Comment 10 Lukas Zapletal 2015-10-15 05:37:25 EDT
Satellite 6 customers hit this as well; please rebase.
Comment 19 Gonéri Le Bouder 2015-11-26 09:26:43 EST
Enabling PortFast (STP) on the switch fixes the issue.
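
As background on why PortFast may help (offered as a plausible explanation, not something established in this bug): with classic 802.1D spanning tree defaults, a port that has just come up spends one forward-delay interval in each of the listening and learning states before it forwards traffic. The sketch below works through that arithmetic; the 15-second default is an assumption about the switch configuration, and stp_blocked_seconds is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: time a port stays non-forwarding after link-up under classic STP.
# Assumes the 802.1D default forward delay; a given switch may differ.
forward_delay=15   # seconds spent in each of the listening and learning states

stp_blocked_seconds() {
    # listening + learning = 2 x forward delay
    echo $(( 2 * $1 ))
}

stp_blocked_seconds "$forward_delay"
```

The result is close to the roughly 30 seconds that dhclient needed on the Intel NIC in the description, which is consistent with a short DHCP retry window expiring before the port starts forwarding.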
Comment 20 Mike Burns 2016-01-13 09:39:17 EST
*** Bug 1290569 has been marked as a duplicate of this bug. ***
Comment 27 Chris Dearborn 2016-02-19 12:28:19 EST
FYI, at Dell, we are not seeing timeout issues when PXE booting from Intel NICs.
Comment 29 Dan Yocum 2016-04-21 10:47:47 EDT
I can verify that the Dell R630 and R730xd systems with Intel X520 i350 NICs are booting properly using the following ROMs:

ipxe-bootimgs-20160127-1.git6366fa7a.el7.noarch

NB: the git hash should match the ipxe version hash displayed when chainloading.
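
To make that comparison easier, the hash can be pulled out of the package name. This is a minimal sketch that assumes the ".git<hash>." naming pattern seen in the package names in this report; ipxe_git_hash is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: extract the git hash from an ipxe package name such as
# ipxe-bootimgs-20160127-1.git6366fa7a.el7.noarch.
# Prints nothing if the name does not contain a ".git<hex>." component.
ipxe_git_hash() {
    printf '%s\n' "$1" | sed -n 's/.*\.git\([0-9a-f]\{1,\}\)\..*/\1/p'
}

# Example use on an installed system:
#   ipxe_git_hash "$(rpm -q ipxe-bootimgs)"
```

The printed value can then be checked by eye against the hash shown on the console when chainloading.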
Comment 30 Chao Yang 2016-08-23 06:45:19 EDT
Hi Raviv,

Would you please verify this bug as it is ON_QA now? Thanks!
Comment 31 Raviv Bar-Tal 2016-09-11 08:39:01 EDT
The problem is solved by the new ROMs, and there are no new failure reports related to it. This was verified with Udi, the owner of bug https://bugzilla.redhat.com/show_bug.cgi?id=1301694,
and matches what Dan wrote in comment #29.
Comment 34 errata-xmlrpc 2016-11-03 20:36:34 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2214.html
