Description of problem:
Testing with 5.2 beta, we've discovered that any HP DL-360-G5s that we either upgrade from 5.1 or perform a fresh install on (PXE + kickstart) become immediately unusable in our environment after rebooting with the new kernel, because the on-board NICs are re-ordered wrt the add-on NICs. We've also had some issues in the past with NIC ordering in 5.1 that we thought pci=bfsort as a kernel arg may have helped with, but we'd have to go back and re-check, because we don't have many 5.1 systems.

Version-Release number of selected component (if applicable):
Kernel: 2.6.18-84.el5 x86_64
Hardware: HP DL-360-G5 with Intel Pro/1000 PT dual-port PCI-e card installed in low-profile PCI-e slot
Also tested with Intel Pro/1000 PT quad-port PCI-e card with similar results.

How reproducible:
Always (on same hardware config)

Steps to Reproduce:
1. Either:
   a) Install RH 5.2 beta on vanilla HP DL-360-G5 with above NIC config and allow anaconda to reboot after install, or
   b) Patch existing 5.1 system in same hardware config to 5.2
2. Reboot with new kernel (2.6.18-84.el5)

Actual results:
- eth2 and eth3 map to on-board NICs (bnx2)
- eth0 and eth1 map to Intel NICs in PCI-e low-profile slot

Expected results:
- eth0 and eth1 should correspond to on-board NICs

Additional info:
On a test server, lspci shows these on-board NICs:

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)

and these Intel NICs in the low-profile PCI slot:

0b:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0b:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)

After a fresh install, /etc/modprobe.conf looks like this:
-------------------------------------------------
alias eth0 bnx2
alias eth1 bnx2
alias eth2 e1000e
alias eth3 e1000e
alias scsi_hostadapter cciss
alias scsi_hostadapter1 ata_piix
-------------------------------------------------

And 'ip addr' shows after anaconda reboot:
-------------------------------------------------
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:15:17:63:61:f6 brd ff:ff:ff:ff:ff:ff
3: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:15:17:63:61:f7 brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:1e:0b:e9:d1:94 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.252/24 brd 10.1.1.255 scope global eth0
    inet6 fe80::21e:bff:fee9:d194/64 scope link
       valid_lft forever preferred_lft forever
5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:1e:0b:e9:f1:a6 brd ff:ff:ff:ff:ff:ff
6: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
-------------------------------------------------

At this point eth0/eth1 map to on-board NICs.

Also note: dmesg after shows:
-------------------------------------------------
e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.
ACPI: PCI Interrupt 0000:0b:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:0b:00.0 to 64
input: PC Speaker as /class/input/input2
0000:0a:00.0: eth0: (PCI Express:2.5GB/s:Width x4) 00:15:17:63:61:f6
0000:0a:00.0: eth0: Intel(R) PRO/1000 Network Connection
0000:0a:00.0: eth0: MAC: 0, PHY: 4, PBA No: d50868-003
ACPI: PCI Interrupt 0000:0b:00.1[B] -> GSI 17 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:0b:00.1 to 64
Floppy drive(s): fd0 is 1.44M
EDAC MC: Ver: 2.0.1 Feb 29 2008
intel_rng: FWH not detected
0000:0a:00.0: eth1: (PCI Express:2.5GB/s:Width x4) 00:15:17:63:61:f7
0000:0a:00.0: eth1: Intel(R) PRO/1000 Network Connection
0000:0a:00.0: eth1: MAC: 0, PHY: 4, PBA No: d50868-003
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.6.9 (December 8, 2007)
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 18 (level, low) -> IRQ 185
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found d at mem f8000000, IRQ 185, node addr 001e0be9d194
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19 (level, low) -> IRQ 82
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found d at mem fa000000, IRQ 82, node addr 001e0be9f1a6
-------------------------------------------------

Rebooting after this install-initiated reboot results in:
-------------------------------------------------
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:15:17:63:61:f6 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:15:17:63:61:f7 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:1e:0b:e9:d1:94 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:1e:0b:e9:f1:a6 brd ff:ff:ff:ff:ff:ff
6: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
-------------------------------------------------

dmesg shows:
-------------------------------------------------
0000:0a:00.0: eth0: (PCI Express:2.5GB/s:Width x4) 00:15:17:63:61:f6
0000:0a:00.0: eth0: Intel(R) PRO/1000 Network Connection
0000:0a:00.0: eth0: MAC: 0, PHY: 4, PBA No: d50868-003
ACPI: PCI Interrupt 0000:0b:00.1[B] -> GSI 17 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:0b:00.1 to 64
0000:0a:00.0: eth1: (PCI Express:2.5GB/s:Width x4) 00:15:17:63:61:f7
0000:0a:00.0: eth1: Intel(R) PRO/1000 Network Connection
0000:0a:00.0: eth1: MAC: 0, PHY: 4, PBA No: d50868-003
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.6.9 (December 8, 2007)
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 18 (level, low) -> IRQ 185
EDAC MC0: Giving out device to i5000_edac.c I5000: DEV 0000:00:10.0
eth2: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found d at mem f8000000, IRQ 185, node addr 001e0be9d194
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19 (level, low) -> IRQ 82
eth3: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found d at mem fa000000, IRQ 82, node addr 001e0be9f1a6
-------------------------------------------------

At this point eth0/eth1 and eth2/eth3 are swapped and the system is unreachable over the network.

This is a recurring theme on x86 hardware and one that needs to be addressed once and for all. There should be a predictable ordering of PCI-e devices, without hacks such as hard-coding MAC addresses (unacceptable, not robust under several scenarios). It just happens to have regressed in 5.2 beta on this hardware.
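The swap described in the report can be checked mechanically by comparing the name-to-MAC mapping from the two boots. The following is only a minimal sketch: the MAC addresses are copied from the two 'ip addr' listings in the report, while the function name `renamed_interfaces` and the dictionary layout are made up for illustration.

```python
# Interface name -> MAC address, copied from the two 'ip addr' listings
# in the report (first boot after install, then the next reboot).
first_boot = {
    "eth0": "00:1e:0b:e9:d1:94",
    "eth1": "00:1e:0b:e9:f1:a6",
    "eth2": "00:15:17:63:61:f6",
    "eth3": "00:15:17:63:61:f7",
}
second_boot = {
    "eth0": "00:15:17:63:61:f6",
    "eth1": "00:15:17:63:61:f7",
    "eth2": "00:1e:0b:e9:d1:94",
    "eth3": "00:1e:0b:e9:f1:a6",
}


def renamed_interfaces(before, after):
    """Return {mac: (old_name, new_name)} for every MAC whose name changed."""
    old_name_by_mac = {mac: name for name, mac in before.items()}
    moved = {}
    for new_name, mac in after.items():
        old_name = old_name_by_mac.get(mac)
        if old_name is not None and old_name != new_name:
            moved[mac] = (old_name, new_name)
    return moved


# Print one line per interface that changed name between the two boots.
for mac, (old, new) in sorted(renamed_interfaces(first_boot, second_boot).items()):
    print(f"{mac}: {old} -> {new}")
```

With the data from this report, all four interfaces show up as renamed, which matches the observation that the on-board pair and the Intel pair traded places.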
Vinod, I would like to clarify a few things so I can try to address this.

1. Are you stating there is a difference in device ordering between the system that is running anaconda (did you hit CTRL-ALT-F2 to get that info?) and the system after it is installed?
2. Are you stating that simply booting a 5.2 kernel on a 5.1 system (which is fine with me) causes eth0/1 to switch with eth2/3?

Thanks!
Andy, I've been traveling and wanted to verify your second question before responding.

RE: 1. Yes, I believe so, based on the fact that it PXE boots, then installs and boots with the correct ordering just once. One of our feature requests for many years has been to make the whole install process more friendly for "headless" servers. In this case, I cannot get CTRL-ALT-F2 info via the serial console ... unless there's something new I should be aware of? For now a log file is probably the easiest way to collect this info if you feel it's worthwhile.

RE: 2. Yes. I installed just the kernel alone (the prior test case used a yum update from 5.1 -> 5.2, which pulled other RPMs). Note that installing a new kernel leaves the 'e1000' driver in place in /etc/modprobe.conf, but I see that 'e1000e' is actually loaded.
Ok, let's address #2 first. Did your ifcfg-ethX files have entries for HWADDR=<mac address> for all 4 interfaces? I know that users often delete these entries, so I want to check.
No. We don't put MAC addresses in there, as our FRU is an entire server. We pull drives from a failed chassis into a replacement, so it's important that no state is preserved on the drives that would cause problems in a new chassis with new NICs, etc.
Also another observation: building with pci=bfsort seems to be a workaround that works in a simple test, but I haven't tested more thoroughly. We've used it before in RH5.1 and it seemed to be inconsistent across different hardware platforms. Is the DL-360-G5 part of the white list for this? (i.e. so we don't have to explicitly specify it)
No it is not currently in the bfsort-whitelist.
HWADDR (or similar mappings) is pretty much required in any case with multiple types of ethernet drivers. You will not get consistent results otherwise.
Comment #9 suggests this is a user error.
I agreed with comment #9 and comment #10.
HWADDR is a really poor workaround for a bigger problem. It's fine in a static, small environment, but unacceptable in a large enterprise with lots of hardware change. How does a crash dump kernel figure out what NIC to use? What about rescue images? Or consistency from PXE boot through OS install and the rest of the system's lifetime? Or the chassis swap capability mentioned earlier? How do you reduce human error from replacing failed NICs? We've been discussing this with HP/Dell ... they're working on longer term solutions that are enterprise-ready. This is a serious issue that has languished for years without resolution; fixing it will make Linux on x86 more enterprise friendly. In the meantime, pci=bfsort seems to have arisen from this need for something better than hard-coding MAC addresses. The short term solution seems to be to add this model to the bfsort whitelist until a real solution can be found.
The modules are initialized in parallel - other than changing the timing, bfsort will not help significantly.
These are all workarounds for a problem that has tended to bite us between major releases, but bites us now between 5.1 and 5.2 beta. From our limited testing, bfsort is essentially the least evil workaround. We are concerned about timing issues, but would appreciate some more details. We do want the *long term* solution to be better. If the list of models listed here under bfsort support:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Release_Notes/RELEASE-NOTES-U2-x86_64-en.html

is expanded to include one more model, IMHO it will get us to a happier place in the short term until a better solution is found. Please check with HP on this one as well, to confirm whether they see any issue with this. So far my conversations with them suggest it should apply to this model of hardware, and the limited tests we've done so far indicate this is true.
I've been following some of what Dell wants to do for some of this and I find it interesting. I also toyed with the idea of something that will create some udev rules during install that could then take effect when the system boots for the first time after installation. This might be a nice way to use pci device ordering as a clear way to order devices after an installation. It may not solve the issue of device ordering when swapping hard drives if the udev rules contain mac addresses, but that could be considered when trying to design something that will create rules.
The appropriate design is still being debated; given the 5.2 release schedule, we're going to have to address this in an EUS or ASYNC.
We need to release note this for 5.2
When I say timing issues:

- udev, on start, emits uevents for all the hardware in the system, in pci bus order (IIRC)
- these events are handled in parallel
- Hence, the modules that are loaded to handle these events can race against each other on initialization

Many modules do not have significant delays on startup, so the ordering seems 'more or less' consistent. However, if the modules do have some sort of delay in their initialization (for example, if they load firmware), then the order can change from boot to boot.
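To make the race concrete, here is a toy model of it, not a description of what udev or the kernel actually does internally. It assumes each device's event starts in bus order with a tiny stagger, that the next free ethN name goes to whichever driver finishes initializing first, and that per-driver delays are fixed numbers. The function name `assign_names`, the stagger value, and the delay values are all hypothetical; the PCI addresses and driver names are the ones from this report.

```python
def assign_names(devices, init_delay):
    """Toy model of the naming race.

    devices: list of (pci_address, driver) in the order events are emitted.
    init_delay: {driver: seconds} of hypothetical initialization delay.
    Returns {pci_address: "ethN"}, numbered in order of completion.
    """
    finish_times = []
    for index, (address, driver) in enumerate(devices):
        start = index * 0.001  # events are emitted one after another
        finish_times.append((start + init_delay.get(driver, 0.0), address))
    finish_times.sort()  # the earliest finisher claims the lowest free name
    return {address: f"eth{n}" for n, (_, address) in enumerate(finish_times)}


# Devices from this report, in PCI bus order.
bus_order = [
    ("0000:03:00.0", "bnx2"),
    ("0000:05:00.0", "bnx2"),
    ("0000:0b:00.0", "e1000e"),
    ("0000:0b:00.1", "e1000e"),
]

# No delays: names follow bus order. A (made-up) delay in one driver flips it.
print(assign_names(bus_order, {}))
print(assign_names(bus_order, {"bnx2": 0.5}))
```

In this model, an added delay in the bnx2 initialization is enough to hand eth0/eth1 to the e1000e ports, which is the same shape as the swap seen on the DL-360-G5. On real hardware the delays vary from boot to boot, so the outcome is not fixed the way it is here.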
To continue, this is why we always set up networking configurations with HWADDR, so that even if the devices show up 'out of order', they can be renamed so that you have a consistent policy. If you remove HWADDR, it is really best to have *some* other mechanism of determining this (iftab, udev rules, etc.)
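As a rough illustration of the renaming policy described here (not the actual initscripts code), the sketch below takes the names the kernel happened to assign and a HWADDR-style configuration, and works out which devices would need renaming. The function name `rename_plan` and the dictionary layout are assumptions for the example; the MAC addresses are the ones from this report.

```python
def rename_plan(kernel_names, hwaddr_config):
    """Work out renames needed to match a HWADDR-style configuration.

    kernel_names: {name_assigned_at_boot: mac}
    hwaddr_config: {wanted_name: mac}, as HWADDR= lines would express it
    Returns a list of (current_name, wanted_name) pairs.
    """
    current_by_mac = {mac.lower(): name for name, mac in kernel_names.items()}
    plan = []
    for wanted, mac in sorted(hwaddr_config.items()):
        current = current_by_mac.get(mac.lower())
        if current is None:
            continue  # configured MAC is not present in this chassis
        if current != wanted:
            plan.append((current, wanted))
    return plan


# What the configuration wants: on-board NICs first.
config = {
    "eth0": "00:1E:0B:E9:D1:94",
    "eth1": "00:1E:0B:E9:F1:A6",
    "eth2": "00:15:17:63:61:F6",
    "eth3": "00:15:17:63:61:F7",
}
# What the kernel assigned on the boot where the order was swapped.
swapped_boot = {
    "eth0": "00:15:17:63:61:f6",
    "eth1": "00:15:17:63:61:f7",
    "eth2": "00:1e:0b:e9:d1:94",
    "eth3": "00:1e:0b:e9:f1:a6",
}

for current, wanted in rename_plan(swapped_boot, config):
    print(f"rename {current} -> {wanted}")
```

Two caveats: a real implementation has to go through temporary names when two devices trade names, and if none of the configured MACs are present (the drive-swap case raised earlier in this bug) the plan is empty and the mapping provides no help at all.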
We have the same problem here, but even if we set HWADDR the interfaces get mixed up at each reboot. In our case we have the via-rhine module for eth0 and the sundance module for a 4-port network card, i.e. eth1 - eth4. Also, the problem seems to be somehow related to the init scripts and not to the kernel, as the same problem now occurs if we go back to the last 5.1 kernel.
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes.
Hi,

Just thought I'd update the Red Hat Engineering guys on this issue. The NIC enumeration problem is seen and confirmed on an HP BL680C with RHEL5u1 and with agospoda's latest test kernel, kernel-2.6.18-115.el5.gtest.56.x86_64.rpm. The system was rebooted 182 times over a period of 14 hours (the slot I had for testing), and two patterns were seen.

NIC specs on this box (I could provide a sosreport too if needed; just let me know, I'm ready to help/test on this hardware):
1 x Mezz BCM5708s (QUAD onboard so kernel sees 4 cards)
2 x PCIs BCM5715s

Patterns seen:
a) 178 times eth0-eth3 (BCM5708s) eth4-eth5 (BCM5715s)
b) 4 times eth4-eth5 (BCM5715s) 178 eth0-eth3 (BCM5708s)

As I said above, I am ready to test anything agospoda has and to provide quick feedback. I have some slots available for testing, but I do not own the box, so the quicker the better...

travellig
Hi,

I think the same problem with the same hardware configuration also happens in Fedora, and there are people working on it there:

https://bugzilla.redhat.com/show_bug.cgi?id=408891

The guy from Broadcom pointed out a link. I am not sure if it is relevant, but maybe it can help. Thanks to all for working on the problem.

Dario
#408891 is completely different, but thanks for looking.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
Just to clarify:

- This is not a regression in any new update release - all RHEL 5 releases behave the same way
- In the default way that we configure devices (with HWADDR in the ifcfg file), customers will see consistent device names across reboots. If they don't like which devices get which names, they can edit their configuration accordingly.
- If they remove the HWADDR line from the ifcfg file, they are likely to see inconsistent device names across reboots

In the next major release, udev persistent names will be used so that even without HWADDR there will be consistent names across reboots. However, that change requires changes to udev, anaconda, kudzu, and initscripts at a minimum, and is not really feasible to backport to RHEL 5 at this time.
Hi,

I did some tests with udev and it worked nicely here. I have three network cards, so I wrote the udev rules below:

# cat /etc/udev/rules.d/99-ethernet.rules
KERNEL=="eth*", ID=="0000:05:05.0", NAME="eth0"
KERNEL=="eth*", ID=="0000:0b:00.0", NAME="eth1"
KERNEL=="eth*", ID=="0000:0c:02.0", NAME="eth2"

That renames the interfaces based on PCI slot, so if you replace a NIC board with another one, the interface name will remain the same. It came back correctly after a few reboots. It also worked after swapping the slots of two boards. No HWADDR in configs.

I think all their systems are the same model, so the PCI IDs would be the same, and they could work around this by writing those rules in the %post section of a generic kickstart config.

Is this acceptable?

thanks,
Flavio

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleitner
issue 224660
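For the kickstart %post idea, the rules could be generated rather than typed by hand. The sketch below is an assumption-laden example, not a tested deployment script: the rule syntax is copied from the rules in this comment, the function name `udev_rules` is made up, and numbering interfaces in ascending PCI address order is just one possible policy (it would not by itself put on-board NICs first on every platform).

```python
def udev_rules(pci_addresses):
    """Return udev rule lines naming interfaces eth0, eth1, ... in
    ascending PCI address order (domain:bus:device.function)."""

    def sort_key(address):
        # Split "0000:0b:00.1" into numeric fields so that sorting is
        # numeric rather than alphabetical.
        domain, bus, devfn = address.split(":")
        device, function = devfn.split(".")
        return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))

    ordered = sorted(pci_addresses, key=sort_key)
    return [
        f'KERNEL=="eth*", ID=="{address}", NAME="eth{n}"'
        for n, address in enumerate(ordered)
    ]


# The three PCI addresses from the rules in this comment, given out of order.
rules = udev_rules(["0000:0c:02.0", "0000:05:05.0", "0000:0b:00.0"])
print("\n".join(rules))
```

For the three addresses used in this comment, the generated lines are the same as the hand-written rules. The output would still need to be written to a file under /etc/udev/rules.d/ and verified on the target hardware and udev version.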
As a local workaround, sure. Obviously we can't genericize that across hardware and releases.
Hi,

We have installed 5u1 on a BL460c. This has an internal NIC (BCM5708S) and a mezzanine quad port card (BCM5715S). The MAC addresses are all set in the ifcfg-ethX files via the HWADDR field. The NIC numbering (eth0 - eth5) changes on reboot and is not consistent, irrespective of the existence of HWADDR. As suggested (#37), I created a udev script, which appears to resolve this.
As many are aware by now, we're solving this in Fedora 15 using biosdevname (latest source at http://linux.dell.com/cgi-bin/gitweb/gitweb.cgi?p=biosdevname.git;a=summary ). This is way too intrusive to add to RHEL5, but there's an older copy in epel5, and I'll get a build out hopefully late this week that includes most recent code. Thanks, Matt
Matt, Thanks again for your work on this, as well as others involved. This is an important part of making our enterprise Linux environment more robust. -- Vinod