Bug 187878

Summary: NICs changing between boots
Product: [Fedora] Fedora
Reporter: Daniel Qarras <dqarras>
Component: initscripts
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: cwei, hassankhaleghi, jarod, mjs, wtogami
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-04-04 18:36:51 UTC
Attachments:
dmesg after 1st boot (flags: none)
dmesg after 2nd boot (flags: none)

Description Daniel Qarras 2006-04-04 09:02:58 UTC
Description of problem:
I have a Dell OptiPlex GX260 with an additional NIC. When I boot FC5, eth0 and
eth1 are assigned randomly, i.e., the NIC that was eth0 may come up as eth1
after the next boot.

Version-Release number of selected component (if applicable):
FC5, with both kernels released so far.

How reproducible:
Sometimes.

Steps to Reproduce:
1. Install FC5 on a Dell OptiPlex GX260 with an additional RTL8139 NIC
2. Boot
3. Notice how eth[01] are assigned randomly
  
Actual results:
eth[01] are assigned randomly to NICs.

Expected results:
NICs always have the same identifier.

Additional info:
I'll attach dmesg output after boots when NICs were identified differently. My
/etc/modprobe.conf looks like this:

alias eth0 e1000
alias eth1 8139too

And ifcfg-eth0 is like:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.6
NETMASK=255.255.255.0

And ifcfg-eth1 is like:

DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.0.1
NETMASK=255.0.0.0

Comment 1 Daniel Qarras 2006-04-04 09:04:02 UTC
Created attachment 127281 [details]
dmesg after 1st boot

Comment 2 Daniel Qarras 2006-04-04 09:04:51 UTC
Created attachment 127282 [details]
dmesg after 2nd boot

Comment 3 Jarod Wilson 2006-04-04 18:36:51 UTC
Use the additional HWADDR parameter in your ifcfg-ethX files to assign ethX to a
NIC via its MAC address. Ex:

DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=00:0F:EA:35:3E:6A
ONBOOT=yes

On clean installs, this is done by default to avoid issues like this, which are
typically more common on laptops w/pcmcia cards, but do happen on desktop
systems as well. HAL/udev just don't always find things in the same order, for
whatever reason.
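
To illustrate the HWADDR approach above, here is a minimal Python sketch that
pins an ifcfg file to a MAC address. The helper name pin_hwaddr is a
hypothetical one chosen for this example and is not part of initscripts. On a
live system the MAC could be read from /sys/class/net/<iface>/address.

```python
def pin_hwaddr(ifcfg_text: str, mac: str) -> str:
    """Return ifcfg_text with exactly one HWADDR= line set to mac.

    Hypothetical helper for illustration; it is not part of initscripts.
    """
    # Drop any existing HWADDR line so the result has only one.
    lines = [ln for ln in ifcfg_text.splitlines()
             if not ln.strip().upper().startswith("HWADDR=")]
    # Put HWADDR right after the DEVICE= line when there is one,
    # otherwise append it at the end.
    for i, ln in enumerate(lines):
        if ln.strip().startswith("DEVICE="):
            lines.insert(i + 1, f"HWADDR={mac.upper()}")
            break
    else:
        lines.append(f"HWADDR={mac.upper()}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "DEVICE=eth0\nONBOOT=yes\nBOOTPROTO=static\n"
    print(pin_hwaddr(sample, "00:0f:ea:35:3e:6a"), end="")
```

Running the helper twice on the same text leaves it unchanged, so it is safe
to apply to a file that already carries an HWADDR line.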

Comment 4 Dave Jones 2006-04-04 18:43:13 UTC
*** Bug 187938 has been marked as a duplicate of this bug. ***

Comment 5 Matthew Saltzman 2006-04-04 22:39:09 UTC
I can't reopen this bug, but I can report that having HWADDR in ifcfg-ethX does
not guarantee that the NICs will be selected in the proper order.

In fact, I have the following in /etc/modprobe.conf on my Thinkpad T41:

alias eth0 e1000
alias eth1 ipw2200
install ipw2200 /sbin/modprobe -q eth0; /sbin/modprobe --ignore-install ipw2200

That *should* force the e1000 to load first no matter what, but even with this,
I've had them load in reverse.

I *never* had this problem with FC4 kernels.  The detection order changed from
2.4 kernels to 2.6, but within 2.6, they always loaded consistently (wireless
first, FWIW) unless forced as above, until FC5.

This *is* a bug, and an extremely annoying one in a laptop as DHCP lease files
are tied to interfaces, so if you swap interfaces, you screw up DHCP completely.
Please reconsider the disposition.

Comment 6 Jarod Wilson 2006-04-05 02:11:29 UTC
Hrm, some refactoring of what was where took place in FC5, but the HWADDR
parameter *should* still be honored. Rumor now has it this may have been fixed
in the updates-testing initscripts package. Please give that version a shot and
see if it doesn't fix the problem. If not, I'll reopen this bug. If it does,
looks like we should push that package to updates.

Comment 7 Matthew Saltzman 2006-04-05 02:54:41 UTC
Did you mean module-init-tools?  I've installed those from updates-testing (but
like a good scientist, I'm only going to change one variable at a time...).
Took out the force line in modprobe.conf.

I've rebooted twice and it's worked so far, but give me a day or two to try a
few more times before we start jumping for joy.

Comment 8 Jarod Wilson 2006-04-05 03:08:04 UTC
Nope, I meant the initscripts package, as it's what contains the /sbin/ifup and
ifup-eth scripts that are supposed to look at the HWADDR parameter. Could be
something in the updated module-init-tools that helps with stabilizing the load
order though.

Comment 9 Matthew Saltzman 2006-04-05 03:19:54 UTC
Ah, OK.  But on the laptop, I don't actually start any interfaces at boot.  I
use NetworkManager, which only starts interfaces when I log in.  I'm not even
sure if it uses ifup when in scanning mode.  (It does use ifcfg-ethX to
determine if there is a fixed IP address, though.)

I seem to recall in certain circumstances that if the HWADDRs were wrong, I'd
get an error.  That doesn't suggest that HWADDR will force module loading order.
(Just speculating...)
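
The distinction drawn above, between checking HWADDR and forcing load order,
can be sketched in Python. This is only an illustration of a detect-only
check, under the assumption that a mismatch is reported rather than repaired.
The function names are hypothetical and this is not the actual ifup code.

```python
def configured_hwaddr(ifcfg_text):
    """Return the HWADDR value from ifcfg-style text, or None if unset."""
    for line in ifcfg_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "HWADDR":
            return value.strip().strip('"').upper()
    return None


def hwaddr_mismatch(ifcfg_text, actual_mac):
    """True when the file pins a MAC that differs from the device's MAC.

    A check like this can only report a mismatch; nothing in it
    influences the order in which driver modules are loaded.
    """
    wanted = configured_hwaddr(ifcfg_text)
    return wanted is not None and wanted != actual_mac.upper()


if __name__ == "__main__":
    cfg = "DEVICE=eth0\nHWADDR=00:0F:EA:35:3E:6A\nONBOOT=yes\n"
    # The second MAC is a made-up sample value.
    print(hwaddr_mismatch(cfg, "00:0f:ea:35:3e:6a"))
    print(hwaddr_mismatch(cfg, "00:11:22:33:44:55"))
```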

Comment 10 Daniel Qarras 2006-04-05 08:43:51 UTC
I have both good news and bad news.

Good news: using HWADDR and the updated initscripts RPM my NICs are now getting
their names consistently so far on my Dell desktop.

Bad news (for me, at least): Linux 2.x has ALWAYS found NICs in exactly the same
order without any HWADDR configuration until FC5. For me this changed behavior
is clearly a regression. For instance, I can no longer use in-house automation
scripts that rely on being HWADDR-independent. My machines do not use
NetworkManager and the NICs are identified before HAL starts, so this seems
like a udev or kernel bug. Please consider reassigning the bug to the udev
component if this is not the kernel's fault. Thanks.

Comment 11 Matthew Saltzman 2006-04-05 14:49:10 UTC
I will confirm that with the new initscripts, the device names are being
assigned consistently.  At least, they have been through a half-dozen or so
boots on my part and a handful on the part of a colleague down the hall.

The initscripts changelog lists a "udev helper to rename network devices on
device creation," which sounds like it's the relevant change.

Comment 12 Jarod Wilson 2006-04-05 15:05:22 UTC
(In reply to comment #10)
> I have both good news and bad news.
> 
> Good news: using HWADDR and the updated initscripts RPM my NICs are now
> getting their names consistently so far on my Dell desktop.

Good to hear.

> Bad news (for me, at least): Linux 2.x has ALWAYS found NICs in exactly the
> same order without any HWADDR configuration until FC5. For me this changed
> behavior is clearly a regression.

There are quite a few systems out there that do not have their NICs found in the
same order, even on much older kernels. Why yours never changed before and does
now, I can't say, but it's pretty typical behavior -- consider a system where you
hotplug an interface (e.g., a PCMCIA card or USB-to-Ethernet adapter), a system
where you move a NIC from one PCI slot to another for some reason, or a system
with a BIOS that allows you to alter the bus scanning order. The HWADDR
parameter is there and configured by default because these scenarios are so common.

> For instance, I can no longer use in-house automation scripts that rely on
> being HWADDR-independent. My machines do not use NetworkManager and the NICs
> are identified before HAL starts, so this seems like a udev or kernel bug.
> Please consider reassigning the bug to the udev component if this is not the
> kernel's fault. Thanks.

I'll leave this at Dave's discretion. So far as I can see, everything is
operating as generally expected. For some reason, devices in your particular
machine are getting picked up in a different order than before, but relying on
the order in which devices are discovered to assign a device name and IP address
to them is hazardous at best, in the majority of cases.
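
As a rough illustration of why discovery order is fragile, the following
Python sketch names devices purely by probe order. The function name and the
MAC addresses are made up for this example. It models the naming rule only,
not the kernel's actual probing.

```python
def names_by_probe_order(macs_in_probe_order):
    """Name devices eth0, eth1, ... purely in the order they were probed."""
    return {f"eth{i}": mac for i, mac in enumerate(macs_in_probe_order)}


if __name__ == "__main__":
    # Sample MAC addresses, made up for this illustration.
    nic_a = "00:0F:EA:35:3E:6A"
    nic_b = "00:11:22:33:44:55"
    # Same hardware, two different probe orders: the names swap.
    print(names_by_probe_order([nic_a, nic_b]))
    print(names_by_probe_order([nic_b, nic_a]))
```

With order-based naming, a static IP configured for eth0 follows whichever
card happens to be probed first, which is the symptom reported in this bug.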


Comment 13 Jarod Wilson 2006-04-05 15:11:46 UTC
(In reply to comment #11)
> I will confirm that with the new initscripts, the device names are being
> assigned consistently.  At least, they have been through a half-dozen or so
> boots on my part and a handful on the part of a colleague down the hall.
> 
> The initscripts changelog lists a "udev helper to rename network devices on
> device creation," which sounds like it's the relevant change.

Two positive endorsements there, along with my own from running that initscripts
package on a handful of systems for two weeks now. I'll ping folks here to get
that package bumped from updates-testing to a full-blown released update.

Comment 14 Matthew Saltzman 2006-04-21 18:45:48 UTC
Is there something holding back the release of the updates-testing initscripts?
It's been two weeks since Comment #13 and over a month since the update appeared.

In all that time, I have not seen a problem with NIC reordering while using it.

Thanks

Comment 15 Daniel Qarras 2006-05-29 08:59:02 UTC
So what's the status with this? All information needed should be available to
roll out a package containing needed fixes. Thanks.

Comment 16 Matthew Saltzman 2006-06-27 23:06:07 UTC
OK it's been three *months* since the new initscripts appeared in
updates-testing.  Is there any reason at all not to push it out to updates?

Apparently the move to kernel-2.6.17 has caused NIC reordering up the wazoo.  It
would solve so many people's issues if they had this installed.

For love of all that's Linux, can we please roll it out now?

Thanks!

Comment 17 Jarod Wilson 2006-06-28 18:53:08 UTC
Ack, this got lost in the shuffle when I moved across the country and I somehow
missed putting myself on the CC list. I've added myself and am currently bugging
people about getting that update moved over to updates-released.

Comment 18 Jarod Wilson 2006-06-28 19:15:21 UTC
Okay, there were some reasonably big changes to the initscripts in updates-testing:

-------
initscripts-8.31.2 adds a udev helper for renaming devices,
so that devices are renamed to their configured name on
module load, as opposed to when they are brought up.

Since this is adding new code to the boot path, it could use
a good deal of testing; it will be pushed final once I'm
comfortable that there are no regressions.
-------

That's part of the reason for the delay. I'm changing the component for this bug
over to initscripts; we should have some traction on getting this moved shortly.
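
The idea of renaming devices to their configured names can be sketched in
Python as follows. This is an illustration only, not the actual initscripts
helper: the function name plan_renames and the sample MAC addresses are
assumptions made for this example.

```python
def plan_renames(detected, configured):
    """Map each kernel-assigned name to its configured name when they differ.

    detected:   {kernel_name: mac} as found at this boot
    configured: {wanted_name: mac} as pinned by HWADDR in the ifcfg files
    """
    wanted_by_mac = {mac.upper(): name for name, mac in configured.items()}
    renames = {}
    for kernel_name, mac in detected.items():
        wanted = wanted_by_mac.get(mac.upper())
        # Devices with no HWADDR entry, or already correctly named, are left alone.
        if wanted is not None and wanted != kernel_name:
            renames[kernel_name] = wanted
    return renames


if __name__ == "__main__":
    # Sample MAC addresses, made up for this illustration.
    configured = {"eth0": "00:0F:EA:35:3E:6A", "eth1": "00:11:22:33:44:55"}
    detected = {"eth0": "00:11:22:33:44:55", "eth1": "00:0f:ea:35:3e:6a"}
    print(plan_renames(detected, configured))
```

Note that the sketch only computes the plan. Actually swapping two names on a
live system would need a temporary name in between, since two interfaces
cannot hold the same name at once.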


Comment 19 Jarod Wilson 2006-07-10 15:28:19 UTC
*** Bug 198037 has been marked as a duplicate of this bug. ***