From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040920

Description of problem:
First off, I don't know that this is a hotplug problem, but it seems the best place to initially assign this. Note that I also have bug 155522 open, which describes possibly related problems that seem most closely tied to system-config-network-tui.

The basic problem is as described in the summary: initially my second network interface is named e.g. "dev17447" instead of "eth1". On the first machine I saw this on (the one that was the basis for bug 155522), I was able to countermand the device name; I forget exactly how I did it, but I now have eth0-eth2 correctly named on that machine. But I have a second machine (which is the basis for the present bug report) where the system insists on keeping a devNNNNN name for eth1. I am at a loss as to why this happens and how to change it. The interface works, but the fact that it is named so weirdly is disturbing.

Version-Release number of selected component (if applicable):

How reproducible: Sometimes

Steps to Reproduce:
1. initialize the network interfaces on a box that causes this problem :)

Actual Results: eth1 was initially named dev17447 instead of eth1

Expected Results: eth1 should be named eth1

Additional info:
This machine is a Dell PE2650. On the back panel are three network ports: GigE ports 1 and 2, and a third port that has a wrench icon next to it, which is devoted to the embedded remote access controller. I assume but do not know for certain that the OS will never see that third port.

The history of my encounter with this problem is a bit confused, but I will relate what I think are the salient points. Both upon initial configuration and upon reconfiguration after removing all traces of the network interfaces from the various files under /etc, eth0 was assigned to the *second* port (second in terms of label on the back panel, MAC address, and PCI device number on that PCI bus). The first port (judging by back-panel label, and by the MAC address and PCI device number, both of which are *two* less than those of eth0 above) was assigned a devNNNNN name instead of eth1. The initial name was dev17447, and now (upon complete reconfiguration) it's dev31792. It's possible the numbers are PIDs; I didn't check if that was plausible at the time. If desired, I could do this check.

When I say that we "removed all traces of the network interfaces from the various files under /etc", I mean that we:
* removed all of the ifcfg-<ethernet interface name> files under network-scripts/, as well as under networking/
* removed the ethernet device descriptors from sysconfig/hwconf
* removed the aliases from /etc/modprobe.conf
* I think after all the above we rebooted, but I'm not sure

It was immediately after this that the NNNNN changed, which indicates to me that we were successful in eradicating all traces of the old name, but it got regenerated using the same algorithm but a different PID or something.

I did a test that Bill Nottingham suggested in bug 153669 (at least, the following is my interpretation of his suggestion):

    sysctl -w kernel.hotplug="/bin/true"
    service network stop
    rmmod tg3        (both eth0 & eth1 use this driver)
    modprobe tg3
    ifconfig -a

This time, eth0 was the first MAC address, and eth1 existed as the second MAC address, as expected. There was no devNNNNN device. So is it hotplug's problem that eth1 gets the wrong name?
Line breaks, please. :) What do your /etc/sysconfig/hwconf and /etc/sysconfig/network-scripts/ifcfg-eth* look like?
Created attachment 113869: /etc/sysconfig/hwconf from machine with devNNNNN interface
Created attachment 113870: /etc/sysconfig/network-scripts/ifcfg-eth0 from same machine
Created attachment 113871: /etc/sysconfig/network-scripts/ifcfg-dev31792 from same machine
Regarding line breaks, yeah I know. :) It's really ugly. I'm on FC1 under KDE and usually use Konqueror. Konq breaks text submitted in a web form, but it sometimes hangs on the RH BZ site. Moz doesn't hang, but it doesn't break lines either. I'll try to remember to insert breaks manually. Files attached above.
Interesting. Does it end up that way directly after install? Is there an alias devXXXXX tg3 in /etc/modprobe.conf? When you 'remove all traces', how are you then reconfiguring it?
The history is a bit murky, but I think it was that way essentially right after install. There is an alias for devNNNNN in modprobe.conf, yes. I reconfigured with "kudzu -b pci" and setting the "newly-discovered" interfaces to dhcp for ease. Hmmm, if the kudzu interface-setting code shares with s-c-network-tui, that might be related to my other experiences in bug 155522. It sure *looks* like the same interface. I'd assume it's just me, since no one else has reported this a couple of months after RHEL4 came out, but I've seen the s-c-n-t problems on two different sets of hardware now. In both cases, the initial install was booting from boot.iso and manually selecting server-related package groups, so no funky kickstart or anything.
I don't think this is relevant, but the one non-standard thing I did on both machines was to install yum from the Duke site, with /etc/yum.repos.d set up with two files:

    [dag.wieers.com]
    name=Red Hat Enterprise Linux $releasever - $basearch - dag.wieers.com
    baseurl=http://toughguy.caltech.edu/pub/caltech/dag.wieers.com/redhat/el4/en/$basearch/dag/
    enabled=1
    gpgcheck=1

    [base]
    name=Red Hat Enterprise Linux $releasever - $basearch - Base
    baseurl=http://aeolis.gps.caltech.edu/rpm/RHEL/$releasever/os/$basearch/RedHat/RPMS
    enabled=1
    gpgcheck=1

To alleviate your possible concerns about having Dag in there, on the persistent-devNNNNN machine I did:

    rpm -qa --qf "%{name}\t%{BUILDHOST}\n" |grep -v redhat |less

and only yum, a gpg-pubkey, and the ssl cert for our RHN proxy setup turned up -- so no significant non-RedHat packages on this machine. And I'm quite sure I didn't install any non-RHEL4 but RedHat-built packages either -- I always get the source & rebuild if the binary rpms aren't for my OS version.

The first machine I saw the devNNNNN on *does* have some customized things: a kernel that is the latest RHEL4 kernel except that the XFS filesystem is enabled, plus some helper packages for XFS and bonnie++ and iozone. I don't see that any of those changes would affect the network problem I'm seeing. And given that the almost-completely-standard machine sees the same problems but worse, I don't think local customizations are an issue.

We have an academic site license, so no formal support. I'm mostly reporting this so you can check whether it is indeed a problem that others will face. I'd rather my interface was called "eth1", but if I have to, I can live with dev31792. :) Thanks for checking it out!
The reason it's devXXXX is because it was eth0 when you tried to bring up eth0; so it was renamed out of the way. However, it should never be configured in that state, and I'm not sure how it got there. Just a s/devXXXXX/eth1/ on your ifcfg-*, hwconf, and modprobe.conf files should fix it fine.
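For anyone who wants to script the substitution suggested above, here is a minimal sketch. The function name rename_iface_refs is hypothetical (it is not part of initscripts or kudzu), and it assumes the interface names contain no sed metacharacters, which holds for names like dev31792 and eth1. It keeps a .bak copy next to each file it touches.

```sh
#!/bin/sh
# rename_iface_refs OLD NEW FILE...
# Replace every occurrence of OLD with NEW in each listed file.
# Hypothetical helper for illustration only; assumes OLD and NEW are
# simple names such as dev31792 and eth1 (no sed metacharacters).
rename_iface_refs() {
    old=$1
    new=$2
    shift 2
    for f in "$@"; do
        [ -f "$f" ] || continue            # skip files that do not exist
        cp -p "$f" "$f.bak" || return 1    # keep a backup beside the original
        sed "s/$old/$new/g" "$f.bak" > "$f" || return 1
    done
}
```

On the machine in this report the call would presumably look like `rename_iface_refs dev31792 eth1 /etc/sysconfig/hwconf /etc/modprobe.conf /etc/sysconfig/network-scripts/ifcfg-dev31792`, run as root with the network stopped. The function only edits file contents, so the ifcfg-dev31792 file itself would most likely also need to be renamed to ifcfg-eth1 by hand.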
"The reason it's devXXXX is because it was eth0 when you tried to bring up eth0; so it was renamed out of the way." This sounds like a Zen koan. :) In what sense was it eth0? And if it was eth0, then why didn't it come up when I tried to bring up eth0? And what is the "when" in your phrase, "when you tried to bring up eth0"? I don't want to take up your time if I'm the only one seeing this, but I'd like to understand how to avoid this, since I've seen it on two machines. I'd rather it not happen again. We're working toward an automatic (re-)installation and configuration system, and I'd hate to have to deal with devNNNNN in such a system.
When the module loaded, 00:11:43:35:AC:70 was eth0, and 00:11:43:35:AC:72 was eth1. You said to bring up 'eth0'; ifup looked at the config file and saw that eth0 was assigned to 00:11:43:35:AC:72 there. So it renamed 00:11:43:35:AC:70 to devXXXXX (the XXXXX is a random number), renamed 00:11:43:35:AC:72 to eth0, and then brought up eth0. This is all done in the scripts for ifup; ergo, it's never seen this way in anaconda, so anaconda shouldn't be able to configure interfaces as devXXXXX.
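The sequence described above can be modelled with a small dry-run function. This is only a sketch of the logic as explained in this comment, not the actual ifup code: plan_rename is a hypothetical name, the temporary devXXXXX name is passed in as an argument instead of being generated randomly, MAC addresses are compared as exact strings, and nothing is actually renamed.

```sh
#!/bin/sh
# plan_rename WANT_NAME WANT_MAC TMP_NAME NAME=MAC...
# Print the renames needed so that WANT_NAME ends up on the card with
# WANT_MAC. TMP_NAME stands in for the random devXXXXX name.
# Hypothetical helper for illustration; it prints a plan and changes nothing.
plan_rename() {
    want_name=$1
    want_mac=$2
    tmp_name=$3
    shift 3
    current=""        # present name of the card that has WANT_MAC
    name_taken=no     # does some other card hold WANT_NAME right now?
    for pair in "$@"; do
        name=${pair%%=*}
        mac=${pair#*=}
        if [ "$mac" = "$want_mac" ]; then
            current=$name
        elif [ "$name" = "$want_name" ]; then
            name_taken=yes
        fi
    done
    if [ -z "$current" ]; then
        echo "no interface with MAC $want_mac" >&2
        return 1
    fi
    if [ "$current" = "$want_name" ]; then
        return 0      # already correct, nothing to rename
    fi
    if [ "$name_taken" = yes ]; then
        echo "rename $want_name $tmp_name"   # move the current holder aside
    fi
    echo "rename $current $want_name"
}
```

Called as `plan_rename eth0 00:11:43:35:AC:72 dev17447 eth0=00:11:43:35:AC:70 eth1=00:11:43:35:AC:72`, it prints "rename eth0 dev17447" followed by "rename eth1 eth0". The card with the :70 address keeps its temporary name until something renames it again, which matches the devNNNNN interface seen in this report.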
Thanks, Bill. I'm still not 100% clear on how exactly this can happen, but that's OK; I have a general idea. Two last comments:

1) I strongly suspect my use of s-c-network-tui is the culprit in messing things up on both machines. That tool needs a major reworking, both when run as s-c-n-t and as kudzu's network config interface. E.g. it should be able to configure a newly-discovered hardware interface as ONBOOT=no, and if you choose "no", it shouldn't demand that you configure the dhcp/static info. Also, I'm not sure whether s-c-n-t is supposed to reconfig already-present interfaces, or *only* config newly-discovered ones. But it screws up when I try to use it to reconfig already-present interfaces; that could well be the source of the problem reported in this bug. Also see bug 155522. You can reassign this bug to system-config-network if you like.

2) I just now succeeded in getting my two interfaces configured the way I want. Yes, I could have done it completely manually, but I wanted to see what the tools would do. I did:

    service network stop
    [remove all ethernet interface aliases from modprobe.conf]
    [remove both NETWORK stanzas from hwconf]
    [remove ifcfg-eth0 and ifcfg-dev31792 files *everywhere*]
    grep -r eth0 /etc |less    [to make sure no eth0 lingered anywhere]
    ifconfig -a                [still showed eth0 and dev31792]
    rmmod tg3
    ifconfig -a                [eth0 and dev31792 now gone]
    modprobe tg3
    ifconfig -a                [eth0 and eth1 shown in proper MAC order; no devNNNNN]
    kudzu -b pci               [configured 1st-presented interface as static, 2nd as dhcp]
    service network start
    ifconfig -a                [MACs correct ordering, but eth1 is static, eth0 dhcp]
    service network stop
    [manually edit ifcfg-eth{0,1} to swap the static/dynamic assignments]
    [edit ifcfg-eth1 to ONBOOT=no]
    service network start

Now, finally, I have eth0 as a chosen static address, eth1 off, and no devNNNNN device. Why does kudzu present the interfaces in reverse order? It should be easier than this, no?
Yes, people who can run s-c-n-gui can do everything in a nice data-entry form, but text-bound people like me have no choice but to do lots of hand edits, which requires knowing exactly which files are which. I'll leave you with the thought that s-c-n-t should be redesigned, and kudzu should present interfaces in order. I'm done now. :) Thanks again.
As for 1), kudzu doesn't use s-c-network-tui. #2 - you're saying kudzu configured eth1 first? Not sure why. Probably just the order it happened to be on the PCI bus.
1): OK, I see now that you're correct that the text user interfaces are a bit different, so must not use the same code. 2): The PCI device order is eth0 first -- they're on the same PCI bus (bus 3 on this box), and eth0 is device 6; eth1 is device 8. The tg3 driver named them in the correct order; kudzu didn't present them in the correct order. There's no way within kudzu for the user to know that the order was reversed (the vendor device name is the same for both devices, and eth0/eth1 is not shown). You have to fix it afterwards, manually. *If* you discover it and know which files to tweak. It would be interesting to see what would happen if I hadn't modprobe'd tg3 *before* running kudzu, but just ran kudzu -- would they have been reversed not only in kudzu's presentation order but in ethN-MAC matching? I maintain that kudzu at least violates the principle of least surprise. :) Shall I file a bug?
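One way to discover a reversed ethN-to-MAC assignment without eyeballing the files is to compare each config file against what the kernel reports. The sketch below assumes the ifcfg files carry DEVICE= and HWADDR= lines and that the kernel exposes each interface's MAC in a file named address under a per-interface directory (as /sys/class/net does on 2.6 kernels). check_hwaddr is a hypothetical helper name, and both directories are passed in so the function can be tried against a scratch tree first.

```sh
#!/bin/sh
# check_hwaddr CFG_DIR SYS_NET_DIR
# For each ifcfg-* file that sets both DEVICE and HWADDR, compare HWADDR
# with the address the kernel reports for that device name.
# Prints one line per checked file; returns 1 if anything did not match.
# Hypothetical helper for illustration only.
check_hwaddr() {
    cfg_dir=$1
    sys_dir=$2
    status=0
    for f in "$cfg_dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        dev=$(sed -n 's/^DEVICE=//p' "$f" | tr -d '"' | head -n 1)
        want=$(sed -n 's/^HWADDR=//p' "$f" | tr -d '"' | head -n 1 | tr 'A-F' 'a-f')
        # Files without both settings (e.g. loopback) are skipped.
        if [ -z "$dev" ] || [ -z "$want" ]; then
            continue
        fi
        if [ ! -r "$sys_dir/$dev/address" ]; then
            echo "$dev: no such interface"
            status=1
            continue
        fi
        have=$(tr 'A-F' 'a-f' < "$sys_dir/$dev/address")
        if [ "$have" = "$want" ]; then
            echo "$dev: ok"
        else
            echo "$dev: config says $want, kernel says $have"
            status=1
        fi
    done
    return $status
}
```

On a live system the call would presumably be `check_hwaddr /etc/sysconfig/network-scripts /sys/class/net`. Note that the glob also picks up stray copies such as ifcfg-eth0.bak, so those would be reported as well.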
Sure.
For what it's worth: the same thing could be observed on my notebook when fiddling with kudzu, deleting the kudzu hwdatabase of already known hardware and then running kudzu again. On the next reboot, eth0 (the wired interface) was named devXXXX. Nothing a quick edit of /etc/modprobe.conf didn't fix, but it was a bit unexpected.
This, in general, should be better with the kudzu, hotplug, and initscripts packages from the U2 beta channel. Please reopen this if you have a specific sequence of events that can reproduce this; without that, it's very hard to determine where the error that needs to be fixed is.