If network driver modules get reloaded in the wrong order so that the device which should be 'eth0' is actually 'eth1', we rename the devices. Unfortunately, this can result in the device which _was_ 'eth0' ending up with a name like '__tmp1804289383'. After we're done renaming, we shouldn't leave it with a device name like that -- we should put it back to eth%d.

NetworkManager and dhclient both fail on devices like '__tmp1804289383':

    Oct 1 11:56:41 pegasos dhclient: Bind socket to interface: No such device

Renaming it with one fewer digit (i.e. '__tmp180428938') makes it work with dhclient, although hal and NetworkManager seem to get utterly confused when we rename devices -- I tried renaming it back to 'eth3' and NetworkManager was still in a loop eating 100% CPU trying to do SIOCGIFFLAGS on a device starting '__tmp180428938...' and ending in complete garbage.
kudzu (should) clean this up for any unconfigured interfaces on boot. Do you have it installed and enabled?
No, kudzu was disabled. Will undo the configuration change and try again with kudzu enabled.
Yes, that makes it work.
... sometimes. I just saw it again even with kudzu running. I've put back my ifcfg-eth1 for now.
So, kudzu only does this the first time it sees a device, as it changes the name and then writes a config file. After that, it assumes the config file is still there (or was changed by s-c-network if the user wanted it changed).
Created attachment 141998 [details] 1. patch to increase verbosity
Created attachment 142000 [details] 2. patch, handle renaming of temporary devices

(Sorry, patch "141998: 1. patch to increase verbosity" belongs to this one. I just figured out that I failed to transmit both patches together in one note. Apply both patches on top of each other!)

This patch renames any __tmpXXX devices to the device names that are specified in the configuration. I cannot rename these temporary names back to their original names if they were never configured, because those names are already occupied by configured devices. My solution for this was to generate a new device name that can also be used later on (but that is not recommended!). New devices that were never configured but have to be renamed because of a device name clash are now called newnetXXXXXX. The 6 digits are chosen semirandomly by calling srand() to prevent clashes of newnet names.

This change solves any __tmp1804289383 problems that are floating around that I can think of (+ bugs #214817 #209009 #210780).

BTW: this bug appears in FC6 now. The patches apply against the initscripts-8.45.5-1 version of this distribution. For production use you might want to reduce the verbosity of my patch.

Regards, Wilfried
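As background on why seeding matters here: the C standard specifies that rand() called without a prior srand() behaves as if srand(1) had been called, so an unseeded program produces the same "random" suffix on every run, which would be consistent with every report showing the identical name __tmp1804289383. The following is a minimal sketch of a seeded newnetXXXXXX generator, not the code from the attached patch; `make_newnet_name` and `first_rand_for_seed` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write "newnet" plus 6 pseudo-random digits to buf.
 * The caller is expected to have seeded the generator once with srand(). */
static void make_newnet_name(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen >= 13);   /* 6 letters + 6 digits + NUL */
    snprintf(buf, buflen, "newnet%06d", rand() % 1000000);
}

/* Returns the first rand() value produced after seeding with `seed`.
 * The same seed always yields the same value, which is why an unseeded
 * (implicitly seed 1) generator repeats its output on every run. */
static int first_rand_for_seed(unsigned int seed)
{
    srand(seed);
    return rand();
}
```

A caller would seed once at startup with something that varies between runs (the current time, for example, which is an assumption here and not necessarily what either patch does) and then call `make_newnet_name` with a buffer of at least 13 bytes.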
So, for patch #1, you're logging in a situation where there's no logging daemon. Not sure how well that helps. As to replacing one particular temporary name with another, I don't think that's particularly useful. You may want to look at the current updates/updates-testing initscripts and kudzu - this should fix it so that config files are always written for new devices, and any __tmp devices are renamed there. In the meantime, any machine bitten by this might need to manually add a configuration.
I just installed initscripts-8.45.6-1 and kudzu-1.2.57.3-1 from updates-testing but the problem remains.

About logging: With LOG_CONS I also log to the console. But that was just for debugging and illustrating the problem.

About renaming: I only rename to "newnet" if no name for that device was found in the configuration. With the patch applied after a Fedora install you will not have a "newnet" device, because all network devices have configured names and you get properly named devices. Without the patch you get a "__tmp" device whenever network device names must be exchanged, no matter whether the names were configured or not. With the patch you only end up with a "newnet" device if you add new hardware and the new device names conflict with the already configured devices.

A better name would be nice, but I was not sure whether stripping the digits off the end of the original device name and, at the end of rename_device, appending the next free number to that name does the job. (Thinking about vlan, tuntap, and inconsistent naming conventions in wlan drivers, ...) It is far from perfect but it is some progress, and it fixed my problem, so I posted it to bugzilla.

Regards, Wilfried
Which specific problem remains? If it's just that you have __tmpXXX instead of newnet*, I'm not sure that's a useful change - it's just exchanging one temporary name for another.
(In reply to comment #10)
> Which specific problem remains?
>
> If it's just that you have __tmpXXX instead of newnet*, I'm not sure that's a
> useful change - it's just exchanging one temporary name for another.

As long as the new name is one character shorter, it means that dhclient and NetworkManager should no longer crap themselves :)
Well, that's silly. They should be fixed. :)
To clarify my statement: initscripts-8.45.6-1 and kudzu-1.2.57.3-1 from updates-testing do not fix this bug. I still get __tmpXXX devices.

I have 4 ethernet devices:
1x 3Com (3c59x)
2x Realtek (8139too)
1x VIA Rhine (via-rhine)

They are detected by anaconda in this order:
3Com => eth0
1. Realtek => eth1
2. Realtek => eth2
VIA Rhine => eth3

The ifcfg-* files are written accordingly. No old bak files or rpmsave or whatever got in the way (I think there was some other bug about that, but that one did not bite me). This is a fresh install.

One of these devices always gets a __tmpXXX name, both with the original FC6 install and with the kudzu and initscripts rpms from updates-testing. The reason is that the order in which the modules are loaded is different when the installed system boots, and the renaming that has to be done to get consistent device names forms a "loop". Something like:

before udev magic => after udev magic:
--------------------------------------
eth0 => eth1
eth1 => eth2
eth2 => eth3
eth3 => eth0

rename_device fails in this case and leaves a __tmpXXX device. My patch fixes this case. With my patch applied I get all ethX devices with their proper names, that is, eth[0-3]. No newnet in this case.

I would get newnet devices (and only then!) when I add a new ethernet device and the device name of the new device is inside the renaming "loop". In this example, if the new device is eth4 after the module load, then it is not part of the loop and we have no newnet device. Otherwise we do.

I agree with you that this is not so great. I could include something like this in my patch instead of the "newnet" part if you want:

    rename_new_unconfigured_dev(newdevice) {
        fixedname = truncate_trailing_digits(newdevice);
        for (i = 0; 1; i++) {
            if (device_exists(fixedname + itoa(i)) == false) {
                fixedname = fixedname + itoa(i);
                return fixedname;
            }
        }
    }

This way we would always have proper names. I cannot think of a case where this would break. Maybe there is a weird case with vlans, bonding or whatever. I have not done complex network configuration on Red Hat servers. You people know better.
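The pseudocode in the comment above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not code from initscripts: `device_exists` here is a hypothetical stand-in that searches a caller-supplied list rather than querying the kernel, and the function name `next_free_name`, its signature, and the loop bound are all invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a real lookup: is `name` in the given list? */
static int device_exists(const char *name, const char *const *devs, size_t ndevs)
{
    for (size_t i = 0; i < ndevs; i++)
        if (strcmp(devs[i], name) == 0)
            return 1;
    return 0;
}

/* Strip trailing digits from `dev`, then append the lowest index that is
 * not already taken. Returns 0 on success, -1 if no candidate fits in `out`. */
static int next_free_name(const char *dev, const char *const *devs, size_t ndevs,
                          char *out, size_t outlen)
{
    assert(dev != NULL && out != NULL);
    size_t stem = strlen(dev);
    while (stem > 0 && isdigit((unsigned char)dev[stem - 1]))
        stem--;

    for (unsigned int i = 0; i < 1000000; i++) {
        int n = snprintf(out, outlen, "%.*s%u", (int)stem, dev, i);
        if (n < 0 || (size_t)n >= outlen)
            return -1;              /* candidate would not fit in `out` */
        if (!device_exists(out, devs, ndevs))
            return 0;               /* `out` holds the first free name */
    }
    return -1;
}
```

With eth0 through eth3 present, a new device that arrived as "eth1" would come out as "eth4". The sketch does not address the vlan or bonding naming concerns raised in the comment.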
Created attachment 142340 [details] a slightly different patch for this

So, in parallel I was working on what I thought was an unrelated issue that turned out to be exactly this (doh). Here's what I'm using now, which takes a similar idea but a different approach. I've integrated the srand() call - that was obviously missing in general.

Some comments, which explain the logic in this patch:
1) We should already know which temporary devices we want to rename, and what to rename them to, because those are the ones we could not rename earlier since they conflicted with a device already in the chain. This eliminates the need to look up hwaddrs again in the rename-of-temporary-devices loop.
2) The reason I keep using the __tmpXXXXX name is that kudzu uses it to find newly added temporary devices (which it then renames to something sane when it writes configs for them).

Does this patch work for you?
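To illustrate why a rename cycle needs a temporary name at all, and why only an unconfigured device should be left holding one, here is a small simulation over an in-memory table. It is a simplified sketch and not the logic of the attached patch: the `struct dev` layout, the helper names, and the sequential `__tmpN` suffix are assumptions made for the example, and it assumes no two devices are configured with the same name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAMELEN 16

struct dev {
    char cur[NAMELEN];    /* current name */
    const char *want;     /* configured name, or NULL if unconfigured */
};

/* Index of the device currently called `name`, or -1 if there is none. */
static int find_by_name(const struct dev *d, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(d[i].cur, name) == 0)
            return (int)i;
    return -1;
}

/* Move every configured device to its wanted name. A device that blocks a
 * target is first parked under a temporary name. Parked devices that have a
 * configured name get renamed when the loop reaches them; unconfigured ones
 * keep the temporary name. */
static void rename_all(struct dev *d, size_t n)
{
    unsigned int tmpseq = 0;
    for (size_t i = 0; i < n; i++) {
        if (d[i].want == NULL || strcmp(d[i].cur, d[i].want) == 0)
            continue;
        int holder = find_by_name(d, n, d[i].want);
        if (holder >= 0)
            snprintf(d[holder].cur, NAMELEN, "__tmp%u", tmpseq++);
        snprintf(d[i].cur, NAMELEN, "%s", d[i].want);
    }
}
```

Run on the four-device loop from the earlier comment (eth0 => eth1 => eth2 => eth3 => eth0), every device ends up on its configured name and no temporary name survives. A temporary name remains only when the blocking device has no configuration, which is the case kudzu is described as handling.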
Added in 8.48-1, 8.45.7-1.
I applied your patch against 8.45.6-1. I could not find 8.45.7-1 in the repository. All network devices have correct names now. I simulated new devices by removing entries in kudzu's hwconf and the corresponding files in network-scripts. This works too. The new devices are configured with dhcp. This one works! :)
initscripts-8.45.7-1 has been pushed for fc6, which should resolve this issue. If these problems are still present in this version, then please make note of it in this bug report.