Bug 911012

Summary: Race condition with driver init causes missing network interfaces.
Product: [Fedora] Fedora
Reporter: Rudolf E. Steiner <res-1>
Component: biosdevname
Assignee: Praveen K Paladugu <praveen_paladugu>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 17
CC: felix, harald, jonathan, jordan_hargrave, kdudka, kerframil, matt_domsch, mebrown, praveen_paladugu, res-1, udev-maint, vpavlin, wd
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of: 782145
Last Closed: 2013-02-14 22:14:28 UTC

Description Rudolf E. Steiner 2013-02-14 09:12:55 UTC
+++ This bug was initially created as a clone of Bug #782145 +++

Description of problem:

I have a system with 2 network ports on the mainboard and 4 more
ports on a quad network adapter card.  Depending on the timing, udev
and its renaming orgy runs into race conditions with the driver init
code, which can cause "lost" interfaces.

Version-Release number of selected component (if applicable):

udev-173-3.fc16.x86_64
kernel-3.1.6-1.fc16.x86_64


How reproducible:

Race condition. Happened once out of 4 boot sequences so far...

Steps to Reproduce:
1. Reboot the system.
  
Actual results:

Only five network interfaces were visible:
- eth0 and eth1 for the interfaces on the mainboard
- eth2, eth3 and eth4 for interfaces on the quad card
- eth5 was missing

Expected results:

- eth0 and eth1 on the mainboard
- eth2, eth3, eth4 and eth5 on the quad card.

Additional info:

Closer inspection showed that the missing interface was present,
although under the unexpected name "rename7"; I was able to bring it
up as needed using a "ip link set rename7 name eth5 ; ifup eth5"
command sequence.
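
The manual workaround above could be scripted roughly as follows. This is a minimal sketch and not something that was run as part of this report: the helper names `find_stranded` and `fix_commands` are hypothetical, and it assumes that interfaces left behind by a failed rename are always called "rename" followed by digits. It only builds the command strings; it does not execute them.

```python
import re

# Assumption: a stranded interface is named "rename" + digits, e.g. "rename7".
STRANDED = re.compile(r"^rename\d+$")


def find_stranded(names):
    """Return the interface names that look like leftover temporary names."""
    return [n for n in names if STRANDED.match(n)]


def fix_commands(stranded, wanted):
    """Pair each stranded name with a wanted name and build the
    'ip link set OLD name NEW' commands used in the report."""
    return [f"ip link set {old} name {new}" for old, new in zip(stranded, wanted)]
```

On a live system the input list could come from the entries of /sys/class/net, and each generated command would still have to be followed by "ifup" for the new name.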

The following extract from the system log shows how the kernel and
udev step on each other's toes when naming / renaming interfaces:

[   11.140816] igb 0000:06:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 90:e2:ba:02:be:54
[   11.149784] igb 0000:06:00.0: eth0: PBA No: E91609-005
[   11.170289] e1000e 0000:09:00.0: eth1: (PCI Express:2.5GT/s:Width x1) 00:30:48:d5:7b:2c
[   11.170291] e1000e 0000:09:00.0: eth1: Intel(R) PRO/1000 Network Connection
[   11.170368] e1000e 0000:09:00.0: eth1: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
[   11.321775] e1000e 0000:0a:00.0: eth2: (PCI Express:2.5GT/s:Width x1) 00:30:48:d5:7b:2d
[   11.330970] e1000e 0000:0a:00.0: eth2: Intel(R) PRO/1000 Network Connection
[   11.339630] e1000e 0000:0a:00.0: eth2: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
[   11.353905] udevd[710]: renamed network interface eth0 to rename2
[   11.365854] udevd[724]: renamed network interface eth2 to rename4
[   11.381840] udevd[712]: renamed network interface eth1 to eth0
[   11.415701] udevd[710]: renamed network interface rename2 to eth2
[   11.415922] igb 0000:06:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 90:e2:ba:02:be:55
[   11.416004] igb 0000:06:00.1: eth1: PBA No: E91609-005
[   11.616251] igb 0000:07:00.0: eth4: (PCIe:2.5Gb/s:Width x4) 90:e2:ba:02:be:56
[   11.624478] igb 0000:07:00.0: eth4: PBA No: E91609-005
[   11.658062] udevd[709]: renamed network interface eth3 to eth4
[   11.861296] igb 0000:07:00.1: rename7: (PCIe:2.5Gb/s:Width x4) 90:e2:ba:02:be:57
[   11.870485] igb 0000:07:00.1: rename7: PBA No: E91609-005
[   11.904426] udevd[709]: renamed network interface eth3 to rename7
[   11.922283] udevd[712]: renamed network interface eth1 to eth3
[   11.966126] udevd[724]: renamed network interface rename4 to eth1
[  102.021874] udevd[709]: error changing net interface name rename7 to eth4: File exists
[  106.506673] bonding: bond0: Adding slave eth2.
[  106.595914] bonding: bond0: enslaving eth2 as a backup interface with a down link.
[  106.657064] bonding: bond0: Adding slave eth3.
[  106.746950] bonding: bond0: enslaving eth3 as a backup interface with a down link.
[  106.794387] bonding: bond0: Adding slave eth4.
[  106.885032] bonding: bond0: enslaving eth4 as a backup interface with a down link.
[  106.996730] network[1339]: Bringing up interface bond0:r
ERROR    : [/etc/sysconfig/network-scripts/ifup-eth] Device eth5 does not seem to be present, delaying initialization.
[  108.901785] igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  108.919273] bonding: bond0: link status definitely up for interface eth3, 1000 Mbps full duplex.
[  109.050373] igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  109.060950] igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  109.129793] bonding: bond0: link status definitely up for interface eth2, 1000 Mbps full duplex.
[  109.140443] bonding: bond0: link status definitely up for interface eth4, 1000 Mbps full duplex.

Maybe it would be better if udev waited until all network drivers
have completed their initialization sequence, before it starts
renaming?

Also, I wonder why the Intel network driver chooses "rename7" as
interface name here (see 11.861296 time stamp).
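
The rename sequence in the log extract above can also be traced mechanically. The following is a minimal sketch, not part of the original report: the function names are hypothetical, the sample lines are copied from the extract, and it assumes the two udevd message formats seen there ("renamed network interface A to B" and "error changing net interface name A to B: ...").

```python
import re

RENAMED = re.compile(r"udevd\[\d+\]: renamed network interface (\S+) to (\S+)")
FAILED = re.compile(r"udevd\[\d+\]: error changing net interface name (\S+) to (\S+): .+")

# Four lines copied from the log extract in this report.
SAMPLE = """\
[   11.365854] udevd[724]: renamed network interface eth2 to rename4
[   11.904426] udevd[709]: renamed network interface eth3 to rename7
[   11.966126] udevd[724]: renamed network interface rename4 to eth1
[  102.021874] udevd[709]: error changing net interface name rename7 to eth4: File exists
"""


def rename_events(log_text):
    """Extract (status, old_name, new_name) tuples from udevd log lines."""
    events = []
    for line in log_text.splitlines():
        ok = RENAMED.search(line)
        if ok:
            events.append(("ok", ok.group(1), ok.group(2)))
            continue
        bad = FAILED.search(line)
        if bad:
            events.append(("failed", bad.group(1), bad.group(2)))
    return events


def stuck_temporaries(events):
    """Temporary renameN names that were assigned but never renamed away."""
    entered = [new for kind, _, new in events
               if kind == "ok" and re.fullmatch(r"rename\d+", new)]
    left = {old for kind, old, _ in events if kind == "ok"}
    return [name for name in entered if name not in left]
```

For the sample lines, `stuck_temporaries(rename_events(SAMPLE))` yields only "rename7": rename4 was later renamed to eth1, while the second step for rename7 failed with "File exists".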

--- Additional comment from Kay Sievers on 2012-01-16 12:03:02 EST ---

The automatic udev network renaming is removed in rawhide already, and will
not exist in future releases. It causes more problems than it solves.

I doubt the problems in earlier releases can or will ever be fixed properly.

Sorry for the mess. If you need predictable names, please edit the rules
file to use names other than the kernel names (not ethX) for the devices.
We cannot operate in the same namespace as the kernel and expect it to work.

--- Additional comment from  on 2012-06-13 17:20:23 EDT ---

The network renaming feature is proving troublesome for certain deployment scenarios, with the effects extending beyond Fedora. Kay, would you be so kind as to advise which version of udev no longer contains this feature, or perhaps point to the relevant commit hash?

--- Additional comment from Kay Sievers on 2012-06-13 19:53:26 EDT ---

It was disabled in Fedora 17, and Fedora 18 will have systemd's udev which
will not provide the old and racy renaming logic triggered by udev.

Use biosdevname, or HWADDR= in the sysconfig scripts, or write your own udev
rules which rename the devices. But it is better never to use ethX or any
other kernel namespace. Name the devices after their function, like internal,
dmz, or whatever fits. Trying to keep ethX stable can never work reliably,
and will not be supported by any future tool.
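
As an illustration of naming devices after their function, the sketch below generates a udev rule line that matches on MAC address and refuses kernel-style target names. It is an assumption-laden example, not a rule taken from this system: the helper `udev_rule`, the list of kernel-style prefixes, and the pairing of a MAC address with the name "internal" are all hypothetical.

```python
import re

# Assumption: names of this shape live in the kernel's own namespace
# and must not be used as rename targets.
KERNEL_STYLE = re.compile(r"^(eth|wlan|rename)\d+$")


def udev_rule(mac, name):
    """Build one udev rule line that names the NIC with this MAC address."""
    if KERNEL_STYLE.match(name):
        raise ValueError(f"{name!r} is a kernel-style name; pick a functional one")
    return (f'SUBSYSTEM=="net", ACTION=="add", '
            f'ATTR{{address}}=="{mac.lower()}", NAME="{name}"')
```

A call such as `udev_rule("90:E2:BA:02:BE:54", "internal")` produces a line that could be placed in a rules file under /etc/udev/rules.d/, while `udev_rule(..., "eth0")` raises an error instead of recreating the race described in this bug.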

--- Additional comment from Rudolf E. Steiner on 2013-01-14 18:52:56 EST ---

Kay Sievers wrote:

> It was disabled in Fedora 17

I have the same issue on Fedora 17.

--- Additional comment from Kay Sievers on 2013-01-15 12:01:04 EST ---

(In reply to comment #4)
> Kay Sievers wrote:
> 
> > It was disabled in Fedora 17
> 
> I have the same issue on Fedora 17.

Sure, you do, if you have old rules files which try to rename kernel-created interface names to other names in the same kernel ethX namespace.

None of this can work any longer. The rules file needs to be manually removed,
or edited to contain names other than ethX as target names.

Sorry, this can only be solved manually; there is no way to mess with user
config from RPM.

--- Additional comment from Rudolf E. Steiner on 2013-01-15 12:13:17 EST ---

> Kay Sievers quoted/wrote:

>>> It was disabled in Fedora 17
>> I have the same issue on Fedora 17.
> Sure, you do, if you have old rules files which try to rename kernel-created
> interface names to other names in the same kernel ethX namespace.
> This can all no longer work. The rules file needs to be manually removed,
> or edited to contain names other than ethX as target names.
> Sorry, this can only be solved manually, there is no way to mess from RPM
> with user config.

There is no user config. It is a freshly installed Fedora 17. In most cases after a boot I have the following interfaces:

| em1
| lo
| p2p1
| p2p2
| p2p3
| p2p4
| p4p1

After other reboots I have the following interfaces (example):

| em1
| lo
| p2p1
| p2p3
| p2p4
| p4p1
| rename2

--- Additional comment from Kay Sievers on 2013-01-15 12:35:13 EST ---

Hmm, if it's there, what's the content of:
  /etc/udev/rules.d/70-persistent-net.rules
?

If it isn't there, some other rule is trying that, check:
  grep NAME= /etc/udev/rules.d/*.rules /lib/udev/rules.d/*.rules

If there is only 60-net.rules left, check your:
  /etc/sysconfig/network-scripts/ifcfg-*
files, if they contain instructions to rename things to kernel names. These
need to be fixed then, we cannot rename *to* ethX, only *from*.

--- Additional comment from Rudolf E. Steiner on 2013-01-15 13:00:30 EST ---

Kay Sievers wrote:

> Hmm, if it's there, what's the content of:
>   /etc/udev/rules.d/70-persistent-net.rules
> ?

It's not there.

> If it isn't there, some other rule is trying that, check:
>   grep NAME= /etc/udev/rules.d/*.rules /lib/udev/rules.d/*.rules

The result:

| /lib/udev/rules.d/10-dm.rules:KERNEL=="device-mapper", NAME="mapper/control"
| /lib/udev/rules.d/71-biosdevname.rules:NAME=="?*",       GOTO="netdevicename_end"
| /lib/udev/rules.d/71-biosdevname.rules:# using NAME= instead of setting INTERFACE_NAME, so that persistent
| /lib/udev/rules.d/71-biosdevname.rules:PROGRAM="/sbin/biosdevname --policy physical -i %k", NAME="%c",  OPTIONS+="string_escape=replace"
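
Note that the grep above matches both `NAME==` (a comparison) and `NAME=` (an assignment), as well as comments. A rough way to keep only the assignments is sketched below. The function name is hypothetical, the sample text is the grep result above without the leading "| ", and the regular expression is a simplification of udev rule syntax rather than a full parser.

```python
import re

# NAME followed by a single "=" and a quoted value; "NAME==" and names such
# as INTERFACE_NAME are excluded.
ASSIGN = re.compile(r'(?<![A-Z_])NAME\s*=(?!=)\s*"([^"]*)"')

GREP_OUTPUT = '''\
/lib/udev/rules.d/10-dm.rules:KERNEL=="device-mapper", NAME="mapper/control"
/lib/udev/rules.d/71-biosdevname.rules:NAME=="?*",       GOTO="netdevicename_end"
/lib/udev/rules.d/71-biosdevname.rules:# using NAME= instead of setting INTERFACE_NAME, so that persistent
/lib/udev/rules.d/71-biosdevname.rules:PROGRAM="/sbin/biosdevname --policy physical -i %k", NAME="%c",  OPTIONS+="string_escape=replace"
'''


def name_assignments(grep_output):
    """Return (rules_file, assigned_value) pairs for real NAME= assignments."""
    found = []
    for line in grep_output.splitlines():
        path, _, rule = line.partition(":")
        if rule.lstrip().startswith("#"):
            continue  # skip comment lines
        for value in ASSIGN.findall(rule):
            found.append((path, value))
    return found
```

Applied to the output above, this leaves two assignments: the device-mapper control node and the biosdevname rule that sets the name from the program output (`%c`).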

--- Additional comment from Kay Sievers on 2013-01-15 13:40:30 EST ---

What's the output of:
  biosdevname -d
?

--- Additional comment from Rudolf E. Steiner on 2013-01-15 14:03:13 EST ---

Created attachment 678986 [details]
Output of "biosdevname -d"

--- Additional comment from Kay Sievers on 2013-01-15 20:13:03 EST ---

There seem to be no other sources of device naming on the system than biosdevname.

One possible explanation would be that biosdevname returns identical names for two different devices.

The debug output looks suspicious: there are two e1000 devices with consecutive MAC addresses, but one of them gets an onboard name and the other one doesn't.

Re-assigning to biosdevname, as it seems to be the only active component here.
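
The "identical names for two different devices" hypothesis could be checked with something like the sketch below. It is only an illustration: the function name is hypothetical, and the input mapping (kernel name to suggested name) would have to be filled in by hand from the attached "biosdevname -d" output, which is not reproduced here.

```python
from collections import defaultdict


def colliding_names(suggested):
    """Given {kernel_name: suggested_name}, return the suggested names that
    are claimed by more than one device, with the devices that claim them."""
    by_name = defaultdict(list)
    for device, name in suggested.items():
        by_name[name].append(device)
    return {name: sorted(devs) for name, devs in by_name.items() if len(devs) > 1}
```

An empty result would mean every device got a unique suggestion; any entry in the result names two or more devices competing for the same target name.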

--- Additional comment from Fedora End Of Life on 2013-01-16 12:03:44 EST ---

This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

--- Additional comment from Narendra K on 2013-01-21 11:02:04 EST ---

Hi, could you please attach the output of the following from the system 

1. dmidecode
2. biosdecode
3. lspci -tv and lspci -xxxvvv
4. The content of /etc/sysconfig/network-scripts/ifcfg-* (corresponding to comment #10)

The issue is that one of the interfaces is named 'renameN' across multiple reboots. Is that understanding correct?

--- Additional comment from Rudolf E. Steiner on 2013-01-21 12:05:47 EST ---

Created attachment 684472 [details]
Output of "dmidecode"

--- Additional comment from Rudolf E. Steiner on 2013-01-21 12:06:17 EST ---

Created attachment 684473 [details]
Output of "biosdecode"

--- Additional comment from Rudolf E. Steiner on 2013-01-21 12:06:56 EST ---

Created attachment 684474 [details]
Output of "lspci -tv"

--- Additional comment from Rudolf E. Steiner on 2013-01-21 12:07:27 EST ---

Created attachment 684475 [details]
Output of "lspci -xxxvvv"

--- Additional comment from Rudolf E. Steiner on 2013-01-21 12:08:49 EST ---

Created attachment 684476 [details]
Content of "/etc/sysconfig/network-scripts/ifcfg-*"

--- Additional comment from Rudolf E. Steiner on 2013-01-21 12:10:16 EST ---

Narendra K. wrote:

> The issue is that one of the interfaces is named as 'renameN' across
> multiple reboots. Is the understanding correct ?

Yes, this is correct.

--- Additional comment from Narendra K on 2013-01-23 07:59:20 EST ---

(In reply to comment #6)
> > Kay Sievers quoted/wrote:
 
> There is no user config. It is a freshly installed Fedora 17. In most
> cases after a boot I have the following interfaces:
> 
> | em1
> | lo
> | p2p1
> | p2p2
> | p2p3
> | p2p4
> | p4p1

Looking at the attached 'dmidecode' output the above names seem to be correct.
Biosdevname depends on BIOS-provided SMBIOS type 41 records to name onboard interfaces and type 9 records to name add-in interfaces. In the absence of type 9 records, biosdevname uses the 'slot #' from the PCI 'SltCap' structure of the parent device.

The issue description states that the system has two onboard network interfaces. But the 'dmidecode' output shows that the system has only one 'type 41' record. So biosdevname has named only one interface as 'em1'.

The 'dmidecode' output shows that there are no type 9 records in the system for the add-in network adapters. From the attached 'lspci -xxxvvv' output, observe Slot #2 and Slot #4 (being used to name p2p1 and p4p1):

00:01.1 PCI bridge: Intel Corporation Ivy Bridge PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
[...]

SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
	 ------------>	Slot #2, PowerLimit 75.000W; Interlock- NoCompl+


00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5) (prog-if 00 [Normal decode])
[...]
SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
	------------>		Slot #4, PowerLimit 10.000W; Interlock- NoCompl+
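
The naming logic described above can be sketched as follows. This is a deliberately simplified illustration, not biosdevname's actual code: the function names are hypothetical, only the "emN" and "pSpP" name shapes seen in this report are covered, and the slot number is taken from an lspci 'SltCap' line with a plain regular expression.

```python
import re

SLOT = re.compile(r"Slot #(\d+)")


def slot_number(sltcap_text):
    """Extract the slot number from lspci 'SltCap' output, or None."""
    match = SLOT.search(sltcap_text)
    return int(match.group(1)) if match else None


def suggested_name(onboard_index=None, slot=None, port=None):
    """Onboard devices become emN; add-in devices become p<slot>p<port>.
    Returns None when there is not enough information to build a name."""
    if onboard_index is not None:
        return f"em{onboard_index}"
    if slot is not None and port is not None:
        return f"p{slot}p{port}"
    return None
```

With the two SltCap lines above, `slot_number` gives 2 and 4, so the first port behind each bridge would be named p2p1 and p4p1, and a single type 41 record gives em1.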


> 
> After other reboots I have following interfaces (example):
> 
> | em1
> | lo
> | p2p1
> | p2p3
> | p2p4
> | p4p1
> | rename2

It would be helpful to know 

1. Is the issue seen if 'biosdevname' version '0.4.0' is used? It is available for download at the following link:
http://linux.dell.com/biosdevname/biosdevname-0.4.0/biosdevname-0.4.0.tar.gz

a) Please uninstall 'biosdevname-0.4.1' from Fedora 17 (yum remove biosdevname)
b) Delete the '/etc/udev/rules.d/70-persistent-net.rules' file, if any
c) tar zxvf biosdevname-0.4.0.tar.gz
   cd biosdevname-0.4.0
   ./configure
   make && make install
"pciutils-devel" and "zlib-devel" must be installed for compilation to succeed.


2. I observe that the issue description mentions ethN names on Fedora 16. Fedora 16 also has biosdevname. Was biosdevname=0 passed in Fedora 16 to get ethN names? If yes, it would be great if you could pass 'biosdevname=0' to Fedora 17 and verify whether the issue is seen with ethN names also. This would eliminate/confirm biosdevname from the scenario. (On a fresh install of Fedora 17 it is enough to pass 'biosdevname=0' to get eth names. But on an already installed system with 'em' names, all the corresponding ifcfg-* files need to be altered to suit the 'eth' names and any existing '70-persistent-net.rules' needs to be removed.)

--- Additional comment from Rudolf E. Steiner on 2013-01-23 10:55:18 EST ---

Narendra K wrote:

> 1. Is the issue seen if 'biosdevname' version '0.4.0' is used ?

No.

I have installed "biosdevname" version "0.4.0". The issue is not seen after ten reboots with ten checks. The installation of "biosdevname" version "0.4.0" was the solution for the problem on Fedora 17.

The names of the interfaces have changed when using the "old" "biosdevname":

| em1
| lo
| p2p4
| p2p5
| p2p6
| p2p7
| p4p1

--- Additional comment from Narendra K on 2013-01-24 02:10:10 EST ---

(In reply to comment #21)
> Narendra K wrote:
> 
> > 1. Is the issue seen if 'biosdevname' version '0.4.0' is used ?
> 
> No.
> 
> I have installed "biosdevname" version "0.4.0". The issue is not seen after
> ten reboots with ten checks. The installation of "biosdevname" version
> "0.4.0" was the solution for the problem on Fedora 17.
>

Thanks. Could you please share the findings from trying Point 2 from comment #20 ?

--- Additional comment from jordan hargrave on 2013-01-24 14:00:04 EST ---

Created attachment 686918 [details]
Biosdevname 0.5.0

Can you try the newest version of biosdevname?

--- Additional comment from Rudolf E. Steiner on 2013-01-25 03:15:39 EST ---

Narendra K. wrote:

> Thanks. Could you please share the findings from trying Point 2 from comment
> #20 ?

Where do I have to write "biosdevname=0" to test this?

--- Additional comment from Narendra K on 2013-01-25 04:58:35 EST ---

(In reply to comment #24)
> Narendra K. wrote:
> 
> > Thanks. Could you please share the findings from trying Point 2 from comment
> > #20 ?
> 
> Where do I have to write "biosdevname=0" to test this?

It needs to be added as a kernel command line parameter in GRUB. (Please ensure that all the relevant ifcfg-em* and ifcfg-p* files are modified to suit the ethN naming.)

For a fresh install, it needs to be passed to the installer (like any other parameter).
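
For illustration, adding the parameter to an existing kernel command line string without duplicating it could look like the sketch below. The helper name `with_param` is hypothetical, the example command lines are made up, and the naive whitespace split assumes no quoted values containing spaces.

```python
def with_param(cmdline, key, value):
    """Return cmdline with key=value set exactly once (appended at the end)."""
    kept = [tok for tok in cmdline.split()
            if tok != key and not tok.startswith(key + "=")]
    kept.append(f"{key}={value}")
    return " ".join(kept)
```

For example, `with_param("ro root=/dev/sda1 rhgb quiet", "biosdevname", "0")` appends `biosdevname=0`, and an existing `biosdevname=1` would be replaced rather than duplicated. Where the resulting string has to be written depends on the bootloader setup, which this thread does not specify; after a reboot, /proc/cmdline shows what the kernel actually received.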

--- Additional comment from Rudolf E. Steiner on 2013-01-25 09:25:32 EST ---

Jordan Hargrave wrote:

> Biosdevname 0.5.0
> Can you try the newest version of biosdevname?

Done. "biosdevname" version "0.5.0" also seems to have the same problem.

The interface-names after three reboots:

| em1
| lo
| p2p1
| p2p2
| p2p3
| p4p1
| rename7

I have changed back to "biosdevname" version "0.4.0" and have made 10 reboots and 10 checks again. The problem no longer exists.

--- Additional comment from Rudolf E. Steiner on 2013-01-26 10:52:12 EST ---

(In reply to comment #20)
[...]
> 2. I observe that the issue description mentions ethN names on Fedora 16.
> Fedora 16 also has biosdevname. Was biosdevname=0 passed in Fedora 16 to get
> ethN names ? If yes, it would be great if you could pass 'biosdevname=0' to
> Fedora 17 and verify if the issue is seen with ethN names also. This would
> eliminate/confirm biosdevname from the scenario. ( On a fresh install of
> Fedora 17 it is enough to pass 'biosdevname=0' to get eth names. But on an
> already installed system with 'em' names, all the corresponding ifcfg-* files
> need to be altered to suit the 'eth' names and any existing
> '70-persistent-net.rules' needs to be removed)

I have added "biosdevname=0" as a kernel parameter.

After ten reboots and ten checks the interface-naming was always:

| eth0
| eth1
| eth2
| eth3
| eth4
| eth5
| lo

Now I am using "biosdevname" version "0.4.0" again without problems.

--- Additional comment from Felix Kaechele on 2013-02-04 17:33:50 EST ---

Seeing the same problem with Fedora 18.

For me it messes up if the machine is rebooted, but not if it is shut down and booted again.

I'm using a twin-port Intel card which uses the e1000e driver.

--- Additional comment from Fedora End Of Life on 2013-02-13 14:27:20 EST ---

Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 1 Felix Kaechele 2013-02-14 22:03:04 UTC
What is going on here? If you still have this problem you're better off with re-opening the original bug, setting the version to 18 and marking this as a duplicate.

Comment 2 Rudolf E. Steiner 2013-02-14 22:09:21 UTC
I cannot reopen the original bug (it's not mine).

Comment 3 Rudolf E. Steiner 2013-02-14 22:14:28 UTC

*** This bug has been marked as a duplicate of bug 782145 ***