Bug 837455 - udev failing to rename large number of NICs to p#p#
Summary: udev failing to rename large number of NICs to p#p#
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: biosdevname
Version: 17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Narendra K
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-07-03 23:41 UTC by joshua
Modified: 2013-08-01 17:53 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-01 17:53:04 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
lspci output (16.46 KB, text/plain)
2012-07-03 23:41 UTC, joshua

lspci -v output (74.98 KB, text/plain)
2012-07-03 23:42 UTC, joshua

output from biosdevname -d (5.98 KB, text/plain)
2012-07-05 17:29 UTC, joshua

Fedora 15 output from biosdevname -d, same machine (5.94 KB, text/plain)
2012-07-05 23:07 UTC, joshua

biosdecode (1.53 KB, text/plain)
2012-09-11 16:04 UTC, joshua

dmidecode (41.96 KB, text/plain)
2012-09-11 16:04 UTC, joshua

lspci -tv (12.44 KB, text/plain)
2012-09-11 16:05 UTC, joshua

lspci -vvvxxxx (1.63 MB, text/plain)
2012-09-11 16:06 UTC, joshua

Fedora 18 Beta x86_64 output of biosdevname -d (5.92 KB, text/plain)
2012-11-27 21:03 UTC, joshua

Description joshua 2012-07-03 23:41:50 UTC
Created attachment 596113 [details]
lspci output

Description of problem:

We have servers with four to six 4-port 10 gigabit Ethernet NICs. Fedora 17 does not name them according to the p#p# NIC naming convention. Instead, all but the embedded NICs get seemingly random eth# names, and the embedded NICs get p#p# names rather than em#. Is this a bug?

We have other servers with 2-port 10 gigabit Ethernet NICs where this works perfectly.

Attached is the lspci output from one of the problem servers.


Version-Release number of selected component (if applicable):

F17 x86_64
kernel-3.4.3-1.fc17.x86_64
udev-182-3.fc17.x86_64

Comment 1 joshua 2012-07-03 23:42:22 UTC
Created attachment 596114 [details]
lspci -v output

Comment 2 joshua 2012-07-03 23:49:49 UTC
The multiport adapters we have are Hotlava Tambora 64G4 and 80G4 cards.

See http://www.hotlavasystems.com/products_10gbe.html for more details.

Comment 3 Harald Hoyer 2012-07-05 14:18:51 UTC
Do you have biosdevname installed? If not, please install it.

What is the output of:

# biosdevname -d

Comment 4 joshua 2012-07-05 17:29:51 UTC
Created attachment 596455 [details]
output from biosdevname -d

Comment 5 joshua 2012-07-05 17:30:19 UTC
It is installed, apparently by default. Here is the output.

Comment 6 joshua 2012-07-05 23:06:02 UTC
Very strange: in Fedora 15 x86_64, everything is much happier. See attached.

Comment 7 joshua 2012-07-05 23:07:04 UTC
Created attachment 596533 [details]
Fedora 15 output from biosdevname -d, same machine

Comment 8 joshua 2012-07-06 22:35:03 UTC
It's worth noting that even in Fedora 15, things are off. There are two built-in Ethernet ports on the motherboard, yet em1 through em6 are listed, and the one I was connected to wasn't among them. See http://www.supermicro.com/products/motherboard/Xeon/C600/X9DRi-F.cfm for more motherboard details.

Comment 9 joshua 2012-09-10 19:26:16 UTC
Guys, this is an important bug.  We are experiencing randomness in Hotlava Tambora port names.

Please help!

Comment 10 Narendra K 2012-09-11 08:09:06 UTC
Hi, please attach the output from the following -

1.dmidecode
2.biosdecode
3.lspci -tv
4.lspci -vvvxxxx

Comment 11 Narendra K 2012-09-11 13:51:12 UTC
From the attached 'biosdevname -d' output, it seems that multiple SMBIOS type 9 records are incorrect. The information requested in comment #10 will help us understand this better.


Observe that the biosdevname -d output indicates:

Duplicate: True       <---------------Observe this.
BIOS device: p116p1
Kernel name: eth16
Permanent MAC: 00:12:C0:80:27:36
Assigned MAC : 00:12:C0:80:27:36
Driver: ixgbe
Driver version: 3.8.21-k
Firmware version: 0x2b2c0001
Bus Info: 0000:94:00.0
PCI name      : 0000:94:00.0
PCI Slot      : 116 <---------------Observe this.
Index in slot: 1

Duplicate: True <---------------Observe this.
BIOS device: p116p2
Kernel name: eth17
Permanent MAC: 00:12:C0:80:27:37
Assigned MAC : 00:12:C0:80:27:37
Driver: ixgbe
Driver version: 3.8.21-k
Firmware version: 0x2b2c0001
Bus Info: 0000:94:00.1
PCI name      : 0000:94:00.1
PCI Slot      : 116 <---------------Observe this.
Index in slot: 2

Only the first two entries seem to have correct values -

BIOS device: p1p1
Kernel name: p1p1
Permanent MAC: 00:25:90:6B:0A:E0
Assigned MAC : 00:25:90:6B:0A:E0
Driver: igb
Driver version: 3.2.10-k
Firmware version: 1.5-2
Bus Info: 0000:02:00.0
PCI name      : 0000:02:00.0
PCI Slot      : 1        <----------- Observe this
SMBIOS Label: CPU1_SLOT1 <----------- Observe this.
Index in slot: 1

BIOS device: p1p2
Kernel name: p1p2
Permanent MAC: 00:25:90:6B:0A:E1
Assigned MAC : 00:25:90:6B:0A:E1
Driver: igb
Driver version: 3.2.10-k
Firmware version: 1.5-2
Bus Info: 0000:02:00.1
PCI name      : 0000:02:00.1
PCI Slot      : 1        <----------- Observe this
SMBIOS Label: CPU1_SLOT1 <----------- Observe this
Index in slot: 2
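A quick way to see the pattern pointed out above across the full 'biosdevname -d' attachment is to group the records by the PCI slot they report. A minimal Python sketch, assuming records are separated by blank lines and use "Key: value" lines as in the excerpt (the hand-added "<----" arrows are stripped); parse_biosdevname and summarize are hypothetical helper names, not part of biosdevname.

```python
import re
from collections import defaultdict


def parse_biosdevname(text):
    """Split 'biosdevname -d' output into one dict per interface.

    Assumes records are separated by blank lines and each line is
    "Key: value"; anything after a "<--" marker is treated as an
    annotation and dropped.
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rec = {}
        for line in block.splitlines():
            line = line.split("<--")[0]  # remove hand-added arrows
            key, sep, value = line.partition(":")
            if sep:  # keep only "Key: value" lines
                rec[key.strip()] = value.strip()
        if rec:
            records.append(rec)
    return records


def summarize(records):
    """Return (flagged, per_slot).

    flagged: kernel names whose record says "Duplicate: True"
    per_slot: reported PCI slot mapped to the kernel names claiming it
    """
    flagged = [r.get("Kernel name") for r in records
               if r.get("Duplicate") == "True"]
    per_slot = defaultdict(list)
    for r in records:
        if "PCI Slot" in r:
            per_slot[r["PCI Slot"]].append(r.get("Kernel name"))
    return flagged, dict(per_slot)
```

Usage would be along the lines of summarize(parse_biosdevname(open('biosdevname-d.txt').read())), where the file name is whatever the saved output was called. A slot number that collects interfaces from several different Bus Info addresses is the symptom discussed in this bug.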

Comment 12 joshua 2012-09-11 16:04:17 UTC
Created attachment 611851 [details]
biosdecode

Comment 13 joshua 2012-09-11 16:04:49 UTC
Created attachment 611852 [details]
dmidecode

Comment 14 joshua 2012-09-11 16:05:34 UTC
Created attachment 611853 [details]
lspci -tv

Comment 15 joshua 2012-09-11 16:06:54 UTC
Created attachment 611854 [details]
lspci -vvvxxxx

When running this command, I see the following on stderr:

pcilib: sysfs_read_vpd: read failed: Connection timed out
pcilib: sysfs_read_vpd: read failed: Connection timed out
pcilib: sysfs_read_vpd: read failed: Connection timed out
pcilib: sysfs_read_vpd: read failed: Connection timed out

Comment 16 Narendra K 2012-09-11 17:43:49 UTC
From the attached 'dmidecode' output, the type 41 record for the onboard LOM seems to be wrong. Type 41 records are used to name onboard devices.

Handle 0x0081, DMI type 41, 11 bytes
Onboard Device
	Reference Designation:  Onboard LAN
	Type: Ethernet
	Status: Enabled
	Type Instance: 1
	Bus Address: 0000:00:19.0 <-----------1

The PCI address 0000:00:19.0 is not present in the lspci output, so the address in this record is incorrect. This is likely a BIOS issue.
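One way to confirm that a bus address from an SMBIOS record does not correspond to a real device is to look for it in sysfs, where Linux exposes each PCI function as an entry named by its domain:bus:device.function address under /sys/bus/pci/devices. A minimal Python sketch; pci_address_present and check_addresses are hypothetical helpers, and the sysfs_root parameter is an assumption added only so the check can be exercised against a test directory.

```python
import os


def pci_address_present(bus_address, sysfs_root="/sys/bus/pci/devices"):
    """Return True if a PCI function with this address exists.

    Linux exposes every PCI function as an entry named by its
    domain:bus:device.function address in lowercase hex
    (e.g. "0000:00:19.0") under /sys/bus/pci/devices.
    """
    return os.path.exists(os.path.join(sysfs_root, bus_address.lower()))


def check_addresses(addresses, sysfs_root="/sys/bus/pci/devices"):
    """Map each SMBIOS-reported address to whether a device backs it."""
    return {a: pci_address_present(a, sysfs_root) for a in addresses}
```

On the affected machine, check_addresses(["0000:00:19.0", "0000:02:00.0"]) would report whether each address from the type 41 and type 9 records points at an existing device.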

Also, the type 9 records are incorrect: they have wrong PCI bus:device.function values populated. Except for the first record ('Bus Address: 0000:02:00.0'), all of the type 9 records are incorrect. Type 9 records are used to name PCI add-in adapters placed in slots; when type 9 records are not available, biosdevname falls back to the PCIe slot capability. The slot numbers 116 and 117 come from the PCIe Slot Capabilities structure of the PCI bridges under which the network interfaces sit. Refer to the attached lspci -vvvxxxx output.

Based on the attached logs, the incorrect names result from incorrect SMBIOS type 9 records and from the slot numbers provided in the PCI slot capability structure. This seems to be a system BIOS issue.

Also, when biosdevname finds that more than one interface will get the same name, it will not name any interface. 

*0e:04.0 PCI bridge: PLX Technology, Inc. PEX 8648 48-lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb) (prog-if 00 [Normal decode])
[...]
SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #116, PowerLimit 25.000W; Interlock- NoCompl-

*0e:05.0 PCI bridge: PLX Technology, Inc. PEX 8648 48-lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb) (prog-if 00 [Normal decode])
[...]	

		SltCap:	AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise-
			Slot #117, PowerLimit 25.000W; Interlock- NoCompl-
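For reference, the 'Slot #116' and 'Slot #117' values in these excerpts are what lspci decodes from the Physical Slot Number field (bits 31:19) of each bridge's PCI Express Slot Capabilities register, located at offset 0x14 within the PCI Express capability. A minimal Python sketch of that decoding, which may help when reading raw bytes in the attached lspci -vvvxxxx dump; decode_sltcap is a hypothetical helper, and the register values used below are constructed for illustration, not read from this machine.

```python
# Power-limit scale factors encoded in bits 16:15 of Slot Capabilities.
_POWER_SCALE = {0: 1.0, 1: 0.1, 2: 0.01, 3: 0.001}


def decode_sltcap(value):
    """Decode a 32-bit PCIe Slot Capabilities register value.

    Field layout follows the PCI Express Base Specification; the keys
    mirror the abbreviations lspci prints after "SltCap:".
    """
    def bit(n):
        return bool(value & (1 << n))

    return {
        "AttnBtn": bit(0),   # Attention Button Present
        "PwrCtrl": bit(1),   # Power Controller Present
        "MRL": bit(2),       # MRL Sensor Present
        "AttnInd": bit(3),   # Attention Indicator Present
        "PwrInd": bit(4),    # Power Indicator Present
        "Surprise": bit(5),  # Hot-Plug Surprise
        "HotPlug": bit(6),   # Hot-Plug Capable
        # Slot Power Limit Value (bits 14:7) times its scale (bits 16:15)
        "PowerLimitW": ((value >> 7) & 0xFF) * _POWER_SCALE[(value >> 15) & 0x3],
        "Interlock": bit(17),  # Electromechanical Interlock Present
        "NoCompl": bit(18),    # No Command Completed Support
        "Slot": (value >> 19) & 0x1FFF,  # Physical Slot Number, bits 31:19
    }
```

For example, decode_sltcap((116 << 19) | (25 << 7)) reports Slot 116, a 25.0 W power limit, and every presence bit cleared, which matches the 0e:04.0 line above.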




Handle 0x0024, DMI type 9, 17 bytes
System Slot Information
	Designation: CPU1_SLOT1
	Type: x8 PCI Express
	Current Usage: In Use
	Length: Short
	ID: 1
	Characteristics:
		3.3 V is provided
		Opening is shared
		PME signal is supported
	Bus Address: 0000:02:00.0

Handle 0x0025, DMI type 9, 17 bytes
System Slot Information
	Designation: CPU1_SLOT2
	Type: x16 PCI Express
	Current Usage: In Use
	Length: Short
	ID: 2
	Characteristics:
		3.3 V is provided
		Opening is shared
		PME signal is supported
	Bus Address: 0000:0c:00.0

Handle 0x0026, DMI type 9, 17 bytes
System Slot Information
	Designation: CPU1_SLOT3
	Type: x8 PCI Express
	Current Usage: In Use
	Length: Short
	ID: 3
	Characteristics:
		3.3 V is provided
		Opening is shared
		PME signal is supported
	Bus Address: 0000:04:00.2

Handle 0x0027, DMI type 9, 17 bytes
System Slot Information
	Designation: CPU2_SLOT4
	Type: x16 PCI Express
	Current Usage: Available
	Length: Short
	ID: 4
	Characteristics:
		3.3 V is provided
		Opening is shared
		PME signal is supported
	Bus Address: 0080:ff:00.0

Handle 0x0028, DMI type 9, 17 bytes
System Slot Information
	Designation: CPU2_SLOT5
	Type: x8 PCI Express
	Current Usage: In Use
	Length: Long
	ID: 5
	Characteristics:
		3.3 V is provided
		Opening is shared
		PME signal is supported
	Bus Address: 0080:0d:00.0

Handle 0x0029, DMI type 9, 17 bytes
System Slot Information
	Designation: CPU2_SLOT6
	Type: x16 PCI Express
	Current Usage: Available
	Length: Long
	ID: 6
	Characteristics:
		3.3 V is provided
		Opening is shared
		PME signal is supported
	Bus Address: 0080:ff:00.0
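The explanation in this comment can be condensed into a simplified naming model: use the SMBIOS type 9 slot number if one is available, otherwise the bridge's SltCap slot number, and do not hand out a name that more than one interface would receive. The Python sketch below is a hypothetical model, not biosdevname's actual code. It takes the index in slot as given rather than computing it, and it assumes that only the colliding interfaces keep their kernel names, which is consistent with the attached output where p1p1 was renamed but eth16 and eth17 were not.

```python
from collections import Counter


def candidate_name(smbios_slot, sltcap_slot, index_in_slot):
    """Build a pSLOTpINDEX name for one network function (simplified model).

    Prefers the slot number from an SMBIOS type 9 record and falls back to
    the PCIe Slot Capabilities slot number of the upstream bridge. Returns
    None when neither source supplies a slot.
    """
    slot = smbios_slot if smbios_slot is not None else sltcap_slot
    if slot is None:
        return None
    return f"p{slot}p{index_in_slot}"


def assign_names(interfaces):
    """Map kernel name to BIOS name, leaving collisions unnamed.

    interfaces: kernel name -> (smbios_slot, sltcap_slot, index_in_slot).
    Any candidate name claimed by more than one interface is dropped, so
    those interfaces keep their kernel (ethN) names in this model.
    """
    names = {k: candidate_name(*v) for k, v in interfaces.items()}
    counts = Counter(n for n in names.values() if n is not None)
    return {k: n for k, n in names.items()
            if n is not None and counts[n] == 1}


# Hypothetical layout: one card with a valid type 9 record (slot 1) and two
# controllers on different cards that both inherit SltCap slot 116.
example = {
    "eth0": (1, None, 1),
    "eth1": (1, None, 2),
    "eth16": (None, 116, 1),
    "eth20": (None, 116, 1),
}
```

In this model, assign_names(example) renames only eth0 and eth1; the two interfaces that would both become p116p1 are left alone.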

Comment 17 joshua 2012-11-05 19:45:01 UTC
I'm working with the BIOS manufacturer, Supermicro, to see if they can't fix this issue in BIOS.  Will keep you posted.

Comment 18 joshua 2012-11-27 21:02:11 UTC
Installed a new BIOS from SuperMicro that they swear solves this issue... no change.  Installed Fedora 18 Beta x86_64... no change.  Attached here is the "biosdevname -d" output from Fedora 18 Beta.

Comment 19 joshua 2012-11-27 21:03:22 UTC
Created attachment 653117 [details]
Fedora 18 Beta x86_64 output of biosdevname -d

Comment 20 joshua 2012-11-27 22:42:01 UTC
Just FYI, Ubuntu 12.10 shows the same thing

Comment 21 Narendra K 2012-11-29 18:25:44 UTC
(In reply to comment #18)
> Installed a new BIOS from SuperMicro that they swear solves this issue... no
> change.  Installed Fedora 18 Beta x86_64... no change.  Attached here is the
> "biosdevname -d" output from Fedora 18 Beta.

Please attach the output from 'dmidecode', 'biosdecode', lspci -vvvxxx and lspci -tv after the new BIOS is flashed.

Comment 22 joshua 2012-12-04 22:02:05 UTC
I think we've found the root cause.  We are using Hotlava Tambora 64G4 and 80G4 cards, that employ a PLX 8648 switch chip on them that fan out to a pair of Intel 82599 10GbE controllers on each card.

Each controller claims a phantom PCI "slot" ID, either p116 or p117.  All ports on these cards then get assigned to either one or the other, which is why we are seeing

p116p1-10
and
p117p1-10

We are seeing this on multiple different machine types, with multiple versions of Fedora and Ubuntu, so it doesn't appear to be a Linux issue at all.

We are working with Hotlava to address this issue, which seems to be caused by each Intel 82599 controller claiming its own PCI slot.


Does this seem like the right approach?  Anything else we should be looking at from the Linux/Fedora angle?

Comment 23 Narendra K 2012-12-07 09:45:39 UTC
(In reply to comment #22)
> I think we've found the root cause.  We are using Hotlava Tambora 64G4 and
> 80G4 cards, that employ a PLX 8648 switch chip on them that fan out to a
> pair of Intel 82599 10GbE controllers on each card.
> 
> Each controller claims a phantom PCI "slot" ID, either p116 or p117.  All
> ports on these cards then get assigned to either one or the other, which is
> why we are seeing
> 
> p116p1-10
> and
> p117p1-10

Right. As observed in comment #16.

> Based on the attached logs the incorrect names are resulting from incorrect
> SMBIOS type 9 records and slot numbers provided in the PCI slot capability
> structure. This seems to be a system BIOS issue.
> 
> 
> *0e:04.0 PCI bridge: PLX Technology, Inc. PEX 8648 48-lane, 12-Port PCI
> Express Gen 2 (5.0 GT/s) Switch (rev bb) (prog-if 00 [Normal decode])
> [...]
> SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 			Slot #116, PowerLimit 25.000W; Interlock- NoCompl-

The slot # is mentioned as 116 in SltCap structure.

> 
> *0e:05.0 PCI bridge: PLX Technology, Inc. PEX 8648 48-lane, 12-Port PCI
> Express Gen 2 (5.0 GT/s) Switch (rev bb) (prog-if 00 [Normal decode])
> [...]	
> 
> 		SltCap:	AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise-
> 			Slot #117, PowerLimit 25.000W; Interlock- NoCompl-
> 

The slot # is mentioned as 117 in SltCap structure.

Also, please verify if the new BIOS has SMBIOS type 9 records populated correctly. If type 9 record is not available, then biosdevname will look for the slot # from SltCap structure.
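To check whether the reflashed BIOS populates type 9 records correctly, the records can be compared against the devices that actually exist (for example, the addresses printed by 'lspci -D'). A minimal Python sketch, assuming 'dmidecode -t 9' text laid out like the excerpts in comment 16; parse_type9 and unmatched_slots are hypothetical helper names, and only slots reported as "In Use" are checked, since an empty slot has no device to match.

```python
import re


def parse_type9(dmidecode_text):
    """Extract (designation, slot_id, usage, bus_address) per type 9 record.

    Assumes each record has a "System Slot Information" header followed
    by indented "Field: value" lines. Missing fields come back as None.
    """
    slots = []
    for block in dmidecode_text.split("System Slot Information")[1:]:
        desig = re.search(r"Designation:\s*(.+)", block)
        slot_id = re.search(r"\bID:\s*(\d+)", block)
        usage = re.search(r"Current Usage:\s*(.+)", block)
        bus = re.search(r"Bus Address:\s*([0-9A-Fa-f:.]+)", block)
        slots.append((
            desig.group(1).strip() if desig else None,
            int(slot_id.group(1)) if slot_id else None,
            usage.group(1).strip() if usage else None,
            bus.group(1).lower() if bus else None,
        ))
    return slots


def unmatched_slots(slots, present_addresses):
    """Return in-use type 9 records whose bus address is not a present device."""
    present = {a.lower() for a in present_addresses}
    return [s for s in slots if s[2] == "In Use" and s[3] not in present]
```

Any record returned by unmatched_slots would be a candidate for the kind of wrong bus:device.function value described in comment 16.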

Comment 24 joshua 2013-03-21 22:39:55 UTC
Yes, the server manufacturer BIOS, Supermicro in this case, is slightly broken.

Comment 25 Fedora End Of Life 2013-07-04 06:21:20 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 26 Fedora End Of Life 2013-08-01 17:53:09 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

