Created attachment 596113 [details]
lspci output

Description of problem:
We have servers with four to six 4-port 10 Gigabit Ethernet NICs. Fedora 17 does not name these ports according to the p#p# NIC naming convention. Instead, all but the embedded NICs get random eth# assignments, and the embedded NICs get p#p# names rather than the em# nomenclature. Is this a bug? We have other servers with 2-port 10 Gigabit Ethernet NICs where this works perfectly. Attached is the lspci output from one of the problem servers.

Version-Release number of selected component (if applicable):
F17 x86_64
kernel-3.4.3-1.fc17.x86_64
udev-182-3.fc17.x86_64
Created attachment 596114 [details] lspci -v output
The multiport adapters we have are Hotlava Tambora 64G4 and 80G4 cards. See http://www.hotlavasystems.com/products_10gbe.html for more details.
Do you have biosdevname installed? If not, please install it. What is the output of:
# biosdevname -d
Created attachment 596455 [details] output from biosdevname -d
It is installed, apparently by default... here is the output
Very strange: in Fedora 15 x86_64, everything is much happier. See the attached output.
Created attachment 596533 [details] Fedora 15 output from biosdevname -d, same machine
Worth noting here that even in Fedora 15, things are off. There are two built-in ethernets on the motherboard, yet we have em1 through em6 listed, and the one I was connected to wasn't showing itself as one of them. See http://www.supermicro.com/products/motherboard/Xeon/C600/X9DRi-F.cfm for more motherboard details.
Guys, this is an important bug. We are experiencing randomness in Hotlava Tambora port names. Please help!
Hi, please attach the output from the following:
1. dmidecode
2. biosdecode
3. lspci -tv
4. lspci -vvvxxxx
From the attached 'biosdevname -d' output, it seems that multiple SMBIOS type 9 records are incorrect. The information requested in comment #10 will help us understand better. Observe that the 'biosdevname -d' output is indicating

Duplicate: True <---------------Observe this.

BIOS device: p116p1
Kernel name: eth16
Permanent MAC: 00:12:C0:80:27:36
Assigned MAC : 00:12:C0:80:27:36
Driver: ixgbe
Driver version: 3.8.21-k
Firmware version: 0x2b2c0001
Bus Info: 0000:94:00.0
PCI name : 0000:94:00.0
PCI Slot : 116 <---------------Observe this.
Index in slot: 1
Duplicate: True <---------------Observe this.

BIOS device: p116p2
Kernel name: eth17
Permanent MAC: 00:12:C0:80:27:37
Assigned MAC : 00:12:C0:80:27:37
Driver: ixgbe
Driver version: 3.8.21-k
Firmware version: 0x2b2c0001
Bus Info: 0000:94:00.1
PCI name : 0000:94:00.1
PCI Slot : 116 <---------------Observe this.
Index in slot: 2

Only the first two entries seem to have correct values:

BIOS device: p1p1
Kernel name: p1p1
Permanent MAC: 00:25:90:6B:0A:E0
Assigned MAC : 00:25:90:6B:0A:E0
Driver: igb
Driver version: 3.2.10-k
Firmware version: 1.5-2
Bus Info: 0000:02:00.0
PCI name : 0000:02:00.0
PCI Slot : 1 <----------- Observe this
SMBIOS Label: CPU1_SLOT1 <----------- Observe this.
Index in slot: 1

BIOS device: p1p2
Kernel name: p1p2
Permanent MAC: 00:25:90:6B:0A:E1
Assigned MAC : 00:25:90:6B:0A:E1
Driver: igb
Driver version: 3.2.10-k
Firmware version: 1.5-2
Bus Info: 0000:02:00.1
PCI name : 0000:02:00.1
PCI Slot : 1 <----------- Observe this
SMBIOS Label: CPU1_SLOT1 <----------- Observe this
Index in slot: 2
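For anyone triaging similar output, here is a minimal Python sketch that splits 'biosdevname -d' output into per-interface records and lists the interfaces that were flagged as duplicates or kept a kernel name different from the BIOS name. It assumes one "Key: value" field per line, with each record starting at "BIOS device:", as in the excerpts above. The function names are hypothetical and not part of biosdevname.

```python
def parse_biosdevname(text):
    """Split 'biosdevname -d' style text into one dict per interface."""
    records, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key = key.strip()
        # Drop hand-written annotations such as "<---- Observe this".
        value = value.split("<-")[0].strip()
        if key == "BIOS device":
            current = {}          # a new record starts here (assumption)
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def unnamed_interfaces(records):
    """Kernel names of interfaces that were flagged or not renamed."""
    return [
        r.get("Kernel name")
        for r in records
        if r.get("Duplicate") == "True"
        or r.get("BIOS device") != r.get("Kernel name")
    ]


# Abbreviated sample built from the fields quoted in this comment.
SAMPLE = """\
BIOS device: p116p1
Kernel name: eth16
Bus Info: 0000:94:00.0
PCI Slot : 116 <---------------Observe this.
Index in slot: 1
Duplicate: True
BIOS device: p1p1
Kernel name: p1p1
Bus Info: 0000:02:00.0
PCI Slot : 1
SMBIOS Label: CPU1_SLOT1
Index in slot: 1
"""

if __name__ == "__main__":
    for name in unnamed_interfaces(parse_biosdevname(SAMPLE)):
        print(name)
```

Feeding it the full attachment instead of SAMPLE should give a quick list of the ports that still need attention.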
Created attachment 611851 [details] biosdecode
Created attachment 611852 [details] dmidecode
Created attachment 611853 [details] lspci -tv
Created attachment 611854 [details]
lspci -vvvxxxx

When running this command, I see output on stderr that says the following:

pcilib: sysfs_read_vpd: read failed: Connection timed out
pcilib: sysfs_read_vpd: read failed: Connection timed out
pcilib: sysfs_read_vpd: read failed: Connection timed out
pcilib: sysfs_read_vpd: read failed: Connection timed out
From the attached 'dmidecode' output, the type 41 record for the onboard LOM seems to be wrong. Type 41 records are used to name onboard devices.

Handle 0x0081, DMI type 41, 11 bytes
Onboard Device
    Reference Designation: Onboard LAN
    Type: Ethernet
    Status: Enabled
    Type Instance: 1
    Bus Address: 0000:00:19.0 <-----------1

The PCI address 0000:00:19.0 is not present in the lspci output, so the PCI address is incorrect. This is likely a BIOS issue.

Also, the type 9 records are incorrect. They have wrong PCI bus:device.function values populated. Except for the first record, with 'Bus Address: 0000:02:00.0', the rest of the type 9 records are incorrect. Type 9 records are used to name PCI add-in adapters placed in slots, and when type 9 records are not available, biosdevname falls back to the PCIe slot capability. The slot numbers 116 and 117 are coming from the PCIe slot capability structure of the PCI bridges under which the network interfaces are present. Refer to the attached lspci -vvvxxxx output.

Based on the attached logs, the incorrect names result from incorrect SMBIOS type 9 records and from the slot numbers provided in the PCI slot capability structure. This seems to be a system BIOS issue. Also, when biosdevname finds that more than one interface will get the same name, it will not name any interface.

*0e:04.0 PCI bridge: PLX Technology, Inc. PEX 8648 48-lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb) (prog-if 00 [Normal decode])
[...]
    SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
        Slot #116, PowerLimit 25.000W; Interlock- NoCompl-

*0e:05.0 PCI bridge: PLX Technology, Inc. PEX 8648 48-lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb) (prog-if 00 [Normal decode])
[...]
    SltCap: AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise-
        Slot #117, PowerLimit 25.000W; Interlock- NoCompl-

Handle 0x0024, DMI type 9, 17 bytes
System Slot Information
    Designation: CPU1_SLOT1
    Type: x8 PCI Express
    Current Usage: In Use
    Length: Short
    ID: 1
    Characteristics:
        3.3 V is provided
        Opening is shared
        PME signal is supported
    Bus Address: 0000:02:00.0

Handle 0x0025, DMI type 9, 17 bytes
System Slot Information
    Designation: CPU1_SLOT2
    Type: x16 PCI Express
    Current Usage: In Use
    Length: Short
    ID: 2
    Characteristics:
        3.3 V is provided
        Opening is shared
        PME signal is supported
    Bus Address: 0000:0c:00.0

Handle 0x0026, DMI type 9, 17 bytes
System Slot Information
    Designation: CPU1_SLOT3
    Type: x8 PCI Express
    Current Usage: In Use
    Length: Short
    ID: 3
    Characteristics:
        3.3 V is provided
        Opening is shared
        PME signal is supported
    Bus Address: 0000:04:00.2

Handle 0x0027, DMI type 9, 17 bytes
System Slot Information
    Designation: CPU2_SLOT4
    Type: x16 PCI Express
    Current Usage: Available
    Length: Short
    ID: 4
    Characteristics:
        3.3 V is provided
        Opening is shared
        PME signal is supported
    Bus Address: 0080:ff:00.0

Handle 0x0028, DMI type 9, 17 bytes
System Slot Information
    Designation: CPU2_SLOT5
    Type: x8 PCI Express
    Current Usage: In Use
    Length: Long
    ID: 5
    Characteristics:
        3.3 V is provided
        Opening is shared
        PME signal is supported
    Bus Address: 0080:0d:00.0

Handle 0x0029, DMI type 9, 17 bytes
System Slot Information
    Designation: CPU2_SLOT6
    Type: x16 PCI Express
    Current Usage: Available
    Length: Long
    ID: 6
    Characteristics:
        3.3 V is provided
        Opening is shared
        PME signal is supported
    Bus Address: 0080:ff:00.0
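The naming order described in this comment (type 41 for onboard devices, type 9 for add-in slots, the PCIe slot capability as a fallback, and no renaming on a name collision) can be sketched roughly as below. This is a simplified model for illustration, not biosdevname's actual code. The function names, the dict-based lookups keyed by the interface's PCI address, the em<instance> format, and the address 0000:a4:00.0 for a second card are all assumptions.

```python
from collections import Counter


def propose_name(pci_addr, index, type41, type9, sltcap):
    """Return the name one interface would get, or None if nothing matches.

    type41: PCI address -> onboard "Type Instance" (assumed pre-resolved)
    type9:  PCI address -> SMBIOS slot ID          (assumed pre-resolved)
    sltcap: PCI address -> slot number from the parent bridge's SltCap
    """
    if pci_addr in type41:
        return f"em{type41[pci_addr]}"
    slot = type9.get(pci_addr)
    if slot is None:
        slot = sltcap.get(pci_addr)  # fallback when no type 9 record exists
    if slot is None:
        return None
    return f"p{slot}p{index}"


def assign_names(devices, type41, type9, sltcap):
    """devices: list of (kernel_name, pci_addr, index_in_slot) tuples."""
    proposed = {
        kernel: propose_name(addr, index, type41, type9, sltcap)
        for kernel, addr, index in devices
    }
    counts = Counter(name for name in proposed.values() if name is not None)
    # Assumption: interfaces whose proposed name collides keep the kernel name.
    return {
        kernel: name if name is not None and counts[name] == 1 else kernel
        for kernel, name in proposed.items()
    }


if __name__ == "__main__":
    devices = [
        ("eth0", "0000:02:00.0", 1),
        ("eth1", "0000:02:00.1", 2),
        ("eth16", "0000:94:00.0", 1),
        ("eth20", "0000:a4:00.0", 1),  # hypothetical port on a second card
    ]
    type9 = {"0000:02:00.0": 1, "0000:02:00.1": 1}
    sltcap = {"0000:94:00.0": 116, "0000:a4:00.0": 116}
    for kernel, name in assign_names(devices, {}, type9, sltcap).items():
        print(kernel, "->", name)
```

In this model, two cards whose bridges both report slot 116 produce the same proposed name for their first ports, so those ports stay on eth# names while the slot 1 ports are renamed. That matches the pattern reported at the top of this bug.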
I'm working with the BIOS manufacturer, Supermicro, to see if they can't fix this issue in BIOS. Will keep you posted.
Installed a new BIOS from SuperMicro that they swear solves this issue... no change. Installed Fedora 18 Beta x86_64... no change. Attached here is the "biosdevname -d" output from Fedora 18 Beta.
Created attachment 653117 [details] Fedora 18 Beta x86_64 output of biosdevname -d
Just FYI, Ubuntu 12.10 shows the same thing
(In reply to comment #18)
> Installed a new BIOS from SuperMicro that they swear solves this issue... no
> change. Installed Fedora 18 Beta x86_64... no change. Attached here is the
> "biosdevname -d" output from Fedora 18 Beta.

Please attach the output from 'dmidecode', 'biosdecode', 'lspci -vvvxxx' and 'lspci -tv' after the new BIOS is flashed.
I think we've found the root cause. We are using Hotlava Tambora 64G4 and 80G4 cards, which employ a PLX 8648 switch chip that fans out to a pair of Intel 82599 10GbE controllers on each card.

Each controller claims a phantom PCI "slot" ID, either p116 or p117. All ports on these cards then get assigned to one or the other, which is why we are seeing

p116p1-10
and
p117p1-10

We are seeing this on multiple different machine types, with multiple versions of Fedora and Ubuntu, so it doesn't appear to be a Linux issue at all. We are working with the vendor, Hotlava, to address this issue, which seems to be caused by the Intel 82599 controllers each claiming their own PCI slot.

Does this seem like the right approach? Is there anything else we should be looking at from the Linux/Fedora angle?
(In reply to comment #22)
> I think we've found the root cause. We are using Hotlava Tambora 64G4 and
> 80G4 cards, that employ a PLX 8648 switch chip on them that fan out to a
> pair of Intel 82599 10GbE controllers on each card.
>
> Each controller claims a phantom PCI "slot" ID, either p116 or p117. All
> ports on these cards then get assigned to either one or the other, which is
> why we are seeing
>
> p116p1-10
> and
> p117p1-10

Right. As observed in comment #16.

> Based on the attached logs the incorrect names are resulting from incorrect
> SMBIOS type 9 records and slot numbers provided in the PCI slot capability
> structure. This seems to be a system BIOS issue.
>
>
> *0e:04.0 PCI bridge: PLX Technology, Inc. PEX 8648 48-lane, 12-Port PCI
> Express Gen 2 (5.0 GT/s) Switch (rev bb) (prog-if 00 [Normal decode])
> [...]
> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> Slot #116, PowerLimit 25.000W; Interlock- NoCompl-

The slot # is given as 116 in the SltCap structure.

>
> *0e:05.0 PCI bridge: PLX Technology, Inc. PEX 8648 48-lane, 12-Port PCI
> Express Gen 2 (5.0 GT/s) Switch (rev bb) (prog-if 00 [Normal decode])
> [...]
>
> SltCap: AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise-
> Slot #117, PowerLimit 25.000W; Interlock- NoCompl-
>

The slot # is given as 117 in the SltCap structure.

Also, please verify whether the new BIOS has the SMBIOS type 9 records populated correctly. If a type 9 record is not available, biosdevname will look for the slot # in the SltCap structure.
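To check a machine for this condition, a small Python sketch like the one below can pull the "Slot #N" values out of saved 'lspci -vv' output and report any slot number claimed by more than one bridge. It assumes each device block starts with an unindented bus:device.function header and that the slot number appears on an indented line within that block, as in the excerpts quoted here. The function names and the 8e:04.0 bridge in the sample are hypothetical.

```python
import re
from collections import defaultdict

# Unindented header such as "0e:04.0 PCI bridge: ..." (optional domain prefix;
# a leading "*" is tolerated because the excerpts in this bug use it).
HEADER = re.compile(r"^\*?((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
SLOT = re.compile(r"Slot #(\d+)")


def slots_by_bridge(lspci_text):
    """Map each device address to the slot number in its SltCap, if any."""
    slots, current = {}, None
    for line in lspci_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        slot = SLOT.search(line)
        if slot and current is not None:
            slots[current] = int(slot.group(1))
    return slots


def shared_slots(slots):
    """Return slot numbers reported by more than one bridge."""
    by_slot = defaultdict(list)
    for bridge, slot in slots.items():
        by_slot[slot].append(bridge)
    return {s: b for s, b in by_slot.items() if len(b) > 1}


SAMPLE = """\
0e:04.0 PCI bridge: PLX Technology, Inc. PEX 8648 (rev bb)
        SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
            Slot #116, PowerLimit 25.000W; Interlock- NoCompl-
0e:05.0 PCI bridge: PLX Technology, Inc. PEX 8648 (rev bb)
        SltCap: AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise-
            Slot #117, PowerLimit 25.000W; Interlock- NoCompl-
8e:04.0 PCI bridge: PLX Technology, Inc. PEX 8648 (rev bb)
        SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
            Slot #116, PowerLimit 25.000W; Interlock- NoCompl-
"""

if __name__ == "__main__":
    print(shared_slots(slots_by_bridge(SAMPLE)))
```

Any slot number that shows up under bridges on different cards is a candidate for the duplicate names seen earlier in this bug.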
Yes, the server manufacturer BIOS, Supermicro in this case, is slightly broken.
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.