Bug 691122 - ixgbe driver failed Intel Corporation 82598EB 10-Gigabit AT CX4 network card
Summary: ixgbe driver failed Intel Corporation 82598EB 10-Gigabit AT CX4 network card
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Andy Gospodarek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-03-26 21:22 UTC by thbe
Modified: 2014-06-29 23:03 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-30 20:04:32 UTC
Type: ---
Embargoed:


Attachments
lshw dump (37.58 KB, text/plain)
2011-03-26 21:22 UTC, thbe

Description thbe 2011-03-26 21:22:36 UTC
Created attachment 487960 [details]
lshw dump

Description of problem:

When the driver is loaded, it returns a -15 hardware error (the card is working on Ubuntu 10.04.2 LTS without problems, so no hardware defect):

[root@storage2 ~]# lspci -nn | grep 82598EB
0c:00.0 Ethernet controller [0200]: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection [8086:10ec] (rev 01)
0c:00.1 Ethernet controller [0200]: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection [8086:10ec] (rev 01)
[root@storage2 ~]# dmesg | grep ixgbe
[    7.387684] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 2.0.62-k2
[    7.387687] ixgbe: Copyright (c) 1999-2010 Intel Corporation.
[    7.387733] ixgbe 0000:0c:00.0: enabling device (0000 -> 0002)
[    7.387743] ixgbe 0000:0c:00.0: PCI->APIC IRQ transform: INT B -> IRQ 17
[    7.387755] ixgbe 0000:0c:00.0: setting latency timer to 64
[    8.252045] ixgbe 0000:0c:00.0: HW Init failed: -15
[    8.252083] ixgbe: probe of 0000:0c:00.0 failed with error -15
[    8.252105] ixgbe 0000:0c:00.1: enabling device (0000 -> 0002)
[    8.252114] ixgbe 0000:0c:00.1: PCI->APIC IRQ transform: INT A -> IRQ 16
[    8.252126] ixgbe 0000:0c:00.1: setting latency timer to 64
[    9.108043] ixgbe 0000:0c:00.1: HW Init failed: -15
[    9.108060] ixgbe: probe of 0000:0c:00.1 failed with error -15
[root@storage2 ~]# uname -a
Linux storage2.ham.cimt.de 2.6.35.11-83.fc14.x86_64 #1 SMP Mon Feb 7 07:06:44 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@storage2 ~]#

Version-Release number of selected component (if applicable):

See attached dump files.

How reproducible:

Reboot

Steps to Reproduce:
1.
2.
3.
  
Actual results:

It's not working.

Expected results:

Should work :)

Additional info:

Comment 1 thbe 2011-03-28 12:42:44 UTC
I just ran two additional checks. First, I installed Fedora 14 with the GNOME desktop to make sure this is not a dependency problem (the problem was discovered on a minimal install), but the problem still exists.

The second check was Ubuntu 10.10 Server, which has the same ixgbe version as Fedora 14 (2.0.62-k2); there the card was detected and is functional:

root@storage2:~# dmesg | grep -Ei "0000:0c:00.0|0000:0c:00.1|ixgbe"
[    1.470467] pci 0000:0c:00.0: reg 10: [mem 0xd8880000-0xd889ffff]
[    1.470474] pci 0000:0c:00.0: reg 14: [mem 0xd8800000-0xd883ffff]
[    1.470481] pci 0000:0c:00.0: reg 18: [io  0x4000-0x401f]
[    1.470488] pci 0000:0c:00.0: reg 1c: [mem 0xd88c0000-0xd88c3fff]
[    1.470538] pci 0000:0c:00.0: PME# supported from D0 D3hot
[    1.470543] pci 0000:0c:00.0: PME# disabled
[    1.470584] pci 0000:0c:00.1: reg 10: [mem 0xd88a0000-0xd88bffff]
[    1.470591] pci 0000:0c:00.1: reg 14: [mem 0xd8840000-0xd887ffff]
[    1.470598] pci 0000:0c:00.1: reg 18: [io  0x4020-0x403f]
[    1.470605] pci 0000:0c:00.1: reg 1c: [mem 0xd88c4000-0xd88c7fff]
[    1.470656] pci 0000:0c:00.1: PME# supported from D0 D3hot
[    1.470660] pci 0000:0c:00.1: PME# disabled
[    3.570174] pci 0000:0c:00.0: Disabling L0s
[    3.570178] pci 0000:0c:00.1: Disabling L0s
[    4.007122] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 2.0.62-k2
[    4.007124] ixgbe: Copyright (c) 1999-2010 Intel Corporation.
[    4.007152] ixgbe 0000:0c:00.0: PCI->APIC IRQ transform: INT B -> IRQ 17
[    4.007160] ixgbe 0000:0c:00.0: setting latency timer to 64
[    4.127580] ixgbe 0000:0c:00.0: irq 70 for MSI/MSI-X
[    4.127586] ixgbe 0000:0c:00.0: irq 71 for MSI/MSI-X
[    4.127592] ixgbe 0000:0c:00.0: irq 72 for MSI/MSI-X
[    4.127598] ixgbe 0000:0c:00.0: irq 73 for MSI/MSI-X
[    4.127603] ixgbe 0000:0c:00.0: irq 74 for MSI/MSI-X
[    4.127609] ixgbe 0000:0c:00.0: irq 75 for MSI/MSI-X
[    4.127614] ixgbe 0000:0c:00.0: irq 76 for MSI/MSI-X
[    4.127620] ixgbe 0000:0c:00.0: irq 77 for MSI/MSI-X
[    4.127625] ixgbe 0000:0c:00.0: irq 78 for MSI/MSI-X
[    4.127656] ixgbe: 0000:0c:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
[    4.127663] ixgbe 0000:0c:00.0: (PCI Express:2.5Gb/s:Width x4) 00:1b:21:3b:6c:19
[    4.127744] ixgbe 0000:0c:00.0: MAC: 1, PHY: 0, PBA No: e37623-003
[    4.127746] ixgbe 0000:0c:00.0: PCI-Express bandwidth available for this card is not sufficient for optimal performance.
[    4.127748] ixgbe 0000:0c:00.0: For optimal performance a x8 PCI-Express slot is required.
[    4.141814] ixgbe 0000:0c:00.0: Intel(R) 10 Gigabit Network Connection
[    4.141839] ixgbe 0000:0c:00.1: PCI->APIC IRQ transform: INT A -> IRQ 16
[    4.141851] ixgbe 0000:0c:00.1: setting latency timer to 64
[    4.287522] ixgbe 0000:0c:00.1: irq 79 for MSI/MSI-X
[    4.287528] ixgbe 0000:0c:00.1: irq 80 for MSI/MSI-X
[    4.287533] ixgbe 0000:0c:00.1: irq 81 for MSI/MSI-X
[    4.287539] ixgbe 0000:0c:00.1: irq 82 for MSI/MSI-X
[    4.287545] ixgbe 0000:0c:00.1: irq 83 for MSI/MSI-X
[    4.287551] ixgbe 0000:0c:00.1: irq 84 for MSI/MSI-X
[    4.287557] ixgbe 0000:0c:00.1: irq 85 for MSI/MSI-X
[    4.287563] ixgbe 0000:0c:00.1: irq 86 for MSI/MSI-X
[    4.287568] ixgbe 0000:0c:00.1: irq 87 for MSI/MSI-X
[    4.287592] ixgbe: 0000:0c:00.1: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
[    4.287598] ixgbe 0000:0c:00.1: (PCI Express:2.5Gb/s:Width x4) 00:1b:21:3b:6c:18
[    4.287679] ixgbe 0000:0c:00.1: MAC: 1, PHY: 0, PBA No: e37623-003
[    4.287681] ixgbe 0000:0c:00.1: PCI-Express bandwidth available for this card is not sufficient for optimal performance.
[    4.287683] ixgbe 0000:0c:00.1: For optimal performance a x8 PCI-Express slot is required.
[    4.301760] ixgbe 0000:0c:00.1: Intel(R) 10 Gigabit Network Connection
[    6.302608] ixgbe: eth2 NIC Link is Up 10 Gbps, Flow Control: RX/TX
root@storage2:~# ping -c 4 www.redhat.com
PING e86.b.akamaiedge.net (95.100.144.112) 56(84) bytes of data.
64 bytes from 95.100.144.112: icmp_req=1 ttl=56 time=11.6 ms
64 bytes from 95.100.144.112: icmp_req=2 ttl=56 time=11.4 ms
64 bytes from 95.100.144.112: icmp_req=3 ttl=56 time=12.1 ms
64 bytes from 95.100.144.112: icmp_req=4 ttl=56 time=11.4 ms

--- e86.b.akamaiedge.net ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 11.436/11.661/12.118/0.291 ms
root@storage2:~#

Comment 2 Andy Gospodarek 2011-03-28 18:54:41 UTC
I suspect this has more to do with configuration options than anything else, but I'll check it out.

Comment 3 Andy Gospodarek 2011-03-28 19:46:20 UTC
So the kernel configs are extremely similar and the code is identical between Ubuntu 10.10 and Fedora 14's current (2.6.35.11-87) kernel.

Were you booting these distros on the same boxes and are they both using 32-bit kernels?

Comment 4 Andy Gospodarek 2011-03-28 19:49:25 UTC
Thomas, I also recall from our email correspondence that the problem with the 82598EB was something that happened frequently, but not every time.  Is this correct?

If the failure on Fedora was frequent but unpredictable, did you boot Ubuntu 10.10 more than once, and did you see a similar frequency of failures?

Comment 5 thbe 2011-03-28 20:56:12 UTC
Hi Andy, I installed everything on the same box. I also tried F15 Alpha today, but I couldn't get the Live CD up and running, so there are no results from that test yet. Both installations were 64-bit. With Ubuntu I was able to do several reboots without any problems (I have used Ubuntu on this box for the last year and a half, but I'm in the process of replacing all remaining Ubuntu boxes with RHEL/Fedora). With Fedora I was able to see the card after I made changes to /etc/udev/rules.d/70-persistent-net.rules (I added a dummy entry for a nonexistent Ethernet device), but I was not able to bring the device up; when I start the device I get the same -15 error.

Comment 6 Andy Gospodarek 2011-06-21 18:45:47 UTC
I've started looking at this again and I'm totally stumped as to why you would see this on Fedora (even F15) and not Ubuntu LTS.

Can you tell me what kernel command line options you are using on Ubuntu?  Even if these are the default it would be nice to know.  It seems like there must be a fundamental difference between the two systems and I wonder if the kernel command-line options are different and make a difference.

Comment 7 thbe 2011-06-21 18:58:05 UTC
Here is the kernel version and grub config (and still no problems on this box, even after reboots):

[root@storage2 grub]# uname -a
Linux storage2 2.6.38-8-server #41~lucid1-Ubuntu SMP Tue Apr 5 21:34:05 UTC 2011 x86_64 GNU/Linux
[root@storage2 grub]# cat grub.cfg 
#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  set have_grubenv=true
  load_env
fi
set default="0"
if [ "${prev_saved_entry}" ]; then
  set saved_entry="${prev_saved_entry}"
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z "${boot_once}" ]; then
    saved_entry="${chosen}"
    save_env saved_entry
  fi
}

function recordfail {
  set recordfail=1
  if [ -n "${have_grubenv}" ]; then if [ -z "${boot_once}" ]; then save_env recordfail; fi; fi
}

function load_video {
  insmod vbe
  insmod vga
}

insmod lvm
insmod part_msdos
insmod ext2
set root='(storage2-root)'
search --no-floppy --fs-uuid --set ca10e991-0950-46ee-8c2f-65d2d3840215
if loadfont /usr/share/grub/unicode.pf2 ; then
  set gfxmode=640x480
  load_video
  insmod gfxterm
fi
terminal_output gfxterm
insmod part_msdos
insmod ext2
set root='(hd0,msdos1)'
search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
set locale_dir=($root)/grub/locale
set lang=de
insmod gettext
if [ "${recordfail}" = 1 ]; then
  set timeout=-1
else
  set timeout=10
fi
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/05_debian_theme ###
set menu_color_normal=white/black
set menu_color_highlight=black/light-gray
### END /etc/grub.d/05_debian_theme ###

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Ubuntu, with Linux 2.6.38-8-server' --class ubuntu --class gnu-linux --class gnu --class os {
	recordfail
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
	linux	/vmlinuz-2.6.38-8-server root=/dev/mapper/storage2-root ro   quiet
	initrd	/initrd.img-2.6.38-8-server
}
menuentry 'Ubuntu, with Linux 2.6.38-8-server (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
	recordfail
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
	echo	'Loading Linux 2.6.38-8-server ...'
	linux	/vmlinuz-2.6.38-8-server root=/dev/mapper/storage2-root ro single 
	echo	'Loading initial ramdisk ...'
	initrd	/initrd.img-2.6.38-8-server
}
menuentry 'Ubuntu, with Linux 2.6.35-28-server' --class ubuntu --class gnu-linux --class gnu --class os {
	recordfail
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
	linux	/vmlinuz-2.6.35-28-server root=/dev/mapper/storage2-root ro   quiet
	initrd	/initrd.img-2.6.35-28-server
}
menuentry 'Ubuntu, with Linux 2.6.35-28-server (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
	recordfail
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
	echo	'Loading Linux 2.6.35-28-server ...'
	linux	/vmlinuz-2.6.35-28-server root=/dev/mapper/storage2-root ro single 
	echo	'Loading initial ramdisk ...'
	initrd	/initrd.img-2.6.35-28-server
}
menuentry 'Ubuntu, with Linux 2.6.32-30-server' --class ubuntu --class gnu-linux --class gnu --class os {
	recordfail
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
	linux	/vmlinuz-2.6.32-30-server root=/dev/mapper/storage2-root ro   quiet
	initrd	/initrd.img-2.6.32-30-server
}
menuentry 'Ubuntu, with Linux 2.6.32-30-server (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
	recordfail
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
	echo	'Loading Linux 2.6.32-30-server ...'
	linux	/vmlinuz-2.6.32-30-server root=/dev/mapper/storage2-root ro single 
	echo	'Loading initial ramdisk ...'
	initrd	/initrd.img-2.6.32-30-server
}
### END /etc/grub.d/10_linux ###

### BEGIN /etc/grub.d/20_linux_xen ###
### END /etc/grub.d/20_linux_xen ###

### BEGIN /etc/grub.d/20_memtest86+ ###
menuentry "Memory test (memtest86+)" {
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
	linux16	/memtest86+.bin
}
menuentry "Memory test (memtest86+, serial console 115200)" {
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
	linux16	/memtest86+.bin console=ttyS0,115200n8
}
### END /etc/grub.d/20_memtest86+ ###

### BEGIN /etc/grub.d/30_os-prober ###
if [ "x${timeout}" != "x-1" ]; then
  if keystatus; then
    if keystatus --shift; then
      set timeout=-1
    else
      set timeout=0
    fi
  else
    if sleep --interruptible 3 ; then
      set timeout=0
    fi
  fi
fi
### END /etc/grub.d/30_os-prober ###

### BEGIN /etc/grub.d/40_custom ###
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
### END /etc/grub.d/40_custom ###

### BEGIN /etc/grub.d/41_custom ###
if [ -f  $prefix/custom.cfg ]; then
  source $prefix/custom.cfg;
fi
### END /etc/grub.d/41_custom ###
[root@storage2 grub]#

Comment 8 Neil Horman 2011-06-21 19:03:12 UTC
Sorry for butting in, but I ran across this problem while looking for something else.  It looks like the error is IXGBE_ERR_RESET_FAILED, which gets returned from ixgbe_reset_hw_82598 if the adapter doesn't reset properly.  "Properly", in that code, means the reset completes within 10 iterations of a loop that does a udelay(1), so effectively within 10 microseconds.  I don't have an Ubuntu source tree handy, but I wouldn't be at all surprised if Canonical just extended that loop by a few iterations, or made the timeout just a touch longer.
HTH

Comment 9 Andy Gospodarek 2011-06-21 19:28:01 UTC
What you have described appears to be exactly the problem, Neil.

What is so interesting is there are no differences between those two functions in the source trees (I checked in the past and I just checked again today).

Comment 10 Andy Gospodarek 2011-06-21 19:33:13 UTC
Thomas, thanks for sending that kernel command line information.

Can you add pcie_aspm=off to the kernel command line for your Fedora installation?  Ubuntu does not disable ASPM and there have been some known ASPM issues with some systems that produce odd results.

Comment 11 Andy Gospodarek 2011-06-21 19:34:01 UTC
(In reply to comment #10)
> Thomas, thanks for sending that kernel command line information.
> 
> Can you add pcie_aspm=off to the kernel command line for your Fedora
> installation?  Ubuntu does not disable ASPM and there have been some known ASPM
> issues with some systems that produce odd results.

Sorry, the last sentence should read:

"Ubuntu does not enable ASPM and there have been some known ASPM issues with some systems that produce odd results."

Comment 12 thbe 2011-06-21 19:42:01 UTC
Hi Andy,

Unfortunately I don't have the box available at the moment, but I expect to have an equivalent box available for testing within the next two months. As soon as I have the box ready I'll run the test with ASPM off to see if it makes any difference. The results will be posted as a comment to this bug, so you should get a notification when I'm done.

Comment 13 Andy Gospodarek 2011-06-21 19:50:16 UTC
Sounds good, Thomas.  I'm happy to help you out when the system is available.

I saw this in your lshw output:

    description: Computer
    product: X7DB8 ()
    vendor: Supermicro
    version: 0123456789
    serial: 0123456789
    width: 64 bits
    capabilities: smbios-2.5 dmi-2.5 vsyscall64 vsyscall32

Does that mean this was a 'whitebox' with a Supermicro motherboard, or was this a system that you purchased from a system vendor?

I ask because if it was a system you purchased rather than built, we might have one here that I can try.

Comment 14 thbe 2011-06-21 20:05:12 UTC
The system is a storage box built by Thomas Krenn (http://www.thomas-krenn.com/de/storage-loesungen/storage-systeme/thomas-krenn-storage/3he-intel-dual-cpu-sc836-storage.html). I don't know if the components are still the same, because the box I have is from 2008 or so. The 10Gb card was not shipped with the box; we added it to the storage box later on.

Comment 15 Andy Gospodarek 2011-06-21 20:20:26 UTC
OK, thanks.  We definitely don't have one of those.  :-)

Comment 16 Andy Gospodarek 2011-06-23 01:35:45 UTC
I wonder if the inability to properly bring up the device has anything to do with a PCI quirk that Ubuntu is carrying but we are not.  It appears that the 82598 is behind this bridge chip:

        *-pci:3
             description: PCI bridge
             product: 631xESB/632xESB/3100 Chipset PCI Express Root Port 1
             vendor: Intel Corporation
             physical id: 1c
             bus info: pci@0000:00:1c.0
             version: 09
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:67 ioport:4000(size=4096) memory:d8800000-d88fffff ioport:d8d00000(size=2097152)

which looks like it has the PCI ID 8086:2690.

Comment 17 Andy Gospodarek 2011-06-23 01:45:43 UTC
Looks like my assertion above was not the case; the only quirk carried by Ubuntu that does not appear in the F15 kernel is this:

#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
#define VTUNCERRMSK_REG	0x1ac
#define VTD_MSK_SPEC_ERRORS	(1 << 31)
/*
 * This is a quirk for masking vt-d spec defined errors to platform error
 * handling logic. With out this, platforms using Intel 7500, 5500 chipsets
 * (and the derivative chipsets like X58 etc) seem to generate NMI/SMI (based
 * on the RAS config settings of the platform) when a vt-d fault happens.
 * The resulting SMI caused the system to hang.
 *
 * VT-d spec related errors are already handled by the VT-d OS code, so no
 * need to report the same error through other channels.
 */
static void vtd_mask_spec_errors(struct pci_dev *dev)
{
	u32 word;

	pci_read_config_dword(dev, VTUNCERRMSK_REG, &word);
	pci_write_config_dword(dev, VTUNCERRMSK_REG, word | VTD_MSK_SPEC_ERRORS);
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, vtd_mask_spec_errors);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3c28, vtd_mask_spec_errors);
#endif

Comment 18 Andy Gospodarek 2011-08-30 20:04:32 UTC
I have a feeling this will be resolved if you boot the system with ASPM disabled on the kernel command-line.

I'm going to go ahead and close this bug, but please reopen if booting with 'pcie_aspm=off' does not allow you to bring up your adapter properly in Fedora.

