Created attachment 487960
lshw dump

Description of problem:
When the driver is loaded, it returns a -15 hardware error (the card is working on Ubuntu 10.04.2 LTS without problems, so no hardware defect):

[root@storage2 ~]# lspci -nn | grep 82598EB
0c:00.0 Ethernet controller [0200]: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection [8086:10ec] (rev 01)
0c:00.1 Ethernet controller [0200]: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection [8086:10ec] (rev 01)
[root@storage2 ~]# dmesg | grep ixgbe
[ 7.387684] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 2.0.62-k2
[ 7.387687] ixgbe: Copyright (c) 1999-2010 Intel Corporation.
[ 7.387733] ixgbe 0000:0c:00.0: enabling device (0000 -> 0002)
[ 7.387743] ixgbe 0000:0c:00.0: PCI->APIC IRQ transform: INT B -> IRQ 17
[ 7.387755] ixgbe 0000:0c:00.0: setting latency timer to 64
[ 8.252045] ixgbe 0000:0c:00.0: HW Init failed: -15
[ 8.252083] ixgbe: probe of 0000:0c:00.0 failed with error -15
[ 8.252105] ixgbe 0000:0c:00.1: enabling device (0000 -> 0002)
[ 8.252114] ixgbe 0000:0c:00.1: PCI->APIC IRQ transform: INT A -> IRQ 16
[ 8.252126] ixgbe 0000:0c:00.1: setting latency timer to 64
[ 9.108043] ixgbe 0000:0c:00.1: HW Init failed: -15
[ 9.108060] ixgbe: probe of 0000:0c:00.1 failed with error -15
[root@storage2 ~]# uname -a
Linux storage2.ham.cimt.de 2.6.35.11-83.fc14.x86_64 #1 SMP Mon Feb 7 07:06:44 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@storage2 ~]#

Version-Release number of selected component (if applicable):
See attached dump files.

How reproducible:
Reboot

Steps to Reproduce:
1.
2.
3.

Actual results:
It's not working.

Expected results:
Should work :)

Additional info:
I just did two additional checks. The first was Fedora 14 with the GNOME desktop, to make sure that this is not a dependency problem (the problem was discovered with a minimal install), but the problem still exists. The second check was Ubuntu 10.10 server, which has the same ixgbe version as Fedora 14 (2.0.62-k2); there the card was detected and is functional:

root@storage2:~# dmesg | grep -Ei "0000:0c:00.0|0000:0c:00.1|ixgbe"
[ 1.470467] pci 0000:0c:00.0: reg 10: [mem 0xd8880000-0xd889ffff]
[ 1.470474] pci 0000:0c:00.0: reg 14: [mem 0xd8800000-0xd883ffff]
[ 1.470481] pci 0000:0c:00.0: reg 18: [io 0x4000-0x401f]
[ 1.470488] pci 0000:0c:00.0: reg 1c: [mem 0xd88c0000-0xd88c3fff]
[ 1.470538] pci 0000:0c:00.0: PME# supported from D0 D3hot
[ 1.470543] pci 0000:0c:00.0: PME# disabled
[ 1.470584] pci 0000:0c:00.1: reg 10: [mem 0xd88a0000-0xd88bffff]
[ 1.470591] pci 0000:0c:00.1: reg 14: [mem 0xd8840000-0xd887ffff]
[ 1.470598] pci 0000:0c:00.1: reg 18: [io 0x4020-0x403f]
[ 1.470605] pci 0000:0c:00.1: reg 1c: [mem 0xd88c4000-0xd88c7fff]
[ 1.470656] pci 0000:0c:00.1: PME# supported from D0 D3hot
[ 1.470660] pci 0000:0c:00.1: PME# disabled
[ 3.570174] pci 0000:0c:00.0: Disabling L0s
[ 3.570178] pci 0000:0c:00.1: Disabling L0s
[ 4.007122] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 2.0.62-k2
[ 4.007124] ixgbe: Copyright (c) 1999-2010 Intel Corporation.
[ 4.007152] ixgbe 0000:0c:00.0: PCI->APIC IRQ transform: INT B -> IRQ 17
[ 4.007160] ixgbe 0000:0c:00.0: setting latency timer to 64
[ 4.127580] ixgbe 0000:0c:00.0: irq 70 for MSI/MSI-X
[ 4.127586] ixgbe 0000:0c:00.0: irq 71 for MSI/MSI-X
[ 4.127592] ixgbe 0000:0c:00.0: irq 72 for MSI/MSI-X
[ 4.127598] ixgbe 0000:0c:00.0: irq 73 for MSI/MSI-X
[ 4.127603] ixgbe 0000:0c:00.0: irq 74 for MSI/MSI-X
[ 4.127609] ixgbe 0000:0c:00.0: irq 75 for MSI/MSI-X
[ 4.127614] ixgbe 0000:0c:00.0: irq 76 for MSI/MSI-X
[ 4.127620] ixgbe 0000:0c:00.0: irq 77 for MSI/MSI-X
[ 4.127625] ixgbe 0000:0c:00.0: irq 78 for MSI/MSI-X
[ 4.127656] ixgbe: 0000:0c:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
[ 4.127663] ixgbe 0000:0c:00.0: (PCI Express:2.5Gb/s:Width x4) 00:1b:21:3b:6c:19
[ 4.127744] ixgbe 0000:0c:00.0: MAC: 1, PHY: 0, PBA No: e37623-003
[ 4.127746] ixgbe 0000:0c:00.0: PCI-Express bandwidth available for this card is not sufficient for optimal performance.
[ 4.127748] ixgbe 0000:0c:00.0: For optimal performance a x8 PCI-Express slot is required.
[ 4.141814] ixgbe 0000:0c:00.0: Intel(R) 10 Gigabit Network Connection
[ 4.141839] ixgbe 0000:0c:00.1: PCI->APIC IRQ transform: INT A -> IRQ 16
[ 4.141851] ixgbe 0000:0c:00.1: setting latency timer to 64
[ 4.287522] ixgbe 0000:0c:00.1: irq 79 for MSI/MSI-X
[ 4.287528] ixgbe 0000:0c:00.1: irq 80 for MSI/MSI-X
[ 4.287533] ixgbe 0000:0c:00.1: irq 81 for MSI/MSI-X
[ 4.287539] ixgbe 0000:0c:00.1: irq 82 for MSI/MSI-X
[ 4.287545] ixgbe 0000:0c:00.1: irq 83 for MSI/MSI-X
[ 4.287551] ixgbe 0000:0c:00.1: irq 84 for MSI/MSI-X
[ 4.287557] ixgbe 0000:0c:00.1: irq 85 for MSI/MSI-X
[ 4.287563] ixgbe 0000:0c:00.1: irq 86 for MSI/MSI-X
[ 4.287568] ixgbe 0000:0c:00.1: irq 87 for MSI/MSI-X
[ 4.287592] ixgbe: 0000:0c:00.1: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
[ 4.287598] ixgbe 0000:0c:00.1: (PCI Express:2.5Gb/s:Width x4) 00:1b:21:3b:6c:18
[ 4.287679] ixgbe 0000:0c:00.1: MAC: 1, PHY: 0, PBA No: e37623-003
[ 4.287681] ixgbe 0000:0c:00.1: PCI-Express bandwidth available for this card is not sufficient for optimal performance.
[ 4.287683] ixgbe 0000:0c:00.1: For optimal performance a x8 PCI-Express slot is required.
[ 4.301760] ixgbe 0000:0c:00.1: Intel(R) 10 Gigabit Network Connection
[ 6.302608] ixgbe: eth2 NIC Link is Up 10 Gbps, Flow Control: RX/TX
root@storage2:~# ping -c 4 www.redhat.com
PING e86.b.akamaiedge.net (95.100.144.112) 56(84) bytes of data.
64 bytes from 95.100.144.112: icmp_req=1 ttl=56 time=11.6 ms
64 bytes from 95.100.144.112: icmp_req=2 ttl=56 time=11.4 ms
64 bytes from 95.100.144.112: icmp_req=3 ttl=56 time=12.1 ms
64 bytes from 95.100.144.112: icmp_req=4 ttl=56 time=11.4 ms

--- e86.b.akamaiedge.net ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 11.436/11.661/12.118/0.291 ms
root@storage2:~#
I suspect this has more to do with configuration options than anything else, but I'll check it out.
So the kernel configs are extremely similar and the code is identical between Ubuntu 10.10 and Fedora 14's current (2.6.35.11-87) kernel. Were you booting these distros on the same boxes and are they both using 32-bit kernels?
Thomas, I also recall from our email correspondence that the problem with the 82598EB was something that happened frequently, but not every time. Is this correct? If the failure on Fedora was frequent but unpredictable, did you boot Ubuntu 10.10 more than once, and did you see a similar frequency of failures?
Hi Andy, I installed everything on the same box. I also tried F15 Alpha today, but I couldn't get the Live CD up and running, so there are no results from this test yet. The installation was 64-bit for both variants. With Ubuntu I was able to do several reboots without any problems (I have used Ubuntu on this box for the last year and a half, but I'm in the process of replacing all remaining Ubuntu boxes with RHEL/Fedora). With Fedora I was able to see the card after I made changes to /etc/udev/rules.d/70-persistent-net.rules (I added a dummy entry for a nonexistent Ethernet device), but I was not able to bring the device up; when I start the device I get the same -15 error.
I've started looking at this again and I'm totally stumped as to why you would see this on Fedora (even F15) and not Ubuntu LTS. Can you tell me what kernel command line options you are using on Ubuntu? Even if these are the default it would be nice to know. It seems like there must be a fundamental difference between the two systems and I wonder if the kernel command-line options are different and make a difference.
Here is the kernel version and grub config (and still no problems on this box, even after reboots):

[root@storage2 grub]# uname -a
Linux storage2 2.6.38-8-server #41~lucid1-Ubuntu SMP Tue Apr 5 21:34:05 UTC 2011 x86_64 GNU/Linux
[root@storage2 grub]# cat grub.cfg
#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  set have_grubenv=true
  load_env
fi
set default="0"
if [ "${prev_saved_entry}" ]; then
  set saved_entry="${prev_saved_entry}"
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z "${boot_once}" ]; then
    saved_entry="${chosen}"
    save_env saved_entry
  fi
}

function recordfail {
  set recordfail=1
  if [ -n "${have_grubenv}" ]; then if [ -z "${boot_once}" ]; then save_env recordfail; fi; fi
}

function load_video {
  insmod vbe
  insmod vga
}

insmod lvm
insmod part_msdos
insmod ext2
set root='(storage2-root)'
search --no-floppy --fs-uuid --set ca10e991-0950-46ee-8c2f-65d2d3840215
if loadfont /usr/share/grub/unicode.pf2 ; then
  set gfxmode=640x480
  load_video
  insmod gfxterm
fi
terminal_output gfxterm
insmod part_msdos
insmod ext2
set root='(hd0,msdos1)'
search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
set locale_dir=($root)/grub/locale
set lang=de
insmod gettext
if [ "${recordfail}" = 1 ]; then
  set timeout=-1
else
  set timeout=10
fi
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/05_debian_theme ###
set menu_color_normal=white/black
set menu_color_highlight=black/light-gray
### END /etc/grub.d/05_debian_theme ###

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Ubuntu, with Linux 2.6.38-8-server' --class ubuntu --class gnu-linux --class gnu --class os {
  recordfail
  insmod part_msdos
  insmod ext2
  set root='(hd0,msdos1)'
  search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
  linux /vmlinuz-2.6.38-8-server root=/dev/mapper/storage2-root ro quiet
  initrd /initrd.img-2.6.38-8-server
}
menuentry 'Ubuntu, with Linux 2.6.38-8-server (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
  recordfail
  insmod part_msdos
  insmod ext2
  set root='(hd0,msdos1)'
  search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
  echo 'Loading Linux 2.6.38-8-server ...'
  linux /vmlinuz-2.6.38-8-server root=/dev/mapper/storage2-root ro single
  echo 'Loading initial ramdisk ...'
  initrd /initrd.img-2.6.38-8-server
}
menuentry 'Ubuntu, with Linux 2.6.35-28-server' --class ubuntu --class gnu-linux --class gnu --class os {
  recordfail
  insmod part_msdos
  insmod ext2
  set root='(hd0,msdos1)'
  search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
  linux /vmlinuz-2.6.35-28-server root=/dev/mapper/storage2-root ro quiet
  initrd /initrd.img-2.6.35-28-server
}
menuentry 'Ubuntu, with Linux 2.6.35-28-server (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
  recordfail
  insmod part_msdos
  insmod ext2
  set root='(hd0,msdos1)'
  search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
  echo 'Loading Linux 2.6.35-28-server ...'
  linux /vmlinuz-2.6.35-28-server root=/dev/mapper/storage2-root ro single
  echo 'Loading initial ramdisk ...'
  initrd /initrd.img-2.6.35-28-server
}
menuentry 'Ubuntu, with Linux 2.6.32-30-server' --class ubuntu --class gnu-linux --class gnu --class os {
  recordfail
  insmod part_msdos
  insmod ext2
  set root='(hd0,msdos1)'
  search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
  linux /vmlinuz-2.6.32-30-server root=/dev/mapper/storage2-root ro quiet
  initrd /initrd.img-2.6.32-30-server
}
menuentry 'Ubuntu, with Linux 2.6.32-30-server (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
  recordfail
  insmod part_msdos
  insmod ext2
  set root='(hd0,msdos1)'
  search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
  echo 'Loading Linux 2.6.32-30-server ...'
  linux /vmlinuz-2.6.32-30-server root=/dev/mapper/storage2-root ro single
  echo 'Loading initial ramdisk ...'
  initrd /initrd.img-2.6.32-30-server
}
### END /etc/grub.d/10_linux ###

### BEGIN /etc/grub.d/20_linux_xen ###
### END /etc/grub.d/20_linux_xen ###

### BEGIN /etc/grub.d/20_memtest86+ ###
menuentry "Memory test (memtest86+)" {
  insmod part_msdos
  insmod ext2
  set root='(hd0,msdos1)'
  search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
  linux16 /memtest86+.bin
}
menuentry "Memory test (memtest86+, serial console 115200)" {
  insmod part_msdos
  insmod ext2
  set root='(hd0,msdos1)'
  search --no-floppy --fs-uuid --set e052d247-5853-4f0f-97fb-c703e5456a9f
  linux16 /memtest86+.bin console=ttyS0,115200n8
}
### END /etc/grub.d/20_memtest86+ ###

### BEGIN /etc/grub.d/30_os-prober ###
if [ "x${timeout}" != "x-1" ]; then
  if keystatus; then
    if keystatus --shift; then
      set timeout=-1
    else
      set timeout=0
    fi
  else
    if sleep --interruptible 3 ; then
      set timeout=0
    fi
  fi
fi
### END /etc/grub.d/30_os-prober ###

### BEGIN /etc/grub.d/40_custom ###
# This file provides an easy way to add custom menu entries. Simply type the
# menu entries you want to add after this comment. Be careful not to change
# the 'exec tail' line above.
### END /etc/grub.d/40_custom ###

### BEGIN /etc/grub.d/41_custom ###
if [ -f $prefix/custom.cfg ]; then
  source $prefix/custom.cfg;
fi
### END /etc/grub.d/41_custom ###
[root@storage2 grub]#
Sorry for butting in, but I ran across this problem while looking for something else. It looks like the error is IXGBE_ERR_RESET_FAILED, which gets returned from ixgbe_reset_hw_82598 if the adapter doesn't reset properly. "Properly", in that code, means the reset completes within 10 iterations of a loop that does a udelay(1), so effectively within 10 microseconds. I don't have an Ubuntu source tree handy, but I wouldn't be at all surprised if Canonical just extended that loop by a few iterations, or made the timeout just a touch longer. HTH
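To make the timeout behaviour described above concrete, here is a minimal user-space sketch in C. It is only an illustration under stated assumptions, not the driver's code: `fake_hw`, `reset_still_pending` and `wait_for_reset` are hypothetical names, the "hardware" is simulated by a counter, and the loop counts polls where the driver would call udelay(1). The -15 value is the one reported in dmesg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same numeric value as the -15 reported in dmesg (assumed to be the
 * reset-failed error discussed in this thread). */
#define ERR_RESET_FAILED (-15)

/* Hypothetical stand-in for the adapter: the reset bit stays set for
 * `polls_until_ready` reads and then clears. */
struct fake_hw {
    unsigned polls_until_ready;
};

/* Simulated register read: true while the reset is still in progress. */
static bool reset_still_pending(struct fake_hw *hw)
{
    if (hw->polls_until_ready == 0)
        return false;
    hw->polls_until_ready--;
    return true;
}

/* Poll at most `max_polls` times. Returns 0 once the reset has completed,
 * or ERR_RESET_FAILED if the budget runs out first. */
static int wait_for_reset(struct fake_hw *hw, unsigned max_polls)
{
    assert(hw != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        /* A real driver would delay about 1 microsecond here. */
        if (!reset_still_pending(hw))
            return 0;
    }
    return ERR_RESET_FAILED;
}
```

With a budget of 10 polls, a simulated device that needs 3 polls succeeds, while one that needs 50 fails with -15. Raising the budget to 100 lets the slow device succeed, which is the kind of change speculated about above.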
What you have described appears to be exactly the problem, Neil. What is so interesting is there are no differences between those two functions in the source trees (I checked in the past and I just checked again today).
Thomas, thanks for sending that kernel command line information. Can you add pcie_aspm=off to the kernel command line for your Fedora installation? Ubuntu does not disable ASPM and there have been some known ASPM issues with some systems that produce odd results.
(In reply to comment #10)
> Thomas, thanks for sending that kernel command line information.
>
> Can you add pcie_aspm=off to the kernel command line for your Fedora
> installation? Ubuntu does not disable ASPM and there have been some known ASPM
> issues with some systems that produce odd results.

Sorry, the last sentence should read:

"Ubuntu does not enable ASPM and there have been some known ASPM issues with some systems that produce odd results."
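When retesting, it may help to confirm that the option actually reached the running kernel. Below is a small sketch in C, assuming the command line has already been read into a string (on Linux it is exposed in /proc/cmdline). `cmdline_has_option` is a hypothetical helper written for this comment, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `cmdline` contains `opt` as a whole, space-separated
 * token. A trailing newline after the token is accepted as well. */
static bool cmdline_has_option(const char *cmdline, const char *opt)
{
    assert(cmdline != NULL && opt != NULL);

    size_t len = strlen(opt);
    if (len == 0)
        return false;

    const char *p = cmdline;
    while ((p = strstr(p, opt)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        bool starts = (p == cmdline) || (p[-1] == ' ');
        /* ...and must end at a space, a newline or the end of the string. */
        bool ends = (p[len] == '\0') || (p[len] == ' ') || (p[len] == '\n');
        if (starts && ends)
            return true;
        p += len;
    }
    return false;
}
```

For example, it returns true for "ro root=/dev/mapper/storage2-root pcie_aspm=off quiet" and false for the unmodified "root=/dev/mapper/storage2-root ro quiet" line from the grub.cfg above.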
Hi Andy, unfortunately I don't have the box available at the moment, but I expect to have an identical box available for testing within the next two months. As soon as that box is ready I'll do the test with ASPM off to see if it makes any difference. The results will be posted as a comment on this bug, so you should get a notification when I'm done.
Sounds good, Thomas. I'm happy to help you out when the system is available. I saw this in your lshw output:

description: Computer
product: X7DB8 ()
vendor: Supermicro
version: 0123456789
serial: 0123456789
width: 64 bits
capabilities: smbios-2.5 dmi-2.5 vsyscall64 vsyscall32

Does that mean this was a 'whitebox' with a Supermicro motherboard, or was this a system that you purchased from a system vendor? I ask because if it was a system you purchased rather than built, we might have one here that I can try.
The system is a storage box built by Thomas Krenn (http://www.thomas-krenn.com/de/storage-loesungen/storage-systeme/thomas-krenn-storage/3he-intel-dual-cpu-sc836-storage.html). I don't know if the components are still the same, because the box I have is from 2008 or so. The 10Gb card was not shipped with the box; we added it to the storage box later on.
OK, thanks. We definitely don't have one of those. :-)
I wonder if the inability to properly bring up the device has anything to do with a PCI quirk that Ubuntu is carrying, but we are not. It appears that the 82598 is behind this bridge chip:

*-pci:3
     description: PCI bridge
     product: 631xESB/632xESB/3100 Chipset PCI Express Root Port 1
     vendor: Intel Corporation
     physical id: 1c
     bus info: pci@0000:00:1c.0
     version: 09
     width: 32 bits
     clock: 33MHz
     capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
     configuration: driver=pcieport
     resources: irq:67 ioport:4000(size=4096) memory:d8800000-d88fffff ioport:d8d00000(size=2097152)

which looks like it has the PCI ID 8086:2690.
Looks like my assertion above was not the case, the only quirk being carried by Ubuntu that does not appear in F15 kernel is this:

#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
#define VTUNCERRMSK_REG 0x1ac
#define VTD_MSK_SPEC_ERRORS (1 << 31)
/*
 * This is a quirk for masking vt-d spec defined errors to platform error
 * handling logic. With out this, platforms using Intel 7500, 5500 chipsets
 * (and the derivative chipsets like X58 etc) seem to generate NMI/SMI (based
 * on the RAS config settings of the platform) when a vt-d fault happens.
 * The resulting SMI caused the system to hang.
 *
 * VT-d spec related errors are already handled by the VT-d OS code, so no
 * need to report the same error through other channels.
 */
static void vtd_mask_spec_errors(struct pci_dev *dev)
{
	u32 word;

	pci_read_config_dword(dev, VTUNCERRMSK_REG, &word);
	pci_write_config_dword(dev, VTUNCERRMSK_REG, word | VTD_MSK_SPEC_ERRORS);
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, vtd_mask_spec_errors);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3c28, vtd_mask_spec_errors);
#endif
I have a feeling this will be resolved if you boot the system with ASPM disabled on the kernel command-line. I'm going to go ahead and close this bug, but please reopen if booting with 'pcie_aspm=off' does not allow you to bring up your adapter properly in Fedora.