Bug 475625

Summary: [Intel 5.4 bug] ixgbe does not work reliably with 16 or more cores
Product: Red Hat Enterprise Linux 5
Reporter: Mark Wagner <mwagner>
Component: kernel
Assignee: Andy Gospodarek <agospoda>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: high
Version: 5.4
CC: abdulkh, agospoda, ahecox, akent, andriusb, anton.fang, chaohong.guo, cward, dmair, dshaks, jane.lv, jesse.brandeburg, john.ronciak, jtluka, jvillalo, keve.a.gabbert, luyu, mgahagan, peterm, qcai, rpacheco, Stuart.Kirk, syeghiay, tao, vanhoof
Target Milestone: rc
Keywords: ZStream
Target Release: 5.4
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-09-02 08:14:13 UTC
Bug Blocks: 452016, 475528, 480792, 483210, 483701, 483784, 485920    
Attachments:
requested output of /proc/interrupts in the various states (flags: none)
ixgbe-ring-setting-working.patch (flags: none)

Description Mark Wagner 2008-12-09 20:37:11 UTC
Description of problem:

The ixgbe driver does not work reliably when Hyper-Threads are enabled on the Nehalem box.
I can ping other boxes, but netperf runs to or from those boxes do not work consistently. Sometimes they succeed; sometimes they fail to establish a connection.

If I turn hyper-threads off in the BIOS (SMT option) there is no problem with netperf and other tests. 

I can also get this to work w/ hyper threads if I boot with the pci=nomsi option. 

I have included some cpuinfo data below which shows that with hyper-threads enabled every core reports a "core id" of 0. 


Version-Release number of selected component (if applicable):
This is with RHEL5.3 Snap5 (although it was seen but not fully diagnosed on earlier versions)


How reproducible:

Every time

Steps to Reproduce:
1. Enable hyper-threads (SMT option) in the BIOS
2. Boot the OS
3. Try to run netperf
  
Actual results:


Expected results:


Additional info:

With hyper threads enabled, it looks like the cpuinfo data is messed up.

[root@perf22 ~]# cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5337.57
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.43
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.48
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.48
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.48
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.49
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.51
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.46
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 8
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 16
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.55
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 9
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 17
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.56
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 10
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 18
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.54
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 11
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 19
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.55
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 12
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 20
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.55
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 13
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 21
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.54
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 14
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 22
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.52
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.886
cache size      : 32 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 1
apicid          : 23
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.59
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:



w/o hyper threads
[root@perf22 np2.4]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.878
cache size      : 32 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 1
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5337.55
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.878
cache size      : 32 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 1
apicid          : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.49
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.878
cache size      : 32 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 1
apicid          : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.49
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.878
cache size      : 32 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 1
apicid          : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.44
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.878
cache size      : 32 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 1
apicid          : 16
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.56
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.878
cache size      : 32 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 1
apicid          : 18
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.53
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.878
cache size      : 32 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 1
apicid          : 20
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.56
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Genuine Intel(R) CPU             0000 @ .
stepping        : 4
cpu MHz         : 2666.878
cache size      : 32 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 1
apicid          : 22
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5333.51
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
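
The claim that every logical processor reports the same "core id" can be checked mechanically rather than by eye. Below is a minimal sketch, assuming the text of /proc/cpuinfo has already been read into a string; the function names parse_cpuinfo and group_by_core are made up for illustration and are not part of any tool mentioned in this bug.

```python
from collections import defaultdict


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor stanza.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def group_by_core(cpus):
    """Map (physical id, core id) to the logical processors reporting that pair."""
    groups = defaultdict(list)
    for cpu in cpus:
        key = (cpu.get("physical id"), cpu.get("core id"))
        groups[key].append(cpu.get("processor"))
    return dict(groups)
```

On a box where the topology is reported sensibly, each (physical id, core id) pair should list at most as many logical processors as there are threads per core. Applied to the SMT-enabled dump above, every processor in a package would land under the same pair, since they all report "core id : 0".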

Comment 1 Mark Wagner 2008-12-10 14:43:49 UTC
Tested on the same box with an upstream kernel that mr. gospo built (2.6.28-rc7). With this kernel the netperf tests worked w/o a hitch.

Comment 2 Ronald Pacheco 2008-12-10 15:38:21 UTC
John R, John V and Keve,

We need your assistance in testing this ASAP.  Thanks!

Comment 3 Ronald Pacheco 2008-12-10 15:39:13 UTC
John R, John V and Keve,

We need your assistance in testing this ASAP.  Thanks!

Comment 4 John Ronciak 2008-12-10 16:24:06 UTC
What is the output from 'cat /proc/interrupts' when it's working and when it's not?  If it pings in all cases that means the driver is doing the right thing but that the system might not be handling the interrupts correctly, especially regarding the nomsi option making it work.  The output should show how things are plumbed.
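
One way to compare the working and failing states is to take two /proc/interrupts snapshots and look at which vectors for the interface actually advance, and on which CPU. The following is only a sketch: the function names are hypothetical, and it assumes the usual layout of a CPU header row followed by "label: counts... description" rows.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counts; rows such as ERR have fewer.
        while fields and len(counts) < ncpu and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        table[label.strip()] = (counts, " ".join(fields))
    return table


def delta_for(before, after, name):
    """Per-CPU increase between two snapshots for rows whose description mentions name."""
    result = {}
    for label, (counts, desc) in after.items():
        if name in desc and label in before:
            old = before[label][0]
            result[label] = [new - prev for new, prev in zip(counts, old)]
    return result
```

Calling delta_for(parse_interrupts(snapshot1), parse_interrupts(snapshot2), "eth6") would show, per vector, how many interrupts arrived on each CPU between the two snapshots. A vector whose counts stay flat while traffic is expected is the kind of "plumbing" problem this comment is asking about.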

Comment 5 Mark Wagner 2008-12-10 20:30:50 UTC
Updated the BIOS to the latest version; the issue still remains:

# dmidecode 2.7
SMBIOS 2.4 present.
82 structures occupying 4105 bytes.
Table at 0xBF41A018.

Handle 0x0000, DMI type 0, 24 bytes.
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 4.6.3
        Release Date: 11/07/2008
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 1024 kB
        Characteristics:
                PCI is supported
                PNP is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                ESCD support is available
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                5.25"/1.2 MB floppy services are supported (int 13h)
                3.5"/720 KB floppy services are supported (int 13h)
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                CGA/mono video services are supported (int 10h)
                ACPI is supported
                LS-120 boot is supported
                ATAPI Zip drive boot is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
        BIOS Revision: 4.6

Handle 0x0001, DMI type 1, 27 bytes.
System Information
        Manufacturer: Supermicro
        Product Name: X8DTN
        Version: 1234567890
        Serial Number: 1234567890
        UUID: 00020003-0004-0005-0006-000700080009
        Wake-up Type: Power Switch
        SKU Number: 1234567890
        Family: 1234567890


At this point, it doesn't look like ping is working either
[root@perf22 np2.4]# ethtool eth6
Settings for eth6:
        Supported ports: [ FIBRE ]
        Supported link modes:   
        Supports auto-negotiation: No
        Advertised link modes:  10000baseT/Full 
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes

Actually, can't ping now either....

[root@perf22 np2.4]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.17.10.0     *               255.255.255.0   U     0      0        0 eth6
192.168.2.0     *               255.255.255.0   U     0      0        0 eth4
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
10.16.40.0      *               255.255.248.0   U     0      0        0 eth3
169.254.0.0     *               255.255.0.0     U     0      0        0 eth6
default         10.16.47.254    0.0.0.0         UG    0      0        0 eth3
[root@perf22 np2.4]# ping 172.17.10.100
PING 172.17.10.100 (172.17.10.100) 56(84) bytes of data.

--- 172.17.10.100 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms


add pci=nomsi on boot line and reboot box
-------------------------------------------

[root@perf22 np2.4]# ./netserver
Starting netserver at port 12865
Starting netserver at hostname 0.0.0.0 port 12865 and family AF_UNSPEC
[root@perf22 np2.4]# ping 172.17.10.100
PING 172.17.10.100 (172.17.10.100) 56(84) bytes of data.
64 bytes from 172.17.10.100: icmp_seq=1 ttl=64 time=1319 ms
64 bytes from 172.17.10.100: icmp_seq=2 ttl=64 time=319 ms

--- 172.17.10.100 ping statistics ---
3 packets transmitted, 2 received, 33% packet loss, time 1999ms
rtt min/avg/max/mdev = 319.257/819.138/1319.019/499.881 ms, pipe 2
[root@perf22 np2.4]# ./netperf -P0 -l10 -H 172.17.10.100 -D
Interim result: 9762.83 10^6bits/s over 1.01 seconds
Interim result: 9729.19 10^6bits/s over 1.00 seconds
Interim result: 9752.82 10^6bits/s over 1.00 seconds
Interim result: 9764.59 10^6bits/s over 1.00 seconds
Interim result: 9783.20 10^6bits/s over 1.00 seconds
Interim result: 9816.72 10^6bits/s over 1.00 seconds
Interim result: 9808.05 10^6bits/s over 1.00 seconds
Interim result: 5153.57 10^6bits/s over 1.90 seconds
Interim result: 9772.44 10^6bits/s over 1.00 seconds
 87380  16384  16384    10.00    8891.42
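
The interim results above include one interval that is both slower and longer than the rest. A small sketch for flagging such dips automatically follows; the 0.8 threshold and the function names are arbitrary choices for illustration, not anything netperf provides.

```python
import re
from statistics import median

# Matches lines printed by netperf -D, as seen in the transcript above.
INTERIM = re.compile(
    r"Interim result:\s+([\d.]+)\s+10\^6bits/s over\s+([\d.]+)\s+seconds"
)

OUTPUT = """\
Interim result: 9762.83 10^6bits/s over 1.01 seconds
Interim result: 9729.19 10^6bits/s over 1.00 seconds
Interim result: 9752.82 10^6bits/s over 1.00 seconds
Interim result: 9764.59 10^6bits/s over 1.00 seconds
Interim result: 9783.20 10^6bits/s over 1.00 seconds
Interim result: 9816.72 10^6bits/s over 1.00 seconds
Interim result: 9808.05 10^6bits/s over 1.00 seconds
Interim result: 5153.57 10^6bits/s over 1.90 seconds
Interim result: 9772.44 10^6bits/s over 1.00 seconds
"""


def parse_interim(text):
    """Return (throughput in 10^6 bits/s, seconds) pairs from interim lines."""
    return [(float(m.group(1)), float(m.group(2))) for m in INTERIM.finditer(text)]


def find_dips(samples, fraction=0.8):
    """Return samples whose throughput is below fraction * median throughput."""
    if not samples:
        return []
    mid = median(rate for rate, _ in samples)
    return [(rate, secs) for rate, secs in samples if rate < fraction * mid]


dips = find_dips(parse_interim(OUTPUT))
```

For this run, dips contains only the 5153.57 sample measured over 1.90 seconds, which lines up with the lower overall figure on the final line.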

Comment 6 Mark Wagner 2008-12-10 20:31:58 UTC
Created attachment 326542 [details]
requested output of /proc/interrupts in the various states

John,

Here is the data you requested

Comment 7 John Ronciak 2008-12-10 21:38:28 UTC
So what interfaces are being used in the tests that fail?  Even in the failure case the interrupts are still being processed by the driver.  So it looks like the driver (and HW) are working, just that the tests aren't running correctly?  Is this some sort of problem with irqbalance or something like that which is causing the test to fail?  What is the exact problem when the test does fail?  You state that the test cannot establish a connection, but if you retry running the test does it work?  Does doing a 'service network restart' get things running after it fails?

Comment 8 Mark Wagner 2008-12-10 23:10:12 UTC
Can Intel let us know if they have verified that this is a tested, known-good configuration?
(ixgbe, RHEL5.3, Nehalem w/ SMT enabled)

Comment 9 John Ronciak 2008-12-10 23:30:36 UTC
Who specifically are you asking this question to?  I have no idea.  Who from the platform group is working this?

Is anybody going to answer the questions I asked in #7?

Comment 10 Mark Wagner 2008-12-11 02:52:50 UTC
Reply to comment #7

irqbalance was not used in these tests. It is typically disabled in my performance runs in order to optimize L2 cache settings by hand. Thus all of the interrupts occur on core0 by default. However, I have verified that the issue is still present when irqbalance is on.

The interface is eth6.  

When using the boot option pci=nomsi, only a single Q gets created and things just work. 

The main issue is intermittent behavior. Pings typically work. The default netperf TCP_STREAM test sometimes works and sometimes doesn't.  The fact that I can boot the system with a different kernel or boot option and have it work would seem to indicate that it is something specific to the default MSI-X interrupts and the use of hyper-threads. 

A few more data points: I can run a netperf UDP_STREAM test and it works, then follow that immediately with a netperf TCP_STREAM test, which fails. 

Also, I have observed several instances where all traffic (pings, netperf, etc.) seems to stop working. service network restart seems to help in that situation but a reboot is always safer...

The common denominator seems to be MSI-X interrupts and Hyper-threads.

Comment 11 John Ronciak 2008-12-12 23:11:42 UTC
So firstly, I'm not sure how the BIOS version used there (4.6.3) relates to the ones being used here, as the ones here have numbers like 85. So I hope the BIOS you are using has the same options. 

MSI-X and HT is working on our systems here with this configuration.

That said, I did get a BKM from one of the guys testing performance.  The BKM brought both stability and much better performance.

Note, I'm just the messenger here, I've not done any of this testing. 

Here is the BKM:
Nehalem BKM setup in BIOS for network I/O performance

General BIOS settings
·         Memory-config-thermal throttling - turn that off, disable both close loop and open loop
·         All C state options disabled, C3, C1E etc.
·         Uncore Package Cstate = C0, used to be C6.
·         Disable GV3 - speedstep

Memory configuration
·         Turn on NUMA
·         Cache line interleave not set

If you want to turn off HT
·         Disable SMT 
·         Enable Physical APIC (physical apic is available in bios 22 and up, earlier bios requires disabling cores to get cpu count under 6) 

Others
·         On board network was disabled
·         SATA set to AHCI 
·         64-bit distro is best (a number of the 32-bit distros run into a nasty apic issue) 

Getting Linux 2.6.18 kernel from kernel.org to work…
1.       Set the BIOS SATA configuration to AHCI
2.       Copy a working .config from a default kernel installation e.g. RH EL 5 into the "kernel source" directory. 
3.       Run  "make menuconfig" from the directory containing the sources and  enable SATA, AHCI. This is found by navigating through the following options/suboptions -   Device drivers, scsi device support,  scsi low level drivers, serial ATA (SATA) support. Enable AHCI SATA . 
4.       Go to a default RH install and look for the numeric ID of the SATA controller, use lspci to determine if the SATA controller is visible, and then use lspci -n | grep "numeric ID" from lspci to extract the device ID
5.       Edit <kernel>/drivers/scsi/ahci.c
6.       Look for ahci_pci_tbl
7.       Create a new entry in this table for an Intel driver with the device ID extracted in the step above (step 4).

For 2.6.19 (seems to work with B1 configurations)
·         Enable AHCI for SATA in NH BIOS
·         Boot to “Vanilla” kernel.
·         Run make oldconfig – choose all defaults.
·         Run make menuconfig – verify AHCI module enabled in Device Drivers > Serial ATA Production and Experimental > AHCI Support.
·         Change the ahci.c file in /drivers/ata.  Add the Device ID generated from the 2.6.18 procedure listed above.  Once completed, you can follow normal kernel installation procedures.

For 2.6.24 (seems to work with B1 configurations)
·         Install RH with AHCI support enabled in BIOS
·         Run make menu  oldconfig 
·         Build the kernel

Comment 12 Andy Gospodarek 2008-12-16 18:05:25 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Intermittent loss of traffic has been seen on some larger, multi-core Intel systems using Hyper-Threading Technology (SMT) when using the ixgbe driver and MSI-X.  A fix for this issue will be released shortly after the general availability of RHEL5.3.

Comment 13 John Ronciak 2008-12-16 18:11:46 UTC
Andy,

A fix to what?  I don't know of anything being changed in the ixgbe driver.  Please specify what is going to be fixed.  Thanks.

Comment 14 Andy Gospodarek 2008-12-17 17:09:58 UTC
John, presumably there is something we can do in our driver to resolve this issue.  I hope to figure out what is wrong sometime soon.  I made some progress yesterday, but I need to reliably reproduce the issue before I can be sure that my fix does anything.

Comment 15 John Ronciak 2008-12-17 17:30:33 UTC
Andy,

As far as we know this is an OS/HW/BIOS problem and has nothing to do with the driver.  Once we did the BKM items listed above, our testing has been working fine.  Note that we have not been running 5.3 snapshots on the systems, as they have been in use for other things.  I'll see if we can get one freed up to run snap6 on it.  I really don't think this has anything to do with the ixgbe driver.

Comment 16 Andy Gospodarek 2008-12-17 17:59:16 UTC
John, if that's true that would be awesome.  We can drop these from the release notes if it doesn't seem to be a driver issue.

Comment 17 Andy Gospodarek 2008-12-17 19:08:30 UTC
Dropping release note entry as it seems from comment #13 and comment #15 that John feels this is not a software issue.

We can add some release notes back if John's team comes back from testing snap 6 and reports issues similar to what we've seen.

Comment 18 Andy Gospodarek 2008-12-17 19:08:30 UTC
Deleted Release Notes Contents.

Old Contents:
Intermittent loss of traffic has been seen on some larger, multi-core Intel systems using Hyper-Threading Technology (SMT) when using the ixgbe driver and MSI-X.  A fix for this issue will be released shortly after the general availability of RHEL5.3.

Comment 19 John Ronciak 2008-12-18 05:15:52 UTC
OK we are going to get a system set up to test tomorrow.  In the meantime could you guys please check on a couple of things.

Check that a simple (one-line) patch is present in the ixgbe driver.  In ixgbe.h, there’s a variable named v_idx.  If it’s declared as a u8, make it a u16 and recompile.

Also, there were some upstream MSI-X leaks with some of the interrupt code.  Could any of these have been backported to 5.3 without the subsequent corrections to them?  A leak like this could account for why it works sometimes and not others.

Comment 20 Andy Gospodarek 2008-12-18 16:19:10 UTC
John, thanks for the suggestions.  We still had v_idx as a u8, so I could see this as a problem on our system since we don't see issues when HT is off and there are only 8 CPUs, but we do see problems when HT is on and there are 16.

Can you elaborate on the upstream patches that resolve the MSI-X problems?  We
are shipping version 1.3.18-k4 and took up through this upstream commit:

commit 15e79f24b60c4b0bf8019423bda4e03a576b02f2
Author: Andy Gospodarek <andy>
Date:   Wed Aug 27 18:04:32 2008 -0700

    ixgbe: initialize interrupt throttle rate

Comment 21 John Ronciak 2008-12-18 17:33:33 UTC
OK we are going to get a system set up to test tomorrow.  In the meantime could you guys please check on a couple of things.

Check that a simple (one-line) patch is present in the ixgbe driver.  In ixgbe.h, there’s a variable named v_idx.  If it’s declared as a u8, make it a u16 and recompile.

Also, there were some upstream MSI-X leaks with some of the interrupt code.  Could any of these have been backported to 5.3 without the subsequent corrections to them?  A leak like this could account for why it works sometimes and not others.

Comment 22 John Ronciak 2008-12-18 17:48:08 UTC
I have no idea why my comments got added twice.  :-(

Anyway, OK so you are going to need to add the u8 -> u16 change.

The MSI-X leak problem was not in our driver at all.  There were kernel changes to the MSI-X interrupt code that initially caused vectors to be orphaned in the system.  This created out-of-resource conditions when drivers would ask for MSI-X interrupt vectors.  So you'll have to look around to see if any of the MSI-X code in RHEL5 picked up any of those MSI-X changes.  If so, you'll need to check that you have all of the latest changes so that it works correctly and doesn't orphan vectors.  Sorry I don't have the specific patches since this stuff was not really driver related.

Comment 24 Mark Wagner 2009-01-05 20:23:20 UTC
I have been able to get some test time on a Caneland and a Dunnington with the CX-4 version of the ixgbe card.  I saw symptoms similar to what I'm seeing on the Nehalem box.

So as the Caneland is a 16 core and the Dunnington is a 24 core, the issue would appear to be more related to the actual number of cores.  It also means that this is not an issue specific to hyper-threads or Nehalem.

Comment 25 John Ronciak 2009-01-05 22:10:15 UTC
We have been able to reproduce this here on our systems as well.  We are not sure why this is happening however.  There are also differences when we run on the latest upstream kernels as well.  We are debugging this trying to figure this out and if it has something to do with the ixgbe driver.  What points away from the driver is that it's acting different with different OS versions.

In the meantime I think that you need to keep resetting (rebooting?) until it's working, then do the needed runs of the tests.  This way it won't stop progress as we work the problem.

I'll post more as soon as we know more.

Comment 26 Ronald Pacheco 2009-01-06 18:09:05 UTC
Mark,

Can you update the description to reflect what we believe the problem really is?

Comment 27 Mark Wagner 2009-01-06 18:15:17 UTC
Update description to say 16 or more cores.  Not sure where the "breaking point" is.  It works with 8 cores but not 16 or more.

Comment 28 John Ronciak 2009-01-07 23:10:20 UTC
OK, we just checked the RC1 code for the ixgbe driver and there is a patch missing.  For some reason it's not here in this BZ but we talked about it on one of the phone calls.  In ixgbe.h the v_idx is defined as a u8 and it must be a u16.  This has to be done or this isn't going to work right.  The vector index has to be a u16 to work correctly with this number of vectors.

Can you guys get this added and rerun?  We are setting up to make the mod and rerun our testing but it's going to take us a bit to get set up to build the driver.  We are confident that this is the problem.

Comment 29 Andy Gospodarek 2009-01-07 23:37:45 UTC
John, I tried our systems with that patch and we noticed no improvement.  I hope you notice otherwise.

Additionally the upstream changelog entry doesn't explicitly state that there is any improvement for frame reception:

commit ff819cfb5d95c4945811f5e33aa57274885c7527
Author: Jesse Brandeburg <jesse.brandeburg>
Date:   Thu Sep 11 19:58:29 2008 -0700

    ixgbe: fix bug with lots of tx queues
    
    when using more than 8 tx queues you can overrun the 8 bit v_idx
    field, so change it to 16 bits to represent the maximum number
    of queues (one for each bit)
    
so this seemed to verify what I saw with my testing.
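
To illustrate the overflow described in the changelog above, here is a minimal C sketch. It assumes v_idx is used as a one-bit-per-queue bitmap, as the commit message suggests; the helper names are hypothetical and are not taken from the ixgbe source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not driver code): record queue q in a bitmap
 * that keeps one bit per queue. */
static uint8_t mark_queue_u8(uint8_t v_idx, unsigned q)
{
    assert(q < 16);
    /* Truncating to 8 bits silently drops bits 8..15. */
    return (uint8_t)(v_idx | (1u << q));
}

static uint16_t mark_queue_u16(uint16_t v_idx, unsigned q)
{
    assert(q < 16);
    /* 16 bits are enough for queues 0..15. */
    return (uint16_t)(v_idx | (1u << q));
}
```

With the 8-bit version, marking queue 12 leaves the bitmap unchanged, so that queue would never be associated with a vector; the 16-bit version keeps the bit.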

Comment 30 John Ronciak 2009-01-08 15:51:08 UTC
Andy, when you say "no improvement" do you mean that the system still didn't always get the same number of MSI-X interrupts from boot to boot or something else like performance?  This fix is not really for performance reasons, it's to be able to index the correct number of interrupts based on the number of queues being used.  

Bottom line is that the u8 index is not large enough to store the correct number of MSI-X interrupt vectors.  So this fix has to be there.  We may find another problem but this one does have to be fixed.

I'll respond back later today once we have the testing results.

Comment 31 Andy Gospodarek 2009-01-08 16:30:06 UTC
Sorry I wasn't clear, John.  By "no improvement," I meant that traffic
was still not passed reliably.

I also reduced the queues down to less than one per CPU (I even cut it
down to only one receive queue) and still had the same problems if I
recall.

Comment 32 John Ronciak 2009-01-08 23:45:24 UTC
OK so here is what the ixgbe team (namely PJ Waskievicz) found:

Analysis shows, for whatever reason, the OS/platform is inconsistent in assigning MSI-X vectors to ixgbe on load.  We can show that the number of vectors allocated is 13, 15, then 17, per driver instance, on each reload of the driver.  i.e. Load the driver, request 17 vectors, we get 13.  Reload the driver (rmmod; insmod), we request 17, we get 15, etc.  Once we get 17 vectors, the driver functions.  That is because we have a 1:1 mapping of Rx queue to MSI-X vector, and all the Rx vectors are now getting cleaned successfully.

Now ixgbe wants to configure the same number of Rx queues as there are CPUs in the platform.  The obvious problem with this is when we get less than the number of vectors we require to have a 1:1 mapping.  82598 can only support 16 queue vectors total (EIMS and friends restriction), and then up to 2 additional vectors for link status change and other causes.  There needs to be logic to stack queues onto the same vector.  This is where the rxr_count and txr_count comes into play in the q_vector structs.  These bitmaps are representative of the queues that are assigned to this vector (set in map_rings_to_vectors).

The missing piece in the RHEL5.3 ixgbe driver is the Rx NAPI clean routine to handle multiple queues on a single vector.  In the upstream kernel, this function is named ixgbe_clean_rxonly_many().  You need this function.  Also note that the poll routine is switched in ixgbe_napi_enable_all() if rxr_count is greater than 1.  That way there’s two poll routines servicing NAPI Rx cleanup.  This can’t be done this way in the RHEL5.3 driver since the poll routine is determined when the NAPI dummy_netdev is registered.  So it’ll need to be moved to ixgbe_napi_add_all().
------------------------------------

So the routine needs to be added and the poll routine call needs to be moved.
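
As a rough illustration of "stacking queues onto the same vector", the following C sketch spreads Rx queues over however many vectors were actually granted. It is only a sketch under assumptions: the struct and function are hypothetical, the field names merely echo the rxr_count and rxr_idx names used in this thread, and the real driver's mapping logic may differ.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VECTORS 16

/* Hypothetical per-vector bookkeeping, loosely modelled on the fields
 * named in this thread. */
struct q_vector_sketch {
    uint16_t rxr_idx;   /* bitmap: one bit per Rx queue on this vector */
    unsigned rxr_count; /* number of Rx queues on this vector */
};

/* Spread n_queues Rx queues over n_vectors vectors as evenly as possible. */
static void map_rx_queues(struct q_vector_sketch *v,
                          unsigned n_vectors, unsigned n_queues)
{
    unsigned q = 0;

    assert(n_vectors >= 1 && n_vectors <= MAX_VECTORS);
    assert(n_queues <= 16);

    for (unsigned i = 0; i < n_vectors; i++) {
        unsigned left = n_vectors - i;
        /* ceil(remaining queues / remaining vectors) */
        unsigned share = (n_queues - q + left - 1) / left;

        v[i].rxr_idx = 0;
        v[i].rxr_count = 0;
        for (unsigned k = 0; k < share; k++, q++) {
            v[i].rxr_idx |= (uint16_t)(1u << q);
            v[i].rxr_count++;
        }
    }
}
```

With 16 queues and 13 vectors, the first three vectors each carry two queues and the rest carry one. Any vector with rxr_count greater than 1 is the case that needs a clean routine able to service several rings.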

Let us know if you have any problems.

Comment 33 Andy Gospodarek 2009-01-12 18:01:31 UTC
Thanks for the update, John, and thank PJ for the work debugging this.  I saw that patch upstream and thought it would be helpful, but wasn't exactly sure why until now.  I'll work on getting those bits backported and see if we can get some testing going.

Comment 34 John Villalovos 2009-01-13 19:16:22 UTC
FYI: This bug affects the Tylersburg-EP systems, our new Nehalem-based systems.

It also affects Caneland and Dunnington systems.

Comment 35 Andy Gospodarek 2009-01-13 19:26:20 UTC
I have added this patch:

http://people.redhat.com/agospoda/rhel5/0122-ixgbe-fixes-to-increase-reliability-on-systems-usin.patch

to my test kernels located here.

http://people.redhat.com/agospoda/#rhel5

I'm doing some testing now, but any other help testing them is always appreciated.

Comment 36 John Ronciak 2009-01-13 23:50:07 UTC
OK so we were able to run our basic acceptance tests (BAT) on the patch and everything looks fine.  The BAT ran after each reboot without exception.  We are now loading Andy's pre-built kernel to test that as well.  The system used for the BAT testing is now running a stress test overnight.  I'll report the results in the morning.

Good job Andy!  The part we were having a problem backporting was the breaking up of the NAPI credit.  So it looks like you did it right.  :-)
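
One plausible reading of "breaking up the NAPI credit" is that the poll budget for a vector is divided among the Rx rings sharing it. The C sketch below shows that idea only; the function name is hypothetical and the exact arithmetic in the backported patch is an assumption, not something stated in this thread.

```c
#include <assert.h>

/* Hypothetical helper: split one NAPI poll budget across the Rx rings
 * that share a vector, never handing a ring less than 1. */
static int per_ring_budget(int budget, unsigned rxr_count)
{
    assert(budget > 0);

    if (rxr_count > 1)
        budget /= (int)rxr_count;

    return budget > 0 ? budget : 1;
}
```

For example, a budget of 64 shared by three rings gives each ring 21 packets per poll; a single ring keeps the full budget.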

Comment 37 Andy Gospodarek 2009-01-14 16:23:16 UTC
Thanks for the testing and compliments, John.  Please let us know when you've got some results from these tests so I can either figure out what is wrong and re-spin or get to work adding this into the appropriate spot in RHEL5.

Comment 38 John Ronciak 2009-01-14 17:07:56 UTC
Ok so testing went pretty well, both stress and BAT passed.  We did find one issue.  It's when you bring up the interface and change the RX ring size.  It won't pass traffic any more.  Here are the details:

If we change receive ring size, we cannot pass traffic. Trying to pass traffic causes NETDEV timeouts. 

The issue occurs only when fewer than 16 MSI-X vectors are allocated. So there is no issue after the driver is reloaded three times, when all MSI-X vectors are allocated, but there is on the first two loads (when there are only 10 and 13 vectors). The SourceForge driver (1.3.47) works as expected.

So to repro, bring up the interface after reboot, change the ring size with "ethtool -G ethX rx 64", and notice that ping fails. Trying to ftp a file also causes NETDEV timeouts. 

Interestingly setting rx=512 is ok but not 128 or 64. Also I couldn't repro changing tx ring size.
------------------------------

So this is a minor problem compared to what it was but maybe it can still be looked at.

Comment 39 John Ronciak 2009-01-14 17:18:29 UTC
From PJ:

Could it be that the patch is not memcpy'ing the rxr_count and rxr_idx portions of the Rx ring over (so then the multiple queue count on the one vector is lost)?

This would account for the problem.  We are looking at the patch to check this but we have a group wide meeting that runs for the rest of the morning.

Comment 40 Andy Gospodarek 2009-01-14 20:49:26 UTC
I don't think that's our issue here.  The version of the driver we are using doesn't free and re-allocate new rxring space like the current upstream driver does.  

It's also puzzling that the number of buffers would matter.  Your testing indicates that 128 and 64 are broken but 512 is not, that also seems odd.

Comment 41 Andy Gospodarek 2009-01-14 21:04:05 UTC
Just as a reference, we are not carrying this patch:

commit c431f97ef96026e6da7032a871a0789cf5a2eaea
Author: Jesse Brandeburg <jesse.brandeburg>
Date:   Thu Sep 11 19:59:16 2008 -0700

    ixgbe: fix ring reallocation in ethtool
    
    changing ring sizes in ethtool needs to be robust.  If an allocation fails the
    driver must continue operation, with the previous settings.

Though it might be nice to have, I don't think it's the issue right now (unless you are seeing failures in the output of dmesg indicating that ixgbe_setup_rx_resources is failing).

Comment 42 Andy Gospodarek 2009-01-14 21:12:42 UTC
not carrying this one either (which would obviously be necessary for the patch in comment #41 to work).

commit d3fa4721456226d77475181a4bfbe5b3d899d65c
Author: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr>
Date:   Fri Dec 26 01:36:33 2008 -0800

    ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
    
    The adapter rings are kcalloc()'d, but in set_ringparam() in ixgbe_ethtool,
    we replace that memory from the vmalloc() pool.  This can result in a NULL
    pointer reference when trying to modify the rings at a later time, or on
    device removal.

Comment 43 John Ronciak 2009-01-14 21:28:57 UTC
For comment 40, but only when the driver doesn't get all the vectors.  So that has something to do with it as well.

We are looking at the other 2 patches you point out.  We'll let you know later today.

Comment 44 John Ronciak 2009-01-15 00:04:36 UTC
After the team looked into this we think that the first one is the one that will get us closer, but applying both is necessary.  The second one fixes a bug in the first one, plus it maintains all the NAPI infrastructure properly.  We are talking about #41 and #42 above to be clear.

Comment 45 Andy Gospodarek 2009-01-15 00:58:20 UTC
I can do that I guess, but I didn't immediately see why it would help.  I understand why both are needed, and I think there was a third one in there too that is needed to get us to today's functionality in Linus' tree.  The problem I have with all this is that I don't have direct physical access to a box with this hardware (it's up in a perf lab in Boston), so it takes me more time to get tests going since I need to make sure the box isn't being used for some other tests.

Comment 46 John Ronciak 2009-01-15 17:32:51 UTC
If you generate the patches we can test it.  We have the set up now so I don't think it would take much for us to test them.

Comment 47 Andy Gospodarek 2009-01-19 21:49:52 UTC
Created attachment 329400 [details]
ixgbe-ring-setting-working.patch

It took a bit, but I got these patches (as well as a few other needed bits) added to ixgbe and ring size setting now appears to be working.  I'll add this to my test kernels and post here when they are available.

Comment 48 John Ronciak 2009-01-19 22:46:28 UTC
We are pulling down the patches and will start testing hopefully today.  We'll let you know how it goes.

Comment 49 John Ronciak 2009-01-20 18:43:34 UTC
It was very disappointing to hear today that none of the work we have done got included into 5.3.  The documented statement that you guys have to load the Linux kernel with the "nomsi" option is probably not the right thing to do.  A much more reasonable workaround would be to load the ixgbe driver with an option of "RSS=8,8" so that each port only uses 8 MSI-X vectors.  This will prevent the tanking of networking performance on the large systems.

Please see to it that the documented work-around is the RSS=8,8 driver load option and not the nomsi kernel option.

Comment 50 Andy Gospodarek 2009-01-20 19:23:53 UTC
John, there are no parameters for ixgbe in the RHEL or upstream kernel.  I just checked your sourceforge driver and see that you have module options available there.  I didn't even realize that was the case.  If you guys could push those upstream that would be great.

As soon as we can conclude that the patch in comment #47 and http://people.redhat.com/agospoda/rhel5/0122-ixgbe-fixes-to-increase-reliability-on-systems-usin.patch resolve this issue, I'm going to do my best to get them added to the errata kernel stream.  Getting this fix out before 5.4 is important to me and our customers.  These patches won't be in the installer kernel for 5.3, but they will be there during one of the first async updates.

Comment 51 John Ronciak 2009-01-20 20:07:32 UTC
We can't.  The upstream guys had us pull all of the module stuff out of the upstream driver.  They don't want them there at all.

I'll get you testing info later today on the patch testing.

Comment 52 John Ronciak 2009-01-21 19:32:06 UTC
Both of the issues are gone now. We were able to pass traffic and set ring sizes regardless of the # of MSI-X vectors. It passed the stress test now as well.

Only a minor issue was noticed, some debug statements have been included by accident. Changing rx ring size will say something like this:

New values for ring counts will be: rx: 128 tx 64
Values for ring counts from ethtool are: rx: 128 tx: 64

So this looks like it's coming from ixgbe_ethtool.c. Search for KERN_CRIT

Comment 53 Andy Gospodarek 2009-01-21 19:48:57 UTC
Those did sneak in, I'll drop them.

Thanks for the testing and feedback by your team, John!

Comment 55 RHEL Program Management 2009-01-27 20:41:08 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 56 Andy Gospodarek 2009-01-27 22:25:10 UTC
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results.  Without immediate
feedback there is a good chance this or any other fix for this driver
will not be included in the upcoming update.

Comment 57 John Ronciak 2009-01-28 21:50:00 UTC
Andy,

Driver is good now. The rx ring param issues and also the debug statements that got accidentally included last time are fixed.

While testing the patch we did notice something. This issue goes back to RHEL5.2. The driver lets the MTU be set < 68 and stops traffic. To resume traffic we have to either ifdown/up or set it back to >= 68. This fix made it into both the SourceForge driver and upstream (http://markmail.org/message/krpinukm4l6rs7sx).  So this is really another fix but it's probably not very critical that it get added, at least not for the errata and z-stream release.
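
The kind of check that rejects such an MTU can be sketched in C as follows. This is an illustrative sketch, not the driver's change_mtu routine: the function and macro names are hypothetical, and the upper bound is passed in as a parameter rather than taken from any hardware limit.

```c
#include <assert.h>
#include <errno.h>

/* 68 bytes is the minimum MTU discussed above (the IPv4 minimum). */
#define MIN_MTU_SKETCH 68

/* Hypothetical validation helper: return 0 if new_mtu is acceptable,
 * -EINVAL otherwise, so that a bad value is refused instead of applied. */
static int check_mtu(int new_mtu, int max_mtu)
{
    assert(max_mtu >= MIN_MTU_SKETCH);

    if (new_mtu < MIN_MTU_SKETCH || new_mtu > max_mtu)
        return -EINVAL;

    return 0;
}
```

Rejecting the value up front keeps the previous MTU in place, so traffic is not interrupted by an out-of-range setting.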

Comment 58 Andy Gospodarek 2009-01-28 22:21:21 UTC
Thanks for that tip, John.  I'll add that to my test kernels and it should creep into the next update.

Comment 62 Don Zickus 2009-02-02 19:47:53 UTC
in kernel-2.6.18-130.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 63 John Ronciak 2009-02-09 17:35:16 UTC
OK everything works as before with respect to the MSI-X allocation and ring size changes. 

However, there's a problem with ethtool reporting incorrect rx_bytes. And the problem's been there since we started looking at the MSI-X issues. The driver reports 2x actual traffic. E.g. receiving a 30MB file will show 60MB received according to ethtool stats. tx_bytes is reported correctly.   Sorry we missed this earlier.

Here's steps to repro. mput & mget same sized file through ftp. Check "ethtool -S ethX | grep bytes" see that rx_bytes is twice that of tx_bytes. 

Setting maxcpu=8 did not make a difference. Our SF(1.3.47) driver works OK.  So there seems to be another patch that was missed.

Comment 64 Andy Gospodarek 2009-02-09 18:17:07 UTC
Thanks, John, I'll take a look.

Comment 65 Andrius Benokraitis 2009-02-10 16:59:15 UTC
*** Bug 481669 has been marked as a duplicate of this bug. ***

Comment 66 Andy Gospodarek 2009-02-10 17:06:55 UTC
Does anyone at RH or Intel mind if we open this up for public consumption or at a minimum to a mutual partner?

Comment 67 John Ronciak 2009-02-10 17:22:16 UTC
Since 5.3 released the partners either know it's broke already or soon will know.  So it's fine by us to open this up to them.

Comment 69 RHEL Program Management 2009-02-16 15:16:12 UTC
Updating PM score.

Comment 70 Jesse Brandeburg 2009-02-18 01:38:37 UTC
I was asked to respond to this bug by a customer.  The research and patches proposed in this bug seem correct to me.  John and PJ from our Linux team were the primary contributors to this bug.  The v_idx fix mentioned in comment #29 addresses the originating cause of this bug with many cores, and #41, #42 are both because there is no clear process for including all upstream patches in each RHEL release (and these fixes were needed)

Comment 71 Andy Gospodarek 2009-02-18 04:22:16 UTC
(In reply to comment #70)
> because there is no clear process for including all upstream patches in each
> RHEL release (and these fixes were needed)

I don't want to sound too defensive, but we take major driver updates for RHEL releases all the time.  During our normal Update process (from 5.3 to 5.4, for example), we will move forward to a brand new driver version from upstream.

We are only taking the smaller set of patches for an in-update (still considered 5.3) kernel-fix that should roll out soon -- long before 5.4.  We try not to take large fixes for these updates because we don't want to introduce too much instability without longer test cycles.

We can try and make it priority to explain this a bit better in one of our calls or meetings if this would be helpful for you.

Comment 72 John Ronciak 2009-02-24 18:19:07 UTC
We tried the latest kernel and the double byte count issue is still there.  Do we need a new BZ for this?

Comment 73 Andy Gospodarek 2009-02-24 19:00:15 UTC
John, another BZ would be great.

Comment 79 Chris Ward 2009-07-03 18:17:06 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 80 Chris Ward 2009-07-10 19:08:26 UTC
~~ Attention Partners - RHEL 5.4 Snapshot 1 Released! ~~

RHEL 5.4 Snapshot 1 has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular request. Please test and report back your results here, at your earliest convenience. The RHEL 5.4 exception freeze is quickly approaching.

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. 

Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.

Comment 81 Jan Tluka 2009-07-20 14:47:16 UTC
Patch is in -158.el5. SanityOnly.

Comment 82 Abdul Khan 2009-07-30 01:54:00 UTC
I checked with the 5.4 snapshot 3 and I don't see the problem.

Looks like it's fixed.

Comment 83 Andrius Benokraitis 2009-08-04 19:15:17 UTC
Jesse @ INTL - can you verify this fixes the issue?

Comment 84 John Ronciak 2009-08-05 23:55:20 UTC
From one of our testers:
It's fixed on RHEL5.4 snap5. Oplin and Niantic on both GreenCity (w/ hyperthreading on) and Dunnington (24 physical cores) pass traffic without issues. 

As before though, we don't get all the MSI-X vectors we request on first driver load. I can dig up the email discussion regarding this but I believe it was a RHEL kernel issue.

So this problem looks to be fixed.  The MSI-X vector thing still seems to be there but I don't think that could be fixed in RHEL5 for kABI reasons if I remember correctly.

Comment 85 Chris Ward 2009-08-06 08:18:14 UTC
John - Intel, just a reminder to make sure any additional issues you have are filed into Bugzilla's/IssueTrackers so they can be reviewed by our engineering teams. Thanks for testing feedback.

Comment 87 errata-xmlrpc 2009-09-02 08:14:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html