Bug 482822 - Intel E1000 doesn't work on NVIDIA MCP51 motherboards
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.9
Hardware: All Linux
Priority: low Severity: medium
Target Milestone: rc
Target Release: 4.8
Assigned To: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: Nvidia4.8
Reported: 2009-01-28 08:15 EST by Laurent Jean-Rigaud
Modified: 2014-06-29 19:01 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-05-18 15:35:04 EDT


Attachments
Dmesg with e1000-msi patch (17.27 KB, application/octet-stream)
2009-02-04 11:57 EST, Laurent Jean-Rigaud
nvidia-fix.patch (4.11 KB, patch)
2009-02-04 17:06 EST, Andy Gospodarek
e1000-msi-test-and-switch-to-intx.patch (5.34 KB, patch)
2009-02-05 16:12 EST, Andy Gospodarek

Description Laurent Jean-Rigaud 2009-01-28 08:15:16 EST
Description of problem:

The e1000 module delivered in RHEL4 (tested with 2.6.9-[57|79|80].EL) does not seem to send or receive any packets on an FSC ESPRIMO E5615 (NVIDIA MCP51).

The Intel PCIe card is detected by kudzu and configured, and the link status is shown correctly through ethtool/mii-tool.

The associated IRQs are shown as PCI-MSI in /proc/interrupts.


Version-Release number of selected component (if applicable):

module e1000 : 7.3.20-k2-NAPI

Intel Gigabit card (2 ports, pcie) :
03:00.0 Class 0200: 8086:105e (rev 06)
03:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
        Subsystem: Intel Corporation PRO/1000 PT Dual Port Server Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at f2020000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f2000000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 9000 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 00000000fee00000  Data: 40b1
        Capabilities: [e0] Express Endpoint IRQ 0
                Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <512ns, L1 <64us
                Device: AtnBtn- AtnInd- PwrInd-
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
                Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s, Port 0
                Link: Latency L0s <4us, L1 <64us
                Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x4
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 12-c3-51-ff-ff-17-15-00




How reproducible:

Install the Intel card, configure it (static IP or DHCP) and try to use it.
DHCP negotiation fails with an error after timing out (no packet is sent).

  
Actual results:
dmesg log:
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI interrupt 0000:03:00.0[A] -> GSI 23 (level, high) -> IRQ 177
PCI: Setting latency timer of device 0000:03:00.0 to 64
e1000: 0000:03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:51:c3:12
divert: allocating divert_blk for eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt 0000:03:00.1[B] -> GSI 22 (level, high) -> IRQ 185
PCI: Setting latency timer of device 0000:03:00.1 to 64
e1000: 0000:03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:51:c3:13
divert: allocating divert_blk for eth1
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection


Expected results:
see below.

Additional info:

I tried e1000 7.5.6 (dkms integration) and it runs well on the same machine/kernel:
Intel(R) PRO/1000 Network Driver - version 7.6.5-NAPI
Copyright (c) 1999-2007 Intel Corporation.
PCI: Setting latency timer of device 0000:03:00.0 to 64
e1000: 0000:03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:51:c3:12
divert: allocating divert_blk for eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Setting latency timer of device 0000:03:00.1 to 64
e1000: 0000:03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:51:c3:13
divert: allocating divert_blk for eth1
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
ip_tables: (C) 2000-2002 Netfilter core team
e1000: eth0: e1000_test_msi_interrupt: MSI interrupt test failed!
e1000: eth0: e1000_test_msi: MSI interrupt test failed, using legacy interrupt.
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
ip_tables: (C) 2000-2002 Netfilter core team
e1000: eth1: e1000_test_msi_interrupt: MSI interrupt test failed!
e1000: eth1: e1000_test_msi: MSI interrupt test failed, using legacy interrupt.
e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth1: e1000_watchdog_task: 10/100 speed: disabling TSO

This one falls back to legacy interrupts (IO-APIC). In fact, I wanted to use pci=nomsi to deactivate MSI interrupts with the RHEL4 e1000, but it seems that this kernel option has not been backported from the latest release to 2.6.9... :-(

Good luck!
Comment 1 Laurent Jean-Rigaud 2009-01-28 08:39:35 EST
It seems that a quick fix would be to blacklist MCP51 in the PCI quirks to avoid this problem.

Is it possible to know whether this bug will be fixed by Red Hat in a future RHEL4 update, or whether my configuration (NVIDIA chipset + e1000) is not a RHEL4 target (from a commercial point of view ;-))?

Regards
Comment 2 Laurent Jean-Rigaud 2009-02-02 11:44:46 EST
The fix I mentioned above does not fix the e1000 problem (and it introduces a problem with the local APIC if the high definition timer is set in the BIOS).
Comment 3 Andy Gospodarek 2009-02-02 11:59:57 EST
Can you try test kernels located here:

http://people.redhat.com/agospoda/#rhel4

These kernels include a patch for e1000 that should detect that MSI interrupts
are not working with e1000 and switch to INTx mode.  This is much better than disabling MSI on the entire system.

The patch mentioned above is also included in the version of the e1000 driver (7.5.6) that is part of dkms, so I suspect this will resolve your issue.
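
To illustrate the approach described in this comment (check whether MSI actually delivers interrupts, and switch to INTx if it does not), here is a minimal userspace sketch in C. It is not the patch itself: the types fake_platform and fake_adapter and both function names are hypothetical, and interrupt delivery is simulated with a boolean rather than real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical interrupt modes; names are illustrative, not the driver's. */
enum irq_mode { IRQ_MODE_MSI, IRQ_MODE_INTX };

/* Simulated platform: does an MSI message actually reach the CPU? */
struct fake_platform {
    bool msi_delivered;   /* false models a chipset that drops MSI */
};

struct fake_adapter {
    enum irq_mode mode;
    bool test_irq_seen;   /* set by the (simulated) interrupt handler */
};

/* Ask the "hardware" to raise a test interrupt. The handler only runs
 * if the platform really delivers MSI messages. */
static void trigger_test_interrupt(const struct fake_platform *p,
                                   struct fake_adapter *a)
{
    a->test_irq_seen = false;
    if (a->mode == IRQ_MODE_MSI && p->msi_delivered)
        a->test_irq_seen = true;
}

/* Try MSI first; if the test interrupt never arrives, use INTx. */
static enum irq_mode select_irq_mode(const struct fake_platform *p,
                                     struct fake_adapter *a)
{
    assert(p != NULL && a != NULL);
    a->mode = IRQ_MODE_MSI;
    trigger_test_interrupt(p, a);
    if (!a->test_irq_seen) {
        fprintf(stderr, "MSI interrupt test failed, using legacy interrupt.\n");
        a->mode = IRQ_MODE_INTX;
    }
    return a->mode;
}
```

The point of doing this per device is that only the affected adapter gives up MSI, while other devices on the system can keep using it.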
Comment 4 Laurent Jean-Rigaud 2009-02-02 12:33:58 EST
I will try, if I can get the machine again... but the e1000 MSI patch seems to have been dropped?!

$ rpm -qpl ../SRPMS/kernel-2.6.9-80.EL.src.rpm -v | grep -i e1000
-rw-r--r--    1 mockbuilmockbuil        50348 May  4  2005 linux-2.6.10-net-e1000-update.patch
-rw-r--r--    1 mockbuilmockbuil      1248783 Oct  3  2007 linux-2.6.11-net-e1000-update.patch
-rw-r--r--    1 mockbuilmockbuil         4058 Jan 23 21:46 linux-2.6.9-e1000-add-parameter-to-set-transmit-descriptor-size.patch
-rw-r--r--    1 mockbuilmockbuil          953 Mar 26  2008 linux-2.6.9-e1000-disable-pci-e-completion-timeouts-on-pseries.patch
-rw-r--r--    1 mockbuilmockbuil          440 Dec 16 17:17 linux-2.6.9-e1000-remove-e1000_clean_tx_irq-call-from-e1000_net.patch
-rw-r--r--    1 mockbuilmockbuil         2713 Dec 16 17:17 linux-2.6.9-e1000-restart-receive-unit-on-esb2-hardware.patch
-rw-r--r--    1 mockbuilmockbuil        32832 Apr  3  2008 linux-2.6.9-e1000-upstream-update-and-alternate-mac-address-sup.patch
-rw-r--r--    1 mockbuilmockbuil         1864 Jan 23 21:46 linux-2.6.9-e1000e-add-reboot-notifier-so-wol-will-work.patch
-rw-r--r--    1 mockbuilmockbuil       202142 Mar 26  2008 linux-2.6.9-e1000e-update-to-latest-upstream.patch
-rw-r--r--    1 mockbuilmockbuil       380792 Jan 14 16:41 linux-2.6.9-e1000e-update-to-upstream-version-0.3.3.3-k6.patch
-rw-r--r--    1 mockbuilmockbuil         1758 Jan 16 23:03 linux-2.6.9-enable-entropy-generation-from-e1000-and-bnx2-networ.patch
-rw-r--r--    1 mockbuilmockbuil         1233 Dec 11  2004 linux-2.6.9-net-e1000-64k-align-check-dma.patch
-rw-r--r--    1 mockbuilmockbuil         7611 Apr 30  2005 linux-2.6.9-net-e1000-avoid-sleep-in-timer-context.patch
-rw-r--r--    1 mockbuilmockbuil        14468 Dec  8  2004 linux-2.6.9-net-e1000-erratum23.patch
-rw-r--r--    1 mockbuilmockbuil          605 Mar 22  2005 linux-2.6.9-net-e1000-flush-rmmod.patch
-rw-r--r--    1 mockbuilmockbuil         3901 Dec  8  2004 linux-2.6.9-net-e1000-post-mature-writeback.patch
-rw-r--r--    1 mockbuilmockbuil         1114 Dec  2  2004 linux-2.6.9-net-e1000-rx-mini-jumbo-inval.patch
-rw-r--r--    1 mockbuilmockbuil       533745 Sep 26  2007 linux-2.6.9-net-e1000e.patch
$ rpm -qpl ../SRPMS/kernel-2.6.9-80.EL.gtest.57.src.rpm -v | grep -i e1000
-rw-r--r--    1 mockbuilmockbuil        50348 May  4  2005 linux-2.6.10-net-e1000-update.patch
-rw-r--r--    1 mockbuilmockbuil      1248783 Oct  3  2007 linux-2.6.11-net-e1000-update.patch
-rw-r--r--    1 mockbuilmockbuil         4058 Jan 23 21:46 linux-2.6.9-e1000-add-parameter-to-set-transmit-descriptor-size.patch
-rw-r--r--    1 mockbuilmockbuil          953 Mar 26  2008 linux-2.6.9-e1000-disable-pci-e-completion-timeouts-on-pseries.patch
-rw-r--r--    1 mockbuilmockbuil          440 Dec 16 17:17 linux-2.6.9-e1000-remove-e1000_clean_tx_irq-call-from-e1000_net.patch
-rw-r--r--    1 mockbuilmockbuil         2713 Dec 16 17:17 linux-2.6.9-e1000-restart-receive-unit-on-esb2-hardware.patch
-rw-r--r--    1 mockbuilmockbuil        32832 Apr  3  2008 linux-2.6.9-e1000-upstream-update-and-alternate-mac-address-sup.patch
-rw-r--r--    1 mockbuilmockbuil         1864 Jan 23 21:46 linux-2.6.9-e1000e-add-reboot-notifier-so-wol-will-work.patch
-rw-r--r--    1 mockbuilmockbuil       202142 Mar 26  2008 linux-2.6.9-e1000e-update-to-latest-upstream.patch
-rw-r--r--    1 mockbuilmockbuil       380792 Jan 14 16:41 linux-2.6.9-e1000e-update-to-upstream-version-0.3.3.3-k6.patch
-rw-r--r--    1 mockbuilmockbuil         1758 Jan 16 23:03 linux-2.6.9-enable-entropy-generation-from-e1000-and-bnx2-networ.patch
-rw-r--r--    1 mockbuilmockbuil         1233 Dec 11  2004 linux-2.6.9-net-e1000-64k-align-check-dma.patch
-rw-r--r--    1 mockbuilmockbuil         7611 Apr 30  2005 linux-2.6.9-net-e1000-avoid-sleep-in-timer-context.patch
-rw-r--r--    1 mockbuilmockbuil        14468 Dec  8  2004 linux-2.6.9-net-e1000-erratum23.patch
-rw-r--r--    1 mockbuilmockbuil          605 Mar 22  2005 linux-2.6.9-net-e1000-flush-rmmod.patch
-rw-r--r--    1 mockbuilmockbuil         3901 Dec  8  2004 linux-2.6.9-net-e1000-post-mature-writeback.patch
-rw-r--r--    1 mockbuilmockbuil         1114 Dec  2  2004 linux-2.6.9-net-e1000-rx-mini-jumbo-inval.patch
-rw-r--r--    1 mockbuilmockbuil       533745 Sep 26  2007 linux-2.6.9-net-e1000e.patch
$ 

The e1000 patches are identical between 80 and 80.gtest.
Same for the *msi* patches...

Can you confirm that the e1000 MSI patch is included?
Comment 5 Andy Gospodarek 2009-02-02 14:18:29 EST
Well my test kernels use linux-kernel-test.patch for holding all of my experimental patches, so even if you compared the files searching for 'e1000' you might not find any differences using your methods.

For the record, it looks like I dropped this patch:

http://people.redhat.com/agospoda/rhel4/0005-e1000-msi-test-and-switch-to-intx.patch

from my test kernels.  I can add that back or you can try it manually if you like.
Comment 6 Laurent Jean-Rigaud 2009-02-03 10:25:44 EST
OK. I've taken 2.6.9-80.EL and added the 0005-e1000-msi* patch. It is rebuilding, and I need to retrieve an NVIDIA PC for testing.
Comment 7 Laurent Jean-Rigaud 2009-02-04 11:57:30 EST
Created attachment 330887 [details]
Dmesg with e1000-msi patch

With the patch, the e1000 module falls back to legacy interrupts, but there is no traffic at all (stats at 0, tcpdump empty).

module version 7.3.20-k3-NAPI

e1000 binds eth0 & eth1 (disabled). forcedeth binds eth2.

/proc/interrupts:
           CPU0       
  0:     388428    IO-APIC-edge  timer
  1:       2337    IO-APIC-edge  i8042
  7:          3    IO-APIC-edge  parport0
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       2355    IO-APIC-edge  i8042
 15:       3054    IO-APIC-edge  ide1
177:          0   IO-APIC-level  eth0
185:          0   IO-APIC-level  libata
193:        235   IO-APIC-level  HDA Intel, ohci_hcd
201:      35190   IO-APIC-level  ehci_hcd, eth2
209:       7823   IO-APIC-level  libata
NMI:          0 
LOC:     388274 
ERR:          0
MIS:          0
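
The table above shows eth0 on IRQ 177 with a count of 0, which is the sign that no interrupt ever reached the driver. As a rough sketch of how that check could be automated, the following C helper parses one single-CPU /proc/interrupts row. The struct and function names are made up for illustration, and the format string assumes exactly one CPU column, as in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parsed view of one single-CPU /proc/interrupts row. */
struct irq_row {
    int irq;
    unsigned long count;
    char type[32];
    char devices[64];
};

/* Parse a row such as "177:          0   IO-APIC-level  eth0".
 * Returns 1 on success, 0 if the line is not a numbered IRQ row
 * (for example the "NMI:" or "CPU0" lines). */
static int parse_irq_row(const char *line, struct irq_row *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    int n = sscanf(line, " %d: %lu %31s %63[^\n]",
                   &out->irq, &out->count, out->type, out->devices);
    return n == 4;
}

/* A row for the given device with a zero count, after traffic was
 * attempted, suggests its interrupts are not being delivered.
 * The device match is a plain substring search. */
static int irq_looks_dead(const struct irq_row *row, const char *dev)
{
    return strstr(row->devices, dev) != NULL && row->count == 0;
}
```

Applied to the rows above, the eth0 line (count 0) would be flagged, while the shared "ehci_hcd, eth2" line (count 35190) would not.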
Comment 8 Laurent Jean-Rigaud 2009-02-04 11:59:42 EST
To summarize:
- 2.6.9-80.EL (7.3.20-k2-NAPI) : not OK
- 2.6.9-80.EL + e1000-patch (7.3.20-k3-NAPI) : not OK
- 2.6.9-80.EL + e1000-dkms (7.5.6-NAPI) : OK (legacy IRQ)

Regards
Comment 9 Andy Gospodarek 2009-02-04 17:06:09 EST
Created attachment 330927 [details]
nvidia-fix.patch

Laurent, thanks for the feedback.  I have one more patch that might be interesting to try.  This is an attempt to address some MSI/HT issues that have become apparent lately.  I haven't tested this patch at all since I don't have an offending system, but I think it will be OK.
Comment 10 Laurent Jean-Rigaud 2009-02-05 08:33:05 EST
This patch makes the compilation fail with an error:
 drivers/pci/quirks.c: In function `quirk_find_ht_capability':
 drivers/pci/quirks.c:1713: error: 'pos' redeclared as different kind of symbol
 drivers/pci/quirks.c:1711: error: previous definition of 'pos' was here
 drivers/pci/quirks.c:1730: warning: passing arg 3 of `pci_read_config_byte' from incompatible pointer type
 drivers/pci/quirks.c: In function `nv_msi_ht_cap_quirk':
 drivers/pci/quirks.c:1746: warning: implicit declaration of function `pci_get_bus_and_slot'
 drivers/pci/quirks.c:1746: warning: assignment makes pointer from integer without a cast
 make[2]: *** [drivers/pci/quirks.o] Error 1
 make[1]: *** [drivers/pci] Error 2
 make[1]: *** Waiting for unfinished jobs....
 make: *** [drivers] Error 2
 error: Bad exit status from /home/buildsys/rpmbuild/tmp/rpm-tmp.55710 (%build)

The redeclaration of pos seems to be the cause:
+static int __devinit quirk_find_ht_capability(struct pci_dev *dev, int pos, int ht_cap)
+{
+	u8 pos;

After removing the redefinition, the function pci_get_bus_and_slot is undefined!
 ../..
  CHK     include/linux/compile.h
  UPD     include/linux/compile.h
 drivers/built-in.o(.text+0x4540): In function `nv_msi_ht_cap_quirk':
 drivers/pci/quirks.c:1746: undefined reference to `pci_get_bus_and_slot'
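
For reference, the first error comes from declaring a local variable with the same name as a function parameter, which C rejects in the outermost block of the function body. The sketch below shows the usual shape of a capability-list walk in which `pos` is only the parameter. It is a userspace illustration over a simulated 256-byte config space, not the code from the attached patch; the function name and the MAX_CAPS bound are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_LIST_NEXT 1   /* offset of the "next pointer" byte in a capability */
#define MAX_CAPS 48       /* iteration bound so a corrupt list cannot loop forever */

/* Walk a capability linked list in a simulated 256-byte config space,
 * starting at offset `pos`, and return the offset of the first capability
 * whose ID byte equals `cap_id`, or 0 if none is found.
 *
 * `pos` is the parameter itself. Adding a second declaration such as
 * "uint8_t pos;" inside this body is what produces the
 * "redeclared as different kind of symbol" error. */
static int find_capability(const uint8_t cfg[256], int pos, uint8_t cap_id)
{
    assert(cfg != NULL);
    for (int ttl = MAX_CAPS; ttl > 0 && pos >= 0x40 && pos <= 0xfe; ttl--) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + CAP_LIST_NEXT];   /* follow the next pointer */
    }
    return 0;
}
```

With a config space laid out like the lspci output in the description (Power Management at c8, MSI at d0, PCI Express at e0), a search for the MSI capability ID 0x05 starting at 0xc8 would return 0xd0, and a search for the HyperTransport ID 0x08 would return 0.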


By the way, as the dkms version of e1000 runs well, a patch against the e1000 sources should be sufficient. The current e1000-msi patch must be missing something ;-)
Comment 11 Andy Gospodarek 2009-02-05 16:12:14 EST
Created attachment 331054 [details]
e1000-msi-test-and-switch-to-intx.patch

I think I found the problem with my original patch.  Please replace the previous 'e1000-msi' patch with this one and forget that non-compiling patch I uploaded yesterday. :-)
Comment 12 Laurent Jean-Rigaud 2009-02-06 06:01:02 EST
OK, it's better now with the latest patch (+ 'adapter->have_msi = 1;')

e1000: no version for "struct_module" found: kernel tainted.
Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:02:00.0 to 64
e1000: 0000:02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:15:17:24:45:86
divert: allocating divert_blk for eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt 0000:02:00.1[B] -> GSI 16 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:02:00.1 to 64
e1000: 0000:02:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:15:17:24:45:87
divert: allocating divert_blk for eth1
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
ip_tables: (C) 2000-2002 Netfilter core team
e1000: eth1: e1000_test_msi: MSI interrupt test failed, using legacy interrupt.
ip_tables: (C) 2000-2002 Netfilter core team
e1000: eth0: e1000_test_msi: MSI interrupt test failed, using legacy interrupt.
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
ip_tables: (C) 2000-2002 Netfilter core team
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO

And eth0 can retrieve its dynamic address. I will try static speed and bonding.

Smells good!

Regards
Comment 13 Andy Gospodarek 2009-02-06 08:46:48 EST
Excellent!

I noticed that my patch was missing one line that allowed the previously requested interrupt to be correctly disabled, so I was convinced this would fix the problem.  I will work to get this added to the upcoming update.
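
One way to picture why a single bookkeeping line can matter here is the toy model below. It is only a guess at the mechanism, written as a userspace C sketch: the have_msi name comes from comment 12, but every other name is hypothetical and nothing here is taken from the actual patch. The assumption is that teardown only disables MSI when the driver has recorded that it enabled it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated state; only have_msi is a name that appears in this report. */
struct irq_state {
    bool have_msi;        /* driver's record that it enabled MSI */
    bool hw_msi_enabled;  /* what the device is actually configured to do */
    bool irq_requested;
};

/* Enable MSI and request the interrupt. record_flag models whether the
 * bookkeeping line is present. */
static void request_msi_irq(struct irq_state *s, bool record_flag)
{
    s->hw_msi_enabled = true;
    s->irq_requested = true;
    if (record_flag)
        s->have_msi = true;
}

/* Release the handler; MSI is switched off only if the driver
 * remembers having switched it on. */
static void free_irq_and_msi(struct irq_state *s)
{
    s->irq_requested = false;
    if (s->have_msi) {
        s->hw_msi_enabled = false;
        s->have_msi = false;
    }
}

/* After a failed MSI test: tear down, re-request as a legacy interrupt,
 * and report whether the device really left MSI mode. */
static bool fall_back_to_intx(struct irq_state *s)
{
    assert(s != NULL);
    free_irq_and_msi(s);
    s->irq_requested = true;
    return !s->hw_msi_enabled;
}
```

In this model, a fallback without the flag leaves the device in MSI mode while the driver listens on the legacy line, which would be consistent with the "falls back but no traffic" result in comment 7.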
Comment 15 RHEL Product and Program Management 2009-02-06 09:22:14 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 16 Laurent Jean-Rigaud 2009-02-06 09:27:01 EST
Also, bonding and static negotiation both work.

Thanks
Comment 19 Russell Doty 2009-02-24 09:22:39 EST
Requested exception and added to tracker BZ.
Comment 20 Vivek Goyal 2009-02-26 11:47:49 EST
Committed in 82.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 22 Chris Ward 2009-03-27 10:20:45 EDT
~~ Attention Partners! Snap 1 Released ~~
RHEL 4.8 Snapshot 1 has been released on partners.redhat.com. There should
be a fix present, which addresses this bug. NOTE: there is only a short time
left to test, please test and report back results on this bug
at your earliest convenience.

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have found a NEW bug, clone this
bug and describe the issues you encountered. Further questions can be
directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the
Verified field above. Please leave a comment with your test results details.
Include which arches tested, package version and any applicable logs.

 - Red Hat QE Partner Management
Comment 23 Chris Ward 2009-04-16 12:08:18 EDT
~~ Attention! Snap 4 Released ~~
RHEL 4.8 Snapshot 4 has been released on partners.redhat.com. There should
be a fix present that addresses this bug. NOTE: there is only a short time
left to test, please test and report back results on this bug ASAP.

The latest kernel build can be obtained here:
http://people.redhat.com/vgoyal/rhel4/

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have found a NEW bug, clone this
bug and describe the issues you encountered. Further questions can be
directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the
Verified field above. Please leave a comment with your test results details.
Include which arches tested, package version and any applicable logs.
Comment 26 errata-xmlrpc 2009-05-18 15:35:04 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html
