Description of problem:
The e1000 module shipped in RHEL4 (tested with 2.6.9-[57|79|80].EL) does not appear to send or receive any packets on an FSC ESPRIMO E5615 (NVIDIA MCP51). The Intel PCIe card is detected by kudzu and configured, and the link status is reported correctly by ethtool/mii-tool. The associated IRQs are listed as PCI-MSI in /proc/interrupts.

Version-Release number of selected component (if applicable):
module e1000: 7.3.20-k2-NAPI
Intel Gigabit card (2 ports, PCIe):

    03:00.0 Class 0200: 8086:105e (rev 06)
    03:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
        Subsystem: Intel Corporation PRO/1000 PT Dual Port Server Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size 10
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at f2020000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f2000000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 9000 [size=32]
        Capabilities: [c8] Power Management version 2
            Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
            Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
            Address: 00000000fee00000 Data: 40b1
        Capabilities: [e0] Express Endpoint IRQ 0
            Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
            Device: Latency L0s <512ns, L1 <64us
            Device: AtnBtn- AtnInd- PwrInd-
            Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
            Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
            Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s, Port 0
            Link: Latency L0s <4us, L1 <64us
            Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
            Link: Speed 2.5Gb/s, Width x4
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 12-c3-51-ff-ff-17-15-00

How reproducible:
Install the Intel card, configure it (static IP or DHCP) and try to use it. DHCP negotiation fails after a timeout (no packet is sent).

Actual results:
dmesg log:

    Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
    Copyright (c) 1999-2006 Intel Corporation.
    ACPI: PCI interrupt 0000:03:00.0[A] -> GSI 23 (level, high) -> IRQ 177
    PCI: Setting latency timer of device 0000:03:00.0 to 64
    e1000: 0000:03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:51:c3:12
    divert: allocating divert_blk for eth0
    e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
    ACPI: PCI interrupt 0000:03:00.1[B] -> GSI 22 (level, high) -> IRQ 185
    PCI: Setting latency timer of device 0000:03:00.1 to 64
    e1000: 0000:03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:51:c3:13
    divert: allocating divert_blk for eth1
    e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection

Expected results:
See below.

Additional info:
I tried e1000 7.5.6 (dkms integration) and it works well on the same machine/kernel:

    Intel(R) PRO/1000 Network Driver - version 7.6.5-NAPI
    Copyright (c) 1999-2007 Intel Corporation.
    PCI: Setting latency timer of device 0000:03:00.0 to 64
    e1000: 0000:03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:51:c3:12
    divert: allocating divert_blk for eth0
    e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
    PCI: Setting latency timer of device 0000:03:00.1 to 64
    e1000: 0000:03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:51:c3:13
    divert: allocating divert_blk for eth1
    e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
    ip_tables: (C) 2000-2002 Netfilter core team
    e1000: eth0: e1000_test_msi_interrupt: MSI interrupt test failed!
    e1000: eth0: e1000_test_msi: MSI interrupt test failed, using legacy interrupt.
    e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
    ip_tables: (C) 2000-2002 Netfilter core team
    e1000: eth1: e1000_test_msi_interrupt: MSI interrupt test failed!
    e1000: eth1: e1000_test_msi: MSI interrupt test failed, using legacy interrupt.
    e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth1: e1000_watchdog_task: 10/100 speed: disabling TSO

This driver falls back to legacy interrupts (IO-APIC). I actually wanted to use pci=nomsi to disable MSI for e1000 on RHEL4, but it seems this kernel option has not been backported from recent upstream releases to 2.6.9... :-(

Good luck!
A quick fix might be to blacklist the MCP51 in the PCI quirks to avoid this problem. Will this bug be fixed by Red Hat in a future RHEL4 update, or is my configuration (NVIDIA chipset + e1000) not a RHEL4 target (from a commercial point of view ;-))? Regards
The fix I mentioned above does not fix the e1000 problem (and it introduces a problem with the local APIC if the high definition timer is enabled in the BIOS).
Can you try the test kernels located here: http://people.redhat.com/agospoda/#rhel4

These kernels include a patch for e1000 that should detect when MSI interrupts are not working and switch to INTx mode. This is much better than disabling MSI on the entire system. The patch mentioned above is also included in the version of the e1000 driver (7.5.6) that is part of dkms, so I suspect this will resolve your issue.
I will try, if I can get the machine again... but the e1000 MSI patch seems to have been dropped?!

    $ rpm -qpl ../SRPMS/kernel-2.6.9-80.EL.src.rpm -v | grep -i e1000
    -rw-r--r-- 1 mockbuilmockbuil 50348 May 4 2005 linux-2.6.10-net-e1000-update.patch
    -rw-r--r-- 1 mockbuilmockbuil 1248783 Oct 3 2007 linux-2.6.11-net-e1000-update.patch
    -rw-r--r-- 1 mockbuilmockbuil 4058 Jan 23 21:46 linux-2.6.9-e1000-add-parameter-to-set-transmit-descriptor-size.patch
    -rw-r--r-- 1 mockbuilmockbuil 953 Mar 26 2008 linux-2.6.9-e1000-disable-pci-e-completion-timeouts-on-pseries.patch
    -rw-r--r-- 1 mockbuilmockbuil 440 Dec 16 17:17 linux-2.6.9-e1000-remove-e1000_clean_tx_irq-call-from-e1000_net.patch
    -rw-r--r-- 1 mockbuilmockbuil 2713 Dec 16 17:17 linux-2.6.9-e1000-restart-receive-unit-on-esb2-hardware.patch
    -rw-r--r-- 1 mockbuilmockbuil 32832 Apr 3 2008 linux-2.6.9-e1000-upstream-update-and-alternate-mac-address-sup.patch
    -rw-r--r-- 1 mockbuilmockbuil 1864 Jan 23 21:46 linux-2.6.9-e1000e-add-reboot-notifier-so-wol-will-work.patch
    -rw-r--r-- 1 mockbuilmockbuil 202142 Mar 26 2008 linux-2.6.9-e1000e-update-to-latest-upstream.patch
    -rw-r--r-- 1 mockbuilmockbuil 380792 Jan 14 16:41 linux-2.6.9-e1000e-update-to-upstream-version-0.3.3.3-k6.patch
    -rw-r--r-- 1 mockbuilmockbuil 1758 Jan 16 23:03 linux-2.6.9-enable-entropy-generation-from-e1000-and-bnx2-networ.patch
    -rw-r--r-- 1 mockbuilmockbuil 1233 Dec 11 2004 linux-2.6.9-net-e1000-64k-align-check-dma.patch
    -rw-r--r-- 1 mockbuilmockbuil 7611 Apr 30 2005 linux-2.6.9-net-e1000-avoid-sleep-in-timer-context.patch
    -rw-r--r-- 1 mockbuilmockbuil 14468 Dec 8 2004 linux-2.6.9-net-e1000-erratum23.patch
    -rw-r--r-- 1 mockbuilmockbuil 605 Mar 22 2005 linux-2.6.9-net-e1000-flush-rmmod.patch
    -rw-r--r-- 1 mockbuilmockbuil 3901 Dec 8 2004 linux-2.6.9-net-e1000-post-mature-writeback.patch
    -rw-r--r-- 1 mockbuilmockbuil 1114 Dec 2 2004 linux-2.6.9-net-e1000-rx-mini-jumbo-inval.patch
    -rw-r--r-- 1 mockbuilmockbuil 533745 Sep 26 2007 linux-2.6.9-net-e1000e.patch

    $ rpm -qpl ../SRPMS/kernel-2.6.9-80.EL.gtest.57.src.rpm -v | grep -i e1000
    -rw-r--r-- 1 mockbuilmockbuil 50348 May 4 2005 linux-2.6.10-net-e1000-update.patch
    -rw-r--r-- 1 mockbuilmockbuil 1248783 Oct 3 2007 linux-2.6.11-net-e1000-update.patch
    -rw-r--r-- 1 mockbuilmockbuil 4058 Jan 23 21:46 linux-2.6.9-e1000-add-parameter-to-set-transmit-descriptor-size.patch
    -rw-r--r-- 1 mockbuilmockbuil 953 Mar 26 2008 linux-2.6.9-e1000-disable-pci-e-completion-timeouts-on-pseries.patch
    -rw-r--r-- 1 mockbuilmockbuil 440 Dec 16 17:17 linux-2.6.9-e1000-remove-e1000_clean_tx_irq-call-from-e1000_net.patch
    -rw-r--r-- 1 mockbuilmockbuil 2713 Dec 16 17:17 linux-2.6.9-e1000-restart-receive-unit-on-esb2-hardware.patch
    -rw-r--r-- 1 mockbuilmockbuil 32832 Apr 3 2008 linux-2.6.9-e1000-upstream-update-and-alternate-mac-address-sup.patch
    -rw-r--r-- 1 mockbuilmockbuil 1864 Jan 23 21:46 linux-2.6.9-e1000e-add-reboot-notifier-so-wol-will-work.patch
    -rw-r--r-- 1 mockbuilmockbuil 202142 Mar 26 2008 linux-2.6.9-e1000e-update-to-latest-upstream.patch
    -rw-r--r-- 1 mockbuilmockbuil 380792 Jan 14 16:41 linux-2.6.9-e1000e-update-to-upstream-version-0.3.3.3-k6.patch
    -rw-r--r-- 1 mockbuilmockbuil 1758 Jan 16 23:03 linux-2.6.9-enable-entropy-generation-from-e1000-and-bnx2-networ.patch
    -rw-r--r-- 1 mockbuilmockbuil 1233 Dec 11 2004 linux-2.6.9-net-e1000-64k-align-check-dma.patch
    -rw-r--r-- 1 mockbuilmockbuil 7611 Apr 30 2005 linux-2.6.9-net-e1000-avoid-sleep-in-timer-context.patch
    -rw-r--r-- 1 mockbuilmockbuil 14468 Dec 8 2004 linux-2.6.9-net-e1000-erratum23.patch
    -rw-r--r-- 1 mockbuilmockbuil 605 Mar 22 2005 linux-2.6.9-net-e1000-flush-rmmod.patch
    -rw-r--r-- 1 mockbuilmockbuil 3901 Dec 8 2004 linux-2.6.9-net-e1000-post-mature-writeback.patch
    -rw-r--r-- 1 mockbuilmockbuil 1114 Dec 2 2004 linux-2.6.9-net-e1000-rx-mini-jumbo-inval.patch
    -rw-r--r-- 1 mockbuilmockbuil 533745 Sep 26 2007 linux-2.6.9-net-e1000e.patch

The e1000 patches are identical between 80 and 80.gtest, and so are the *msi* patches...

Can you confirm that the e1000 MSI patch is included?
Well, my test kernels use linux-kernel-test.patch to hold all of my experimental patches, so even if you compared the file lists searching for 'e1000', you would not find any differences that way. For the record, it looks like I dropped this patch: http://people.redhat.com/agospoda/rhel4/0005-e1000-msi-test-and-switch-to-intx.patch from my test kernels. I can add it back, or you can try it manually if you like.
OK. I've taken 2.6.9-80.EL and added the 0005-e1000-msi* patch. I'm rebuilding now and need to get hold of an NVIDIA PC for testing.
Created attachment 330887
Dmesg with e1000-msi patch

With the patch, the e1000 module falls back to legacy interrupts, but there is no traffic at all (stats at 0, tcpdump empty).

Module version 7.3.20-k3-NAPI. e1000 binds eth0 and eth1 (disabled); forcedeth binds eth2.

/proc/interrupts:

               CPU0
      0:     388428    IO-APIC-edge   timer
      1:       2337    IO-APIC-edge   i8042
      7:          3    IO-APIC-edge   parport0
      8:          1    IO-APIC-edge   rtc
      9:          0    IO-APIC-level  acpi
     12:       2355    IO-APIC-edge   i8042
     15:       3054    IO-APIC-edge   ide1
    177:          0    IO-APIC-level  eth0
    185:          0    IO-APIC-level  libata
    193:        235    IO-APIC-level  HDA Intel, ohci_hcd
    201:      35190    IO-APIC-level  ehci_hcd, eth2
    209:       7823    IO-APIC-level  libata
    NMI:          0
    LOC:     388274
    ERR:          0
    MIS:          0
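In the /proc/interrupts snapshot above, IRQ 177 (eth0) has a handler registered yet its counter is 0, while eth2 on IRQ 201 keeps counting; that is what separates "the driver fell back to INTx" from "interrupts are actually being delivered". A minimal C sketch for pulling one device's total out of such a snapshot follows. The helper name irq_count_for is hypothetical, and it assumes the layout shown here (IRQ number, colon, per-CPU counters, controller type, comma-separated handler names); it is not a tool used in this report.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: scan a /proc/interrupts snapshot for a line whose
 * handler list contains `dev` as a whole name. On success, store the sum of
 * the per-CPU counters in *count and return 1; otherwise return 0. */
static int irq_count_for(const char *snapshot, const char *dev, long *count)
{
    assert(snapshot != NULL && dev != NULL && count != NULL);
    size_t dlen = strlen(dev);
    assert(dlen > 0);                      /* an empty name would match everywhere */
    const char *line = snapshot;

    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        char buf[512];
        if (len >= sizeof buf)
            len = sizeof buf - 1;          /* truncate overlong lines */
        memcpy(buf, line, len);
        buf[len] = '\0';

        char *p = strchr(buf, ':');        /* the CPU header line has no colon */
        if (p != NULL) {
            long total = 0;
            p++;
            for (;;) {                     /* sum the numeric per-CPU columns */
                char *end;
                long v = strtol(p, &end, 10);
                if (end == p)
                    break;                 /* reached the controller type */
                total += v;
                p = end;
            }
            for (char *hit = strstr(p, dev); hit != NULL; hit = strstr(hit + dlen, dev)) {
                char before = hit[-1];     /* safe: hit is past the colon */
                char after = hit[dlen];
                if ((before == ' ' || before == ',') &&
                    (after == '\0' || after == ',' || isspace((unsigned char)after))) {
                    *count = total;
                    return 1;
                }
            }
        }
        line = nl ? nl + 1 : NULL;
    }
    return 0;
}
```

Reading /proc/interrupts twice a few seconds apart while traffic is attempted, and comparing the two totals for eth0, would show whether the NIC raises any interrupts at all.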
To summarize:
- 2.6.9-80.EL (7.3.20-k2-NAPI): not OK
- 2.6.9-80.EL + e1000 patch (7.3.20-k3-NAPI): not OK
- 2.6.9-80.EL + e1000 dkms (7.5.6-NAPI): OK (legacy IRQ)

Regards
Created attachment 330927
nvidia-fix.patch

Laurent, thanks for the feedback. I have one more patch that might be worth trying. It is an attempt to address some MSI/HT issues that have come to light recently. I haven't tested this patch at all since I don't have an affected system, but I think it will be OK.
This patch makes compilation fail with an error:

    drivers/pci/quirks.c: In function `quirk_find_ht_capability':
    drivers/pci/quirks.c:1713: error: 'pos' redeclared as different kind of symbol
    drivers/pci/quirks.c:1711: error: previous definition of 'pos' was here
    drivers/pci/quirks.c:1730: warning: passing arg 3 of `pci_read_config_byte' from incompatible pointer type
    drivers/pci/quirks.c: In function `nv_msi_ht_cap_quirk':
    drivers/pci/quirks.c:1746: warning: implicit declaration of function `pci_get_bus_and_slot'
    drivers/pci/quirks.c:1746: warning: assignment makes pointer from integer without a cast
    make[2]: *** [drivers/pci/quirks.o] Error 1
    make[1]: *** [drivers/pci] Error 2
    make[1]: *** Waiting for unfinished jobs....
    make: *** [drivers] Error 2
    error: Bad exit status from /home/buildsys/rpmbuild/tmp/rpm-tmp.55710 (%build)

The redeclaration of pos looks like the culprit:

    +static int __devinit quirk_find_ht_capability(struct pci_dev *dev, int pos, int ht_cap)
    +{
    +	u8 pos;

After removing the redeclaration, the link fails because the function pci_get_bus_and_slot is undefined!

    ../..
    CHK     include/linux/compile.h
    UPD     include/linux/compile.h
    drivers/built-in.o(.text+0x4540): In function `nv_msi_ht_cap_quirk':
    drivers/pci/quirks.c:1746: undefined reference to `pci_get_bus_and_slot'

By the way, since the dkms version of e1000 works well, a patch against the e1000 sources alone should be sufficient. The current e1000-msi patch must be missing something ;-)
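The first error in this build log is a plain C scoping rule: a function's parameters and the declarations at the outermost level of its body share one scope, so `u8 pos;` collides with the `int pos` parameter. The arg-3 warning on line 1730 suggests the compiler kept the `int` parameter, so `&pos` was an `int *` where pci_read_config_byte() expects a byte pointer; simply deleting the local would likely leave that warning in place. Giving the byte-sized cursor its own name avoids both. The sketch below is a hypothetical user-space illustration of that idea, walking a simulated 256-byte config space instead of calling pci_read_config_byte(). The capability ID 0x08 (HyperTransport) and the next-pointer at offset +1 follow the PCI capability list layout; the function name, the explicit `mask` argument and the 48-step loop guard are assumptions, not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

#define CAP_ID_HT 0x08   /* PCI capability ID for HyperTransport */
#define CAP_TTL   48     /* loop guard in case the capability list is corrupt */

/* Hypothetical user-space sketch: `cfg` simulates a device's 256-byte config
 * space. Starting at capability offset `start`, return the offset of the first
 * HT capability whose type byte (at offset + 3), masked with `mask`, equals
 * `ht_cap`, or 0 if there is none. */
static int find_ht_capability(const uint8_t cfg[256], int start, int ht_cap, uint8_t mask)
{
    /* The byte-sized cursor gets its own name, so it cannot collide with
     * an int parameter the way `u8 pos;` collided with `int pos`. */
    uint8_t cur;
    int ttl = CAP_TTL;

    assert(start >= 0 && start < 256);
    cur = (uint8_t)start;
    while (cur >= 0x40 && ttl-- > 0) {          /* capabilities follow the 64-byte header */
        cur = (uint8_t)(cur & 0xFC);            /* capability pointers are dword aligned */
        if (cfg[cur] == CAP_ID_HT && (cfg[cur + 3] & mask) == ht_cap)
            return cur;
        cur = cfg[cur + 1];                     /* next-capability pointer */
    }
    return 0;
}
```

For example, with a power management capability at 0x50 whose next pointer is 0x60, and an HT capability at 0x60, the walk returns 0x60 when the masked type byte matches and 0 when it does not.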
Created attachment 331054
e1000-msi-test-and-switch-to-intx.patch

I think I found the problem with my original patch. Please replace the previous 'e1000-msi' patch with this one and forget that non-compiling patch I uploaded yesterday. :-)
OK, it's better now with the last patch (plus 'adapter->have_msi = 1;'):

    e1000: no version for "struct_module" found: kernel tainted.
    Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
    Copyright (c) 1999-2006 Intel Corporation.
    ACPI: PCI interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 177
    PCI: Setting latency timer of device 0000:02:00.0 to 64
    e1000: 0000:02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:15:17:24:45:86
    divert: allocating divert_blk for eth0
    e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
    ACPI: PCI interrupt 0000:02:00.1[B] -> GSI 16 (level, low) -> IRQ 177
    PCI: Setting latency timer of device 0000:02:00.1 to 64
    e1000: 0000:02:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:15:17:24:45:87
    divert: allocating divert_blk for eth1
    e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
    ip_tables: (C) 2000-2002 Netfilter core team
    e1000: eth1: e1000_test_msi: MSI interrupt test failed, using legacy interrupt.
    ip_tables: (C) 2000-2002 Netfilter core team
    e1000: eth0: e1000_test_msi: MSI interrupt test failed, using legacy interrupt.
    e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
    device eth0 entered promiscuous mode
    device eth0 left promiscuous mode
    ip_tables: (C) 2000-2002 Netfilter core team
    e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
    e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO

And eth0 can now obtain its dynamic address. I will try a static speed setting and bonding. Smells good!

Regards
Excellent! I noticed that my patch was missing one line that allowed the previously requested interrupt to be correctly disabled, so I was convinced this would fix the problem. I will work to get this added to the upcoming update.
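The fallback discussed in this thread (request an MSI, trigger a test interrupt, and revert to legacy INTx if it never arrives) can be modelled in a few lines. The sketch below is a hypothetical user-space simulation, not the e1000 patch: struct sim_nic and the sim_* helpers are invented for illustration, and a flag stands in for a chipset such as the MCP51 that never delivers the MSI. The release step marked in the comments plays the role of the kind of line this comment describes as missing: without it, the MSI handler would stay registered alongside the legacy one.

```c
#include <assert.h>
#include <stdbool.h>

enum irq_mode { IRQ_NONE, IRQ_MSI, IRQ_INTX };

/* Hypothetical model of a NIC's interrupt setup; not the e1000 driver's
 * real data structures or functions. */
struct sim_nic {
    bool platform_delivers_msi;  /* false models a chipset that drops MSIs */
    bool msi_enabled;            /* MSI enabled in the device's capability */
    enum irq_mode mode;          /* which handler is currently registered */
    int handlers_registered;     /* should end up exactly 1 */
    int test_irqs_seen;
};

static void sim_request_irq(struct sim_nic *n, enum irq_mode m)
{
    n->mode = m;
    n->handlers_registered++;
}

static void sim_free_irq(struct sim_nic *n)
{
    assert(n->handlers_registered > 0);
    n->mode = IRQ_NONE;
    n->handlers_registered--;
}

/* Stand-in for asking the NIC to raise a test interrupt and waiting briefly. */
static void sim_fire_test_interrupt(struct sim_nic *n)
{
    if (n->mode == IRQ_INTX || (n->mode == IRQ_MSI && n->platform_delivers_msi))
        n->test_irqs_seen++;
}

/* Try MSI first; fall back to legacy INTx if the test interrupt never arrives. */
static enum irq_mode setup_interrupts(struct sim_nic *n)
{
    n->msi_enabled = true;
    sim_request_irq(n, IRQ_MSI);
    n->test_irqs_seen = 0;
    sim_fire_test_interrupt(n);

    if (n->test_irqs_seen == 0) {
        /* Release the MSI handler and turn MSI off before requesting the
         * legacy interrupt; skipping this leaves the MSI handler behind. */
        sim_free_irq(n);
        n->msi_enabled = false;
        sim_request_irq(n, IRQ_INTX);
    }
    return n->mode;
}
```

With platform_delivers_msi set to false, setup_interrupts() ends in IRQ_INTX with MSI disabled and exactly one handler registered; with it set to true, MSI is kept.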
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Also, bonding and static negotiation both work. Thanks
Requested exception and added to tracker BZ.
Committed in 82.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
~~ Attention Partners! Snap 1 Released ~~

RHEL 4.8 Snapshot 1 has been released on partners.redhat.com. There should be a fix present that addresses this bug. NOTE: there is only a short time left to test; please test and report back results on this bug at your earliest convenience.

If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have found a NEW bug, clone this bug and describe the issues you encountered. Further questions can be directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the Verified field above. Please leave a comment with your test result details, including which arches were tested, the package version and any applicable logs.

- Red Hat QE Partner Management
~~ Attention! Snap 4 Released ~~

RHEL 4.8 Snapshot 4 has been released on partners.redhat.com. There should be a fix present that addresses this bug. NOTE: there is only a short time left to test; please test and report back results on this bug ASAP. The latest kernel build can be obtained here: http://people.redhat.com/vgoyal/rhel4/

If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have found a NEW bug, clone this bug and describe the issues you encountered. Further questions can be directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the Verified field above. Please leave a comment with your test result details, including which arches were tested, the package version and any applicable logs.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html