Bug 165676
Summary: | e1000 driver with Intel 82546EB controller drops packets | |
---|---|---|---
Product: | Red Hat Enterprise Linux 3 | Reporter: | Jon. Hallett <jjh>
Component: | kernel | Assignee: | John W. Linville <linville>
Status: | CLOSED CANTFIX | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 3.0 | CC: | bjoern, petrides
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | i386 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2005-09-26 16:20:33 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Jon. Hallett
2005-08-11 11:00:22 UTC
Please try the test kernels available here:

http://people.redhat.com/linville/kernels/rhel3/
http://people.redhat.com/linville/kernels/rhel4/

Those both have e1000 drivers based on version 6.0.54-k2. Please try to recreate the issue described above with these kernels and post the results here...thanks!

Our RHEL4 boxes exhibit the same problem with the RPC test when running the test kernel.

[root@moorhen tmp]# uname -a
Linux moorhen.ecs.soton.ac.uk 2.6.9-15.2.EL.jwltest.49smp #1 SMP Mon Aug 15 16:21:22 EDT 2005 i686 i686 i386 GNU/Linux

117:
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
118:
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
119:
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
120:
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
121:
rpcinfo: RPC: Port mapper failure - RPC: Timed out
program 100003 version 2 is not available
program 100003 version 3 ready and waiting

Please post the output of running "ethtool -S" for the appropriate interface after conducting your RPC test and experiencing the failures...thanks!

Sorry, but we have now replaced all our 82546EB interfaces and so are no longer able to do tests.

CANTFIX, based on lack of available testing.

Created attachment 147059
uname -a; ethtool eth0; lspci -vvv | grep -A15 Ethernet
We've seen something very similar on our Dell PowerEdge 1855 blade servers. Output from lspci and ethtool is attached.

After upgrading from 2.4.21-40.EL to kernel-smp-2.4.21-47.0.1.EL, we experienced strange network problems. It's a bit tricky to investigate: the problem comes in bursts lasting from one to five minutes and, as the machines are at an offsite location, it has often gone away before we get as far as the console. As the machines are in heavy production, it's not very tempting to reboot the servers with the newer kernel again.

The problem looks more or less like the one described above. The blades lose packets, effectively going off the network for some minutes while under more or less heavy network load. Rolling back to 2.4.21-40.EL made the problem go away.

Ingvar

Comment on attachment 147059
uname -a; ethtool eth0; lspci -vvv | grep -A15 Ethernet
Linux some.where.com 2.4.21-40.EL #1 Thu Feb 2 22:32:00 EST 2006 i686 i686 i386 GNU/Linux
Settings for eth0:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: umbg
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: yes
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 72 (4250ns min, 4500ns max), cache line size 10
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at ec00 [size=256]
Region 1: Memory at dfdf0000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at dfde0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at dfe00000 [disabled] [size=1M]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [68] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=4
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
05:04.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
Subsystem: Dell: Unknown device 018a
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (63750ns min), cache line size 10
Interrupt: pin A routed to IRQ 15
Region 0: Memory at dfbe0000 (64-bit, non-prefetchable) [size=128K]
Region 4: I/O ports at dcc0 [size=64]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
05:04.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
Subsystem: Dell: Unknown device 018a
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (63750ns min), cache line size 10
Interrupt: pin B routed to IRQ 7
Region 0: Memory at dfbc0000 (64-bit, non-prefetchable) [size=128K]
Region 4: I/O ports at dc80 [size=64]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
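
The attachment above covers "ethtool eth0" and lspci, but not the "ethtool -S" statistics that were requested earlier in this report. As a rough sketch only (not something anyone in this thread ran), the counters could be captured before and after a test run and compared, so that only the counters that moved are posted. The function names here are hypothetical, the interface name is an assumption, and the set of counter names depends on the driver version.

```python
import subprocess


def parse_ethtool_stats(text):
    """Parse `ethtool -S` style output ("name: value" lines) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if not sep:
            continue  # no colon on this line, nothing to parse
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            # Skips the "NIC statistics:" header and any non-numeric values.
            continue
    return stats


def changed_counters(before, after):
    """Return {name: delta} for counters whose value differs between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def read_stats(iface):
    """Run `ethtool -S <iface>` and parse it (needs ethtool installed, usually root)."""
    result = subprocess.run(
        ["ethtool", "-S", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_stats(result.stdout)
```

A possible use would be to call read_stats("eth0") once before the test, once after the failures appear, and pass both results to changed_counters(); any error or drop counters that increased would then stand out from the rest.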
Ingvar, given that you are using RHEL3, I have to suggest that you use the normal RHEL support channels in order to get this issue resolved to your benefit. That will ensure that the issue you are experiencing receives the appropriate level of attention and support.

This might be the same as the following Intel card bug: https://bugzilla.kernel.org/show_bug.cgi?id=15384

The only way I see to fix it is to blacklist all the E1000 adapters with the broken firmware. A temporary workaround is to disable RX checksum offloading via ethtool.

I meant disable TX checksum offloading...
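
The workaround named in the last comment, disabling TX checksum offloading via ethtool, might be scripted roughly as follows. This is a sketch under assumptions: the helper names are made up for illustration, "eth0" is only an example interface, and nothing in this report confirms that the workaround resolves the packet loss described here. The "-K <iface> tx off" and "-k <iface>" forms are standard ethtool options.

```python
import subprocess


def tx_checksum_command(iface, enable=False):
    """Build the ethtool command that turns TX checksum offload on or off."""
    return ["ethtool", "-K", iface, "tx", "on" if enable else "off"]


def show_offload_command(iface):
    """Build the ethtool command that lists the current offload settings."""
    return ["ethtool", "-k", iface]


def disable_tx_checksum(iface, dry_run=True):
    """Disable TX checksum offload on iface.

    With dry_run=True (the default) nothing is executed and the command is
    only returned, so it can be reviewed first.
    """
    cmd = tx_checksum_command(iface, enable=False)
    if not dry_run:
        # Needs root. The setting is a runtime change and is lost on reboot
        # unless it is reapplied from the network startup configuration.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling disable_tx_checksum("eth0", dry_run=False) would apply the change, and running the command from show_offload_command("eth0") afterwards would show whether the setting took effect.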