Bug 700793

Summary: Intel 80003ES2LAN network card (e1000e driver) partially loses connection
Product: [Fedora] Fedora Reporter: Mikkel Lauritsen <renard>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 15CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-11-30 17:03:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Mikkel Lauritsen 2011-04-29 12:13:18 UTC
After running for a while the machine in question partially loses its network connectivity. By "partially" I mean that I can still ping it, so it responds
to ICMP packages, but no other network services (SSH, HTTPD etc.) can be reached from the outside.

When it happens there are no error messages to be found in the log files and ifconfig output doesn't show any errors. If I do something from the console which creates outgoing traffic on the network (for example pinging an IP address) it causes the connection to be activated again.

The time the network stays up seems to fluctuate wildly. In some cases it works fine for a week, and today I've had the connection disappear four times already.
No, make that five.

I have found countless other error reports around the 'net concerning the e1000e driver but none that match my situation exactly - most other people seem to be completely unable to make a network connection, but that's not the case for me.


Additional info:

uname -a:
Linux testserver 2.6.35.12-90.fc14.x86_64 #1 SMP Fri Apr 22 16:01:29 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

lspci -vv:

05:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
        Subsystem: Intel Corporation Device 3484
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 70
        Region 0: Memory at b8820000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at b8400000 (32-bit, non-prefetchable) [size=4M]
        Region 2: I/O ports at 2020 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000feeff00c  Data: 41c1
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM unknown, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-15-17-ff-ff-89-2a-6c
        Kernel driver in use: e1000e
        Kernel modules: e1000e

Comment 1 Mikkel Lauritsen 2011-05-03 11:17:52 UTC
FWIW as an attempt at investigating further I've upgraded to F15 beta, and that didn't make any difference - the machine is still losing its network connection.

As mentioned above I can reopen the connection by for example pinging another
IP address. When I do that the first 5-6 ICMP packages are reported as lost, so it apparently takes a short while to restart whatever it is that has become wedged.

Comment 2 Josh Boyer 2011-08-29 14:26:08 UTC
Is this still happening on the latest f15 kernel?