Bug 1036429

Summary: Driver e1000e problem: kernel shows 'Hardware Error' but booting other OS fixes the issue
Product: [Fedora] Fedora
Reporter: Jaroslaw Gorny <jaroslaw.gorny>
Component: kernel
Assignee: fedora-kernel-ethernet
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 20
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, michele
Target Milestone: ---
Flags: jforbes: needinfo?
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-03-17 18:45:06 UTC
Type: Bug
Attachments (description, flags):
lspci -vvv -s 00:19.0 (flags: none)
dmesg entries with e1000e debug enabled (flags: none)
Errors for e1000e in dmesg (flags: none)
ethtool output (flags: none)
Dmesg messages for e1000e when hardware is 'fixed' (flags: none)

Description Jaroslaw Gorny 2013-12-02 00:00:10 UTC
Created attachment 831337
lspci -vvv -s 00:19.0

Description of problem:

I'm not sure what triggers the problem, but once in a while, when using Fedora (currently F20, but it has also occurred in F18), my Ethernet stops working. dmesg / journalctl show a 'Hardware Error'.

Additionally, enabling kernel debug output for the e1000e module shows a problem acquiring the semaphore for that device, which results in high CPU usage by kernel threads such as ksoftirqd/n, watchdog/n and kworker/n:m.

Unloading and reloading the e1000e module does not solve the issue. Bringing the interface down makes the high CPU usage disappear, but bringing it back up fails. The interface is not shown by 'ip a s' at all.

A reboot *DOES NOT* help. Booting an older kernel does not help either.
However, what does fix Ethernet is booting some other OS (I'm using an Arch Linux installer ISO from a USB stick). After going back to Fedora, Ethernet works correctly (the interface is up and working, no error messages).


Version-Release number of selected component (if applicable):

3.11.9-300.fc20.x86_64
(but also: kernel-3.11.8-300.fc20.x86_64)

How reproducible:
N/A

Steps to Reproduce:
1. Use your ethernet (e1000e module) normally and wait for hardware errors in dmesg / journalctl, accompanied by high CPU usage by kernel threads.
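
Since the only trigger is waiting for the error to show up in the logs, a small filter can help spot it. The following is a minimal sketch, not part of the original report: the function name `find_e1000e_errors` and the sample lines are hypothetical, and it assumes only that the relevant kernel messages contain both the module name "e1000e" and the text "Hardware Error" from the bug summary. The exact line format in the attachments is not reproduced here.

```python
import re

# Assumption: the failure text contains "Hardware Error" (wording taken from
# the bug summary); the match is case-insensitive to tolerate variations.
ERROR_RE = re.compile(r"hardware error", re.IGNORECASE)


def find_e1000e_errors(lines):
    """Return the log lines that mention e1000e and a hardware error."""
    return [
        line.rstrip("\n")
        for line in lines
        if "e1000e" in line and ERROR_RE.search(line)
    ]


# Hypothetical sample lines for illustration only; they are NOT copied from
# the attachments of this bug.
sample = [
    "e1000e 0000:00:19.0 eth0: Hardware Error\n",
    "e1000e: Intel(R) PRO/1000 Network Driver\n",
    "usb 1-1: new high-speed USB device\n",
]

for hit in find_e1000e_errors(sample):
    print(hit)
```

To use it on a live system, the function could be fed the lines of saved `dmesg` or `journalctl -k` output instead of the sample list.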


Additional info:
My ethernet is:
00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)

Comment 1 Jaroslaw Gorny 2013-12-02 00:05:20 UTC
Created attachment 831339
dmesg entries with e1000e debug enabled

Dmesg is flooded with messages like the one in attachment when debugging for e1000e is enabled.

Comment 2 Jaroslaw Gorny 2013-12-02 00:12:37 UTC
Created attachment 831340
Errors for e1000e in dmesg

This is what I can find in dmesg when ethernet is 'broken'.

Comment 3 Jaroslaw Gorny 2013-12-02 00:13:25 UTC
Created attachment 831341
ethtool output

Comment 4 Jaroslaw Gorny 2013-12-02 00:15:13 UTC
Created attachment 831342
Dmesg messages for e1000e when hardware is 'fixed'

After I boot some other OS (Arch Linux in this example) and then boot back into Fedora, there are no more error messages in dmesg; the hardware initializes properly.

Comment 5 Justin M. Forbes 2014-02-24 14:03:46 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.13.4-200.fc20.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 6 Justin M. Forbes 2014-03-17 18:45:06 UTC
*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for several weeks and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 20, please feel free to reopen the bug and provide the additional information requested.