Bug 1107607 - Reports on "hardware error" and spurious crashes
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-10 10:54 UTC by Göran Uddeborg
Modified: 2014-12-11 20:10 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-12-10 14:58:23 UTC
Type: Bug
Embargoed:


Attachments
Output from: journalctl -b | grep 'Linux version\|mce\|Hardware' (6.00 KB, text/plain)
2014-06-10 10:54 UTC, Göran Uddeborg

Description Göran Uddeborg 2014-06-10 10:54:21 UTC
Created attachment 907152
Output from: journalctl -b | grep 'Linux version\|mce\|Hardware'

Description of problem:
This is probably the same problem as bug 1047061, which I thought had been fixed.  Since I'm not sure it is the same, I'm filing a new bug rather than reopening the old one.

The message I see is "mce: [Hardware Error]: Machine check events logged".  The machine has also crashed and become completely unresponsive, which I assume is related.  If I load the module "edac_mce_amd", as instructed by "mcelog", I get more verbose messages.  See the attachment for details.  I don't know what to do with them.
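
The filter used for the attachment can also be applied to a log that has already been saved as text (for example, the output of journalctl -b). A minimal sketch in Python follows; the function name filter_mce_lines and the sample lines are hypothetical and are not taken from the attachment:

```python
import re

# Same alternatives as the grep pattern 'Linux version\|mce\|Hardware'
# used for the attachment (GNU grep treats \| as alternation).
PATTERN = re.compile(r"Linux version|mce|Hardware")


def filter_mce_lines(lines):
    """Return the lines that contain any of the three search terms."""
    return [line for line in lines if PATTERN.search(line)]


# Hypothetical sample input, for illustration only.
sample = [
    "kernel: Linux version 3.14.5-200.fc20.x86_64",
    "kernel: mce: [Hardware Error]: Machine check events logged",
    "systemd[1]: Started Session 1 of user example.",
]

for line in filter_mce_lines(sample):
    print(line)
```

Like the grep command, this matches substrings case-sensitively, so unrelated lines that merely contain "mce" would also be kept.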

I first started to see problems like these when I upgraded the kernel and tools around it to F20.  See bug 1047061 for the details.  When I tried kernel 3.13.10-200.fc20.x86_64 in May the problems disappeared, and I thought something had been fixed.

Then, after the upgrade to 3.14.4-200.fc20.x86_64 in June, the messages reappeared, and I had a crash a day after the boot.  Today I booted 3.13.10-200.fc20.x86_64 again, but now I get the messages even with this kernel, which worked last time.  I upgraded to 3.14.5-200.fc20.x86_64 and rebooted again, and I still get the messages.

The user space on the machine is not 100% F20, but I do believe everything near the kernel is.

Version-Release number of selected component (if applicable):
kernel-3.14.5-200.fc20.x86_64
linux-firmware-20140317-37.gitdec41bce.fc20.noarch
systemd-208-16.fc20.x86_64

Comment 1 Göran Uddeborg 2014-06-19 20:11:42 UTC
The boot from 10 June lasted until just before midnight between 16 and 17 June.  Then the machine crashed completely, and the screen went black.  Not even "magic" SysRq attempts produced any reaction.

Since rebooting, I haven't seen any new hardware error messages so far.  Just as with the 3.13.10 kernel, it seems to happen on some, but not all, boots with the 3.14.5 kernel.  Maybe some kind of timing issue during startup?

(Now I hope I don't have to take the machine down, since this seems to be a "good boot". :-)

Comment 2 Justin M. Forbes 2014-11-13 15:57:50 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.17.2-200.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 3 Justin M. Forbes 2014-12-10 14:58:23 UTC
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 4 Göran Uddeborg 2014-12-11 20:10:46 UTC
It hasn't happened in a while now.  Let's hope it is indeed gone.

