Bug 1301739 - Machine Check exceptions related to transient temperature spikes get reported to abrt.
Status: NEW
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2016-01-25 16:21 EST by Tom Prince
Modified: 2017-04-11 20:34 EDT
Doc Type: Bug Fix
Type: Bug

Attachments
dmesg log from a Lenovo ThinkPad T450s (871 bytes, text/plain)
2016-01-25 16:21 EST, Tom Prince

Description Tom Prince 2016-01-25 16:21:11 EST
Created attachment 1118230
dmesg log from a Lenovo ThinkPad T450s

Description of problem:

I've got abrt regularly reporting machine-check errors about a temperature over-threshold condition that lasts for ~0.0011 seconds. This seems like something that should be reported via another method. It has gotten to the point that I just routinely ignore the abrt messages because they always seem to be this.

How reproducible:
Several times per day

Steps to Reproduce:
1. Cause the machine to heat up (this often happens when running graphically intensive games).
2. Observe the MCE reported to abrt.

Actual results:
MCE reported to abrt


Expected results:
Ideally, the error would be reported to abrt only if it wasn't transient (or perhaps the threshold for throttling could be lower than the threshold for reporting). It would also be nice if incidents were recorded, so that one can see whether the frequency of occurrence is significant.

But, I'd be happy if abrt simply ignored these errors, as I have abrt-fatigue from them.
Comment 1 Tomasz Torcz 2016-01-28 05:26:39 EST
The messages are:
[4942478.364568] CPU3: Package temperature above threshold, cpu clock throttled (total events = 289096)
[4942478.364579] CPU0: Package temperature above threshold, cpu clock throttled (total events = 289098)
[4942478.364581] CPU1: Package temperature above threshold, cpu clock throttled (total events = 289098)
[4942478.364584] CPU2: Package temperature above threshold, cpu clock throttled (total events = 289098)
[4942478.365577] CPU3: Package temperature/speed normal
[4942478.365578] CPU2: Package temperature/speed normal
[4942478.365580] CPU0: Package temperature/speed normal
[4942478.365590] CPU1: Package temperature/speed normal

I experience this issue too, on a well-cooled, mostly idle desktop form-factor machine with an Intel(R) Core(TM) i5-2400S CPU @ 2.50GHz, running kernel 4.2.6-301.fc23.x86_64.
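
To put a number on how brief these events are, the timestamps in the messages quoted in this comment can be paired up per CPU. The following is only a minimal sketch, assuming the dmesg line format shown here with six fractional digits in the timestamp; the function name `throttle_durations` and the `SAMPLE` list are made up for illustration and are not part of the kernel, abrt, or any existing tool.

```python
import re

# Matches the two kernel message shapes quoted in this comment.
# Assumption: timestamps always carry exactly six fractional digits.
LINE_RE = re.compile(
    r"\[\s*(?P<sec>\d+)\.(?P<usec>\d{6})\]\s+"
    r"CPU(?P<cpu>\d+): Package temperature(?P<rest>.*)"
)

def throttle_durations(lines):
    """Return {cpu: [duration_in_microseconds, ...]} per throttle event."""
    started = {}    # cpu -> timestamp (usec) of the pending "above threshold" line
    durations = {}  # cpu -> durations of completed events
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a thermal message, skip it
        cpu = int(m["cpu"])
        ts = int(m["sec"]) * 1_000_000 + int(m["usec"])
        rest = m["rest"]
        if rest.startswith(" above threshold"):
            started[cpu] = ts
        elif rest.startswith("/speed normal") and cpu in started:
            durations.setdefault(cpu, []).append(ts - started.pop(cpu))
    return durations

# The eight lines quoted in this comment.
SAMPLE = [
    "[4942478.364568] CPU3: Package temperature above threshold, cpu clock throttled (total events = 289096)",
    "[4942478.364579] CPU0: Package temperature above threshold, cpu clock throttled (total events = 289098)",
    "[4942478.364581] CPU1: Package temperature above threshold, cpu clock throttled (total events = 289098)",
    "[4942478.364584] CPU2: Package temperature above threshold, cpu clock throttled (total events = 289098)",
    "[4942478.365577] CPU3: Package temperature/speed normal",
    "[4942478.365578] CPU2: Package temperature/speed normal",
    "[4942478.365580] CPU0: Package temperature/speed normal",
    "[4942478.365590] CPU1: Package temperature/speed normal",
]

if __name__ == "__main__":
    for cpu, values in sorted(throttle_durations(SAMPLE).items()):
        print(f"CPU{cpu}: {values} usec")
```

For the lines quoted here, every CPU's event comes out at roughly 1000 microseconds, which is in line with the ~0.0011 seconds mentioned in the description.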
Comment 2 Josh Boyer 2016-01-28 08:12:39 EST
So there are two issues here.  The first is that the kernel is simply doing its job and is reporting the events.  That they are of an extremely short duration and kind of spammy is a downside, but it isn't incorrect.  The second issue is that abrt is triggering on them, but likely because of the mce being logged, not the temp messages themselves.

The most suitable workaround here is for abrt to not trigger on thermal events of such a short duration.  However, I doubt it is even looking at what caused the mce and it might not be easy for abrt to do that.  Will need to think some more.
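
The idea of not triggering on very short thermal events, while still keeping a count of them as the reporter asked, could look roughly like the sketch below. This is not how abrt works and uses no abrt API; the class `ThermalEventFilter`, its fields, and the one-second threshold are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ThermalEventFilter:
    # Hypothetical cutoff: events shorter than this are treated as transient.
    min_report_usec: int = 1_000_000
    # Number of transient events seen, kept so their frequency stays visible.
    suppressed: int = 0
    # (cpu, duration_usec) pairs for events that were long enough to report.
    reported: list = field(default_factory=list)

    def observe(self, cpu, duration_usec):
        """Record one throttle event; return True if it should be reported."""
        if duration_usec < self.min_report_usec:
            self.suppressed += 1
            return False
        self.reported.append((cpu, duration_usec))
        return True

if __name__ == "__main__":
    f = ThermalEventFilter()
    f.observe(3, 1009)       # a ~1 ms spike like the ones in Comment 1
    f.observe(0, 2_500_000)  # a made-up sustained 2.5 s event
    print(f"suppressed={f.suppressed} reported={f.reported}")
```

The hard part noted above remains: something would first have to work out that a given mce was thermal and how long it lasted before any such filter could be applied.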
Comment 3 David Gibson 2016-09-22 01:35:58 EDT
Just updating to note this is still present in Fedora 24, at least on my T460s.

(Also, hi Josh, long time no talk).
Comment 4 Justin M. Forbes 2017-04-11 10:43:29 EDT
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 25 has now been rebased to 4.10.9-100.fc24.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.
Comment 5 David Gibson 2017-04-11 20:34:56 EDT
I still see these frequently with Fedora 25 and kernel-4.10.8-200.fc25.x86_64.  I'll try 4.10.9 when it arrives.
