Bug 1192856

Summary: ACPI Interrupt storm causes high kworker CPU usage
Product: [Fedora] Fedora
Reporter: Giovanni Campagna <scampa.giovanni>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 26
CC: ben.r.xiao, cpanceac, damiannohales, gansalmon, gokcen.eraslan, itamar, jonathan, juha.heljoranta, kernel-maint, madhu.chinakonda, mchehab, scampa.giovanni
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-23 18:05:34 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Giovanni Campagna 2015-02-15 18:42:42 EST
Description of problem:
Since I upgraded to the latest release of OS X (which probably upgraded the firmware), I have had high CPU usage (between 70 and 90%) on kworker.

I tracked it down to an interrupt storm on ACPI interrupt GPE06. I had to disable the interrupt because, after a reboot and before starting any application, it had already fired 4347271 times.

Version-Release number of selected component (if applicable):
kernel-3.18.5-201.fc21.x86_64
(but it happened in kernel-3.18.3-201.fc21.x86_64 as well)

/sys/class/dmi/id/bios_date: 01/07/2015
/sys/class/dmi/id/bios_vendor: Apple Inc.
/sys/class/dmi/id/bios_version: MBP112.88Z.0138.B14.1501071031
/sys/class/dmi/id/product_name: MacBookPro11,2

(This is a MacBook Pro Retina 15" from late 2014, but I bought it a month ago. The bios_date might be inaccurate, because the problem only appeared a few days ago.)

How reproducible:
On every boot, unless the interrupt is forcibly disabled.

Steps to Reproduce:
1. Reboot
2. Watch high kworker CPU usage and /sys/firmware/acpi/interrupts/gpe06 value
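
A minimal sketch of how the counter can be watched, assuming the first whitespace-separated field of /sys/firmware/acpi/interrupts/gpe06 is the interrupt count (the remaining fields vary between kernel versions). The helper names are illustrative and not part of any existing tool.

```python
import time

GPE_PATH = "/sys/firmware/acpi/interrupts/gpe06"


def parse_gpe_count(text):
    """Return the interrupt count from the contents of a gpeXX sysfs file.

    Assumes the count is the first whitespace-separated field, for
    example " 4347271   enabled"; later fields differ between kernels.
    """
    return int(text.split()[0])


def read_gpe_count(path=GPE_PATH):
    """Read the sysfs file once and return the current count."""
    with open(path) as f:
        return parse_gpe_count(f.read())


def interrupts_per_second(path=GPE_PATH, interval=1.0):
    """Sample the counter twice and return the rate over the interval."""
    before = read_gpe_count(path)
    time.sleep(interval)
    after = read_gpe_count(path)
    return (after - before) / interval
```

Calling interrupts_per_second() on an affected machine should show the counter climbing continuously while the system is otherwise idle. Writing "disable" to the same sysfs file as root switches the GPE off until the next boot, which is the workaround referred to above.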
Comment 1 Giovanni Campagna 2015-03-07 18:18:35 EST
For the record, an ACPI decompile shows that GPE06 is triggering
\_SB.PCI0.IGPU.GSCI()
the SCI handler for the integrated graphics card.

Further investigation in the kernel source code reveals that SCI is a mechanism through which GPU drivers communicate with the firmware, and it is used/implemented by i915.
The protocol seems to be fairly simple:
- the driver writes parameters and command to PARM and SWSCI_SCIC_INDICATOR (SCIE in ACPI code) in the OpRegion system memory area
- the driver triggers the interrupt by writing 1 to PCI_SWSCI_GSSCIE (GSSE in the ACPI code)
- the interrupt causes a GPE which is handled by the ASL code
- the ASL code completes by setting SCIE to 0
- the driver sees SCIE going down to 0, and reads the result of the call in GESF

The ASL method always completes setting SCIE to 0. GXFC (mapped to SWSCI_SCIC_EXIT_PARAMETER in the i915 code) is set to the return value of the call, most of the time 1 (SUCCESS).
GESF (SWSCI_SCIC_SUB_FUNCTION) is also usually reset to 0, but that should not affect operation because it is set by the driver prior to raising the interrupt triggering bit.
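
The handshake described above can be sketched as a toy model. This is not the i915 or ASL code: the class, the function names and the separate fields are hypothetical stand-ins for the OpRegion variables named above, the real register layout is not reproduced, and reading the status from GXFC and the value from PARM is an assumption of the model.

```python
class FakeOpRegion:
    """Toy stand-in for the shared OpRegion memory; not the real layout."""

    def __init__(self):
        self.parm = 0   # call parameter on entry, result value on exit
        self.scie = 0   # "call in progress" indicator
        self.gesf = 0   # requested sub-function
        self.gxfc = 0   # exit status written by the firmware


def firmware_gsci(region, handlers):
    """Toy model of the ASL side: run the call, then clear SCIE."""
    handler = handlers.get(region.gesf)
    if handler is None:
        region.gxfc = 0                  # hypothetical "failure" value
    else:
        region.parm = handler(region.parm)
        region.gxfc = 1                  # SUCCESS, as described above
    region.gesf = 0                      # usually reset by the firmware
    region.scie = 0                      # always cleared on completion


def driver_swsci(region, raise_sci, sub_function, parm=0, max_polls=100):
    """Toy model of the driver side of one call."""
    region.parm = parm
    region.gesf = sub_function
    region.scie = 1
    raise_sci()                          # stands in for writing 1 to GSSE
    for _ in range(max_polls):
        if region.scie == 0:             # firmware has finished
            return region.gxfc, region.parm
    raise TimeoutError("firmware never cleared SCIE")


# Example: sub-function 0 (supported calls) answering 0x20000.
region = FakeOpRegion()
handlers = {0: lambda parm: 0x20000}
status, value = driver_swsci(
    region, lambda: firmware_gsci(region, handlers), sub_function=0
)
```

In this model the indicator is cleared on every path through the firmware handler, which is why a call that completes normally should not leave the interrupt asserted.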

Now, on the specifics of the calls:
GBDA_SUPPORTED_CALLS returns 0x20000, which i915 shifts left by 1 and uses as a mask for valid calls. This does not appear to be a valid result: it would only allow GBDA function 18, which is not a valid function.
Indeed, the GBDA code supports function 0 (supported calls), 1 (requested callbacks), 4 (boot display pref), 5 (panel details), 6 (tv standard), 7 (internal graphics), 10 (spread spectrum) - i.e., all calls that i915 will attempt to make. It also supports GESF 11, which returns KSV0 (32 bits) from the GNVS (graphics non volatile storage?) system memory OpRegion.
That would give a mask of 0xcf3, and a supported calls result of 0x679 (0xcf1 and 0x678 if function 1 is left out).
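
The mask arithmetic can be checked with a short sketch. It assumes, as described above, that bit n of the mask stands for function n and that the driver shifts the firmware's answer left by one; the helper names are illustrative. Computed this way from the functions listed above (including function 1), the mask comes out as 0xcf3 and the pre-shift value as 0x679.

```python
def mask_for(functions):
    """Bitmask with bit n set for every function number n."""
    mask = 0
    for n in functions:
        mask |= 1 << n
    return mask


def functions_in(mask):
    """Function numbers whose bits are set in mask, in ascending order."""
    return [n for n in range(mask.bit_length()) if mask & (1 << n)]


# What the firmware reports, after the shift described above.
reported = 0x20000 << 1
print(functions_in(reported))       # function numbers the mask allows

# What the ASL code actually implements, per the list above.
implemented = [0, 1, 4, 5, 6, 7, 10, 11]
expected_mask = mask_for(implemented)
print(hex(expected_mask), hex(expected_mask >> 1))
```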

After supported_calls, i915 would attempt to query requested_callbacks and SBCB supported_callbacks. But the mask is not valid and the call is rejected with -EINVAL.
If it did query requested_callbacks, it would get 0x200000, which is used as a mask for requested callbacks directly. This appears to be bogus: while callback 21 is a valid call (enable/disable audio), the SBCB code has more than that.
Nevertheless, supported_callbacks on the SBCB main function also returns 0x200000 (or it would do so if i915 called it, which it doesn't because it's not "supported").
If you look at the code though, SBCB supports (as in, has code to handle, although some code seems to be stubbed out): 1 (init completion), 3 (pre hires set mode), 4 (post hires set mode), 5 (display switch), 6 (set tv format), 7 (adapter power state), 8 (display power state), 9 (set boot display), 10 (set panel details), 11 (set internal gfx), 16 (post hires to dos fs), 17 (suspend resume), 18 (set spread spectrum), 19 (post vbe pm), 20 (unknown/not used by i915), 21 (enable/disable audio).
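
A rough sketch of the gating logic discussed above, under stated assumptions: the check is modelled as a plain bit test, sub-function 0 is assumed to be always allowed, and the firmware's GBDA answer is assumed to be shifted left by one before being merged in. The function name is hypothetical and this is not the i915 implementation.

```python
import errno


def check_sub_function(allowed_mask, sub_function):
    """Return 0 if the sub-function's bit is set in the mask, else -EINVAL."""
    if allowed_mask & (1 << sub_function):
        return 0
    return -errno.EINVAL


# Assumption: bit 0 is always set, and 0x20000 is shifted left by one.
gbda_allowed = 1 | (0x20000 << 1)

REQUESTED_CALLBACKS = 1  # GBDA sub-function number from the list above
print(check_sub_function(gbda_allowed, REQUESTED_CALLBACKS))

# SBCB callbacks that have handler code but are not advertised by 0x200000.
sbcb_implemented = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19, 20, 21]
unadvertised = [n for n in sbcb_implemented if not 0x200000 & (1 << n)]
print(unadvertised)
```

With these inputs the requested_callbacks query is rejected, and every implemented SBCB callback except 21 is missing from the advertised mask.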

Comparing this to another machine which gives proper results for the supported_calls function, it seems that a shift should be applied to the result of requested_callbacks, and it appears that it is common to have code that handles unsupported functions without error (but also without doing anything).

In any case, this still does not explain the interrupt storm: even in the presence of a broken ACPI table, the i915 driver should bail early because the call is not supported, so it should not assert the interrupt pin and cause the storm.
And even if by chance or bug the driver asked for an unsupported interrupt, the enable pin should be cleared at the end by the ACPI handler.
Comment 2 Damián Nohales 2015-04-09 20:56:16 EDT
People are reporting (here https://bugzilla.kernel.org/show_bug.cgi?id=85881) that upgrading to MacOS X 10.10.2+ fixes the issue, so it may be a firmware bug.

I can also confirm this on my mid-2013 MacBook Air, after upgrading from 10.10.1 to 10.10.3.
Comment 3 Giovanni Campagna 2015-04-09 21:57:22 EDT
Nope. I was on 10.10.2 and the bug was manifesting itself; I upgraded to 10.10.3 and the problem is still there.

I believe it's a different bug than the one you link - for one, it's a different GPE number.
You might say it's an artifact of the hardware, but I think GPE06 is actually meaningful: it appears that hardware vendors take the SCI code for the Intel integrated GPU as a blob (maybe from Intel directly?) and tweak a few parameters to make it work with their BIOS. I base this on a comparison with a Dell laptop that has the same code, the same GPE numbers and the same variable names on wildly different hardware.

In any case, still reproducible on Fedora 22, kernel 4.0.0-0.rc5.git4.1.fc22.x86_64, MacBookPro 11,2 fully upgraded to 10.10.3 on the Mac side.
Comment 4 Fedora Kernel Team 2015-04-28 14:34:50 EDT
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 21 kernel bugs.

Fedora 21 has now been rebased to 3.19.5-200.fc21.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 22, and are still experiencing this issue, please change the version to Fedora 22.

If you experience different issues, please open a new bug report for those.
Comment 5 Fedora End Of Life 2015-11-04 05:50:48 EST
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 6 Fedora End Of Life 2015-12-02 04:05:43 EST
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
Comment 7 Giovanni Campagna 2015-12-19 08:58:39 EST
This is still a problem in kernel 4.2.6-301.fc23.x86_64
Comment 8 Benjamin Xiao 2016-02-21 19:10:27 EST
Still getting this on my MacBook Pro with kernel 4.3.5 in Fedora 23. I can confirm that disabling gpe06 reduces the CPU usage.
Comment 9 Gökçen Eraslan 2016-02-22 01:55:15 EST
I have already filed a bug report against the upstream kernel; see https://bugzilla.kernel.org/show_bug.cgi?id=105781
Comment 10 Benjamin Xiao 2016-05-16 16:47:14 EDT
Any update on this? Do the 4.4 and 4.5 kernels still exhibit this issue?
Comment 11 Benjamin Xiao 2016-05-16 16:56:20 EDT
Still happening with 4.4. Just confirmed on my MacBook Pro.
Comment 12 Laura Abbott 2016-09-23 15:54:00 EDT
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.
Comment 13 Benjamin Xiao 2016-09-27 18:30:31 EDT
I am still experiencing this on Fedora 24, but I can't change the bug status.
Comment 14 Giovanni Campagna 2016-09-28 21:56:00 EDT
I have changed the version to 24, given that I was experiencing this bug on F24, and people apparently still are.

But I moved to F25, and I am unable to reproduce it on
4.8.0-0.rc7.git0.1.fc25.x86_64

After reenabling GPE06, the interrupt count is stable. No other GPEs seem to have high count either.
I have yet to attempt a reboot since reenabling the GPE, but so far the situation is quiet.
Comment 15 Giovanni Campagna 2016-09-29 00:25:28 EDT
Never mind, the problem persists on F25.

I believe the problem might be related to Thunderbolt / miniDP, because the problem is not present when an external display is connected.

As usual, given the prevalence of this bug, and how easy it is to obtain HW where it can be reproduced, it would be nice to fix it eventually.
Comment 16 Laura Abbott 2017-01-16 20:25:00 EST
*********** MASS BUG UPDATE **************
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 25 kernel bugs.
 
Fedora 25 has now been rebased to 4.9.3-200.fc25.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.
 
If you experience different issues, please open a new bug report for those.
Comment 17 Laura Abbott 2017-02-23 18:05:34 EST
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
Comment 18 Giovanni Campagna 2017-06-03 21:09:06 EDT
The problem can still be reproduced on kernel 4.11.3-300.fc26.x86_64 (Fedora 26). Reopening.
Comment 19 cornel panceac 2017-08-27 16:26:27 EDT
I see high CPU load (82%) from kworker on an Intel Core i3.


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10070 root      20   0       0      0      0 R  81.7  0.0 110:58.11 kworker/0:3

$ uname -a
Linux localhost.localdomain 4.12.8-300.fc26.x86_64 #1 SMP Thu Aug 17 15:30:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

This may be related to the opening of epiphany or firefox.
Comment 20 cornel panceac 2017-08-27 16:28:58 EDT
Note that this does not always happen. I am still trying to figure out the reproduction steps.