Bug 1770021
Summary: | TPM interrupt storm makes T490s unusable on Fedora 31 | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Mikel Olasagasti <molasaga>
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 31 | CC: | airlied, bskeggs, hdegoede, ichavero, itamar, james, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, jpazdziora, jsnitsel, kernel-maint, linville, masami256, mchehab, mjg59, mszpak, obudai, rsandu, steved, tadas, thomas, tpopela, zarock
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-11-24 17:00:20 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1816645 | |
Attachments: | | |
Description
Mikel Olasagasti
2019-11-07 22:33:44 UTC
Created attachment 1633809: dmesg from installer
Updating title - this isn't Secure Boot related.

The Fedora 30 installer (kernel 5.0.9-301.fc30.x86_64) with TPM enabled doesn't show the error messages present in F31, but the kernel shows the following message:

[ 6.194757] tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)
[ 6.196482] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead

My initial guess is the problem is caused by this:

commit 1ea32c83c699df32689d329b2415796b7bfc2f6e
Author: Stefan Berger <stefanb.com>
Date: Thu Aug 29 20:09:06 2019 -0400

    tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts

    The tpm_tis_core has to set the TPM_CHIP_FLAG_IRQ before probing for
    interrupts since there is no other place in the code that would set it.

    Cc: linux-stable.org
    Fixes: 570a36097f30 ("tpm: drop 'irq' from struct tpm_vendor_specific")
    Signed-off-by: Stefan Berger <stefanb.com>
    Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen.com>

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index ffa9048d8f6c..270f43acbb77 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -981,6 +981,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	}
 	tpm_chip_start(chip);
+	chip->flags |= TPM_CHIP_FLAG_IRQ;
 	if (irq) {
 		tpm_tis_probe_irq_single(chip, intmask, IRQF_SHARED, irq);

The code isn't the most obvious, but tpm_tis_send was already setting that flag. It also checks that interrupts are working and disables them if they aren't. I believe setting the flag here in tpm_tis_core_init short-circuits all of that.

Disregard that, I was reading the ! in the condition check as wrapping both the flag and priv->irq_tested. Back to looking at code some more. I guess it is still a possible reason why it is being seen in 5.3.7, since it does change the behavior of the code.

Requested a T490s loaner from logistics.

Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.

This bug was accidentally closed due to a query error. Reopening.

Seeing this on a Clevo N151CU-derived notebook. tpm0 generates around 5000 wakeups/s on interrupt 31. The system is usable, but the CPU gets pegged in a high-power state. Worked around temporarily by disabling TPM in UEFI setup.

For reference, my machine has
tpm_tis IFX0785:00: 2.0 TPM (device-id 0x1B, rev-id 22)

This is resolved for the time being upstream by the following reverts:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa4a63dd981682b1742baa01237036e48bc11923
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dda8b2af395b2ed508e2ef314ae32e122841b447
I believe this one is only needed for 5.5-rc#:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9550f210492c6f88415709002f42a9d15c0e6231

I have also encountered this problem on a Hyperbook NH5/Clevo NH55RCQ with 5.5-rc5 (kernel-core-5.5.0-0.rc5.git0.1.fc32.x86_64 - installed on Fedora 31 due to some nouveau regression in 5.3/5.4 on my machine). A lot of tpm0-related IRQ10 (~65% of one core usage).
With kernel-core-5.5.0-0.rc6.git0.1.fc32.x86_64 I do not observe it any longer.
As a side effect an error about "Firmware bug" started to occur:
> kernel: tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22)
> kernel: tpm tpm0: tpm_try_transmit: send(): error -5
> kernel: tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
(kernel-core-5.5.0-0.rc6.git0.1.fc32.x86_64 contains the aforementioned commit/revert)

If you were experiencing the interrupt storm before, that polling message is to be expected with the reverts that went into 5.5.

Under 5.5.7-200.fc31.x86_64 (possibly earlier) I no longer get the storm on mine and rngd is well-behaved. Unfortunately there's now the spurious interrupt that results in a regular ABRT notification.

[ 1.870470] tpm_tis IFX0785:00: 2.0 TPM (device-id 0x1B, rev-id 22)
[ 1.870613] tpm tpm0: tpm_try_transmit: send(): error -5
[ 1.870615] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
...
[ 5.300584] irq 31: nobody cared (try booting with the "irqpoll" option)
[ 5.300586] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.7-200.fc31.x86_64 #1
[ 5.300586] Hardware name: Entroware Proteus/Proteus, BIOS 1.07.07TE0 11/15/2019
[ 5.300587] Call Trace:
[ 5.300588] <IRQ>
[ 5.300592] dump_stack+0x66/0x90
[ 5.300595] __report_bad_irq+0x35/0xa7
[ 5.300596] note_interrupt.cold+0xb/0x63
[ 5.300597] handle_irq_event_percpu+0x6f/0x80
[ 5.300598] handle_irq_event+0x36/0x53
[ 5.300599] handle_fasteoi_irq+0x8b/0x130
[ 5.300601] do_IRQ+0x50/0xe0
[ 5.300603] common_interrupt+0xf/0xf
[ 5.300604] </IRQ>
[ 5.300606] RIP: 0010:cpuidle_enter_state+0xc9/0x3e0
[ 5.300607] Code: e8 5c e6 8e ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 4e 3a 95 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 40 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[ 5.300607] RSP: 0018:ffffaa32000efe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 5.300608] RAX: ffff9a94706aae00 RBX: ffff9a94706b6100 RCX: 000000000000001f
[ 5.300609] RDX: 0000000000000000 RSI: 000000003c9b2e4f RDI: 0000000000000000
[ 5.300609] RBP: ffffffff9774eec0 R08: 000000013bf06824 R09: 000000007fffffff
[ 5.300609] R10: 0000000000000005 R11: ffff9a94706a9be4 R12: 000000013bf06824
[ 5.300610] R13: 0000000000000001 R14: 0000000000000001 R15: ffff9a946e13a700
[ 5.300612] ? cpuidle_enter_state+0xa4/0x3e0
[ 5.300613] cpuidle_enter+0x29/0x40
[ 5.300615] do_idle+0x1e4/0x280
[ 5.300616] cpu_startup_entry+0x19/0x20
[ 5.300617] start_secondary+0x162/0x1b0
[ 5.300619] secondary_startup_64+0xb6/0xc0
[ 5.300620] handlers:
[ 5.300622] [<0000000012738fae>] tis_int_handler
[ 5.300623] Disabling IRQ #31

Getting what looks to be the same issue as James on a new Thinkpad P53, running latest F31:

Mar 10 08:18:04 localhost.localdomain kernel: tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)
Mar 10 08:18:04 localhost.localdomain kernel: tpm tpm0: tpm_try_transmit: send(): error -5
Mar 10 08:18:04 localhost.localdomain kernel: tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
...
Mar 10 08:18:04 localhost.localdomain kernel: irq 48: nobody cared (try booting with the "irqpoll" option)
Mar 10 08:18:04 localhost.localdomain kernel: CPU: 11 PID: 0 Comm: swapper/11 Not tainted 5.5.7-200.fc31.x86_64 #1
Mar 10 08:18:04 localhost.localdomain kernel: Hardware name: LENOVO 20QNCTO1WW/20QNCTO1WW, BIOS N2NET34W (1.19 ) 11/28/2019
Mar 10 08:18:04 localhost.localdomain kernel: Call Trace:
Mar 10 08:18:04 localhost.localdomain kernel: <IRQ>
Mar 10 08:18:04 localhost.localdomain kernel: dump_stack+0x66/0x90
Mar 10 08:18:04 localhost.localdomain kernel: __report_bad_irq+0x35/0xa7
Mar 10 08:18:04 localhost.localdomain kernel: note_interrupt.cold+0xb/0x63
Mar 10 08:18:04 localhost.localdomain kernel: handle_irq_event_percpu+0x6f/0x80
Mar 10 08:18:04 localhost.localdomain kernel: handle_irq_event+0x36/0x53
Mar 10 08:18:04 localhost.localdomain kernel: handle_fasteoi_irq+0x8b/0x130
Mar 10 08:18:04 localhost.localdomain kernel: do_IRQ+0x50/0xe0
Mar 10 08:18:04 localhost.localdomain kernel: common_interrupt+0xf/0xf
Mar 10 08:18:04 localhost.localdomain kernel: </IRQ>
Mar 10 08:18:04 localhost.localdomain kernel: RIP: 0010:cpuidle_enter_state+0xc9/0x3e0
Mar 10 08:18:04 localhost.localdomain kernel: Code: e8 5c e6 8e ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 4e 3a 95 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 40 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
Mar 10 08:18:04 localhost.localdomain kernel: RSP: 0018:ffffa9cf0013fe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
Mar 10 08:18:04 localhost.localdomain kernel: RAX: ffff901f8e6eae00 RBX: ffff901f8e6f6200 RCX: 000000000000001f
Mar 10 08:18:04 localhost.localdomain kernel: RDX: 0000000000000000 RSI: 000000003161fc2d RDI: 0000000000000000
Mar 10 08:18:04 localhost.localdomain kernel: RBP: ffffffffba74eec0 R08: 00000000930fe0de R09: 000000007fffffff
Mar 10 08:18:04 localhost.localdomain kernel: R10: 0000000000000005 R11: ffff901f8e6e9be4 R12: 00000000930fe0de
Mar 10 08:18:04 localhost.localdomain kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff901f8c390000
Mar 10 08:18:04 localhost.localdomain kernel: ? cpuidle_enter_state+0xa4/0x3e0
Mar 10 08:18:04 localhost.localdomain kernel: cpuidle_enter+0x29/0x40
Mar 10 08:18:04 localhost.localdomain kernel: do_idle+0x1e4/0x280
Mar 10 08:18:04 localhost.localdomain kernel: cpu_startup_entry+0x19/0x20
Mar 10 08:18:04 localhost.localdomain kernel: start_secondary+0x162/0x1b0
Mar 10 08:18:04 localhost.localdomain kernel: secondary_startup_64+0xb6/0xc0
Mar 10 08:18:04 localhost.localdomain kernel: handlers:
Mar 10 08:18:04 localhost.localdomain kernel: [<0000000078f5af69>] tis_int_handler
Mar 10 08:18:04 localhost.localdomain kernel: Disabling IRQ #48

This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
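
Note for anyone retracing the "Disregard that" comment earlier in this bug: the two readings of the condition check really do give different results. The following is only a standalone sketch, not the kernel source. It assumes the two tests are joined with a logical OR, and SKETCH_FLAG_IRQ, cond_negate_flag_only and cond_negate_both are hypothetical names made up for illustration; the real TPM_CHIP_FLAG_IRQ value is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's TPM_CHIP_FLAG_IRQ bit.
 * The bit position is an assumption chosen only for this sketch. */
#define SKETCH_FLAG_IRQ (1u << 2)

/* Reading 1: the '!' applies only to the flag test.
 * True when the IRQ flag is clear, or when the IRQ was already tested. */
bool cond_negate_flag_only(unsigned int flags, bool irq_tested)
{
	return !(flags & SKETCH_FLAG_IRQ) || irq_tested;
}

/* Reading 2 (the misreading): the '!' wraps both operands.
 * True only when the IRQ flag is clear AND the IRQ was not yet tested. */
bool cond_negate_both(unsigned int flags, bool irq_tested)
{
	return !((flags & SKETCH_FLAG_IRQ) || irq_tested);
}
```

The two readings agree while irq_tested is false and diverge once it is true: reading 1 then always yields true, reading 2 always yields false. In particular, with the flag set and the IRQ not yet tested, both readings evaluate to false, so setting the flag early does not by itself make the condition true.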