Bug 1770021 - TPM interrupt storm makes T490s unusable on Fedora 31
Summary: TPM interrupt storm makes T490s unusable on Fedora 31
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1816645
TreeView+ depends on / blocked
 
Reported: 2019-11-07 22:33 UTC by Mikel Olasagasti
Modified: 2020-11-24 17:00 UTC (History)
27 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-11-24 17:00:20 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
dmesg from installer (168.26 KB, text/plain)
2019-11-07 22:35 UTC, Mikel Olasagasti
no flags Details

Description Mikel Olasagasti 2019-11-07 22:33:44 UTC
1. Please describe the problem:

I got a new Lenovo T490s from Red Hat and I proceed to install Fedora 31. Before installing I enable TPM and Secure Boot.

During the installation and after I face:

- Mouse freeze after several seconds/minutes
- Trackpoint is ultra-sensible
- rngd consumes 100% of a core, causes fan to go full-speed non-stop
- System feels very slow

Updated to latest F31 packages and latest Lenovo firmware and same issues.

System shows problems with IRQ31, hpet and hundreds of "drm:intel_pipe_update_end" messages.

[    8.626719] Hardware name: LENOVO 20NYS41L02/20NYS41L02, BIOS N2JET77W (1.55 ) 08/13/2019
[    8.626719] Call Trace:
[    8.626721]  <IRQ>
[    8.626725]  dump_stack+0x5c/0x80
[    8.626727]  __report_bad_irq+0x35/0xa7
[    8.626728]  note_interrupt.cold+0xb/0x63
[    8.626729]  handle_irq_event_percpu+0x6f/0x80
[    8.626730]  handle_irq_event+0x28/0x48
[    8.626731]  handle_fasteoi_irq+0x86/0x130
[    8.626733]  handle_irq+0x1c/0x30
[    8.626735]  do_IRQ+0x4b/0xd0
[    8.626736]  common_interrupt+0xf/0xf
[    8.626737]  </IRQ>
[    8.626739] RIP: 0010:cpuidle_enter_state+0xc4/0x420
[    8.626740] Code: e8 a1 ce 92 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 39 03 00 00 31 ff e8 03 bd 98 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 89 cf 01 00 00 c7 45 10 00 00 00 00 48 83 c4 18 44 89
[    8.626740] RSP: 0018:ffffa44d400dfe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[    8.626741] RAX: ffff8c3232268880 RBX: ffffffff904d6520 RCX: 000000000000001f
[    8.626742] RDX: 0000000000000000 RSI: 000000003c9b06d1 RDI: 0000000000000000
[    8.626742] RBP: ffff8c3232272e00 R08: 0000000202312781 R09: 000000000000afc7
[    8.626742] R10: ffff8c32322676c4 R11: ffff8c32322676a4 R12: 0000000000000001
[    8.626743] R13: 0000000202312781 R14: 0000000000000001 R15: ffff8c32307a1fc0
[    8.626744]  ? cpuidle_enter_state+0x9f/0x420
[    8.626745]  cpuidle_enter+0x29/0x40
[    8.626747]  do_idle+0x1cf/0x250
[    8.626749]  cpu_startup_entry+0x19/0x20
[    8.626750]  start_secondary+0x163/0x1b0
[    8.626751]  secondary_startup_64+0xa4/0xb0
[    8.626752] handlers:
[    8.626754] [<00000000a80d92dc>] tis_int_handler
[    8.626755] Disabling IRQ #31
(...)
[  168.909317] hpet_rtc_timer_reinit: 57 callbacks suppressed
[  168.909317] hpet: Lost 1 RTC interrupts
[  168.941233] hpet: Lost 1 RTC interrupts
[  168.973146] hpet: Lost 1 RTC interrupts
[  169.005085] hpet: Lost 1 RTC interrupts
[  169.037052] hpet: Lost 1 RTC interrupts
[  169.061958] hpet: Lost 1 RTC interrupts
[  169.093879] hpet: Lost 1 RTC interrupts
[  169.129326] hpet: Lost 1 RTC interrupts
[  169.161252] hpet: Lost 1 RTC interrupts
[  169.193178] hpet: Lost 1 RTC interrupts
[  255.883829] hpet_rtc_timer_reinit: 55 callbacks suppressed
[  255.883829] hpet: Lost 1 RTC interrupts
[  255.919266] hpet: Lost 1 RTC interrupts
[  255.947654] hpet: Lost 1 RTC interrupts
[  255.979557] hpet: Lost 1 RTC interrupts
[  256.011456] hpet: Lost 1 RTC interrupts
[  256.043378] hpet: Lost 1 RTC interrupts
[  256.068295] hpet: Lost 1 RTC interrupts
[  500.171365] hpet: Lost 1 RTC interrupts
[  500.206813] hpet: Lost 1 RTC interrupts
[  500.231776] hpet: Lost 1 RTC interrupts
[  500.281695] hpet: Lost 1 RTC interrupts
[  500.335079] hpet: Lost 1 RTC interrupts
[  500.366989] hpet: Lost 1 RTC interrupts
[  500.398950] hpet: Lost 1 RTC interrupts
[  500.441587] hpet: Lost 2 RTC interrupts
[  500.657961] hpet: Lost 1 RTC interrupts
[  500.686359] hpet: Lost 1 RTC interrupts
(...)
[  127.161659] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=9417 end=9418) time 3551 us, min 1073, max 1079, scanline start 1023, end 145
[  133.554559] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=9801 end=9802) time 7084 us, min 1073, max 1079, scanline start 667, end 26
[  137.586453] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=10043 end=10044) time 7081 us, min 1073, max 1079, scanline start 794, end 153
[  137.664513] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=10048 end=10049) time 7116 us, min 1073, max 1079, scanline start 681, end 40
[  137.721299] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=10051 end=10052) time 7083 us, min 1073, max 1079, scanline start 903, end 262
[  137.770917] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=10054 end=10055) time 7083 us, min 1073, max 1079, scanline start 880, end 239
[  138.562773] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=10102 end=10103) time 7112 us, min 1073, max 1079, scanline start 614, end 1089
[  139.034612] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=10130 end=10131) time 3553 us, min 1073, max 1079, scanline start 988, end 110
[  139.087752] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=10133 end=10134) time 7084 us, min 1073, max 1079, scanline start 964, end 323
[  139.140901] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=10136 end=10137) time 10639 us, min 1073, max 1079, scanline start 702, end 299
(...)
[  255.883829] hpet_rtc_timer_reinit: 55 callbacks suppressed
[  255.883829] hpet: Lost 1 RTC interrupts
[  255.919266] hpet: Lost 1 RTC interrupts
[  255.947654] hpet: Lost 1 RTC interrupts
[  255.979557] hpet: Lost 1 RTC interrupts
[  256.011456] hpet: Lost 1 RTC interrupts
[  256.043378] hpet: Lost 1 RTC interrupts
[  256.068295] hpet: Lost 1 RTC interrupts

# grep 31 proc/interrupts
  31:          0     100000          0          0          0          0          0          0   IO-APIC   31-fasteoi   tpm0

Disabled TPM from UEFI, both installer and systems are performing correctly, no error messages on dmesg and rngd doesn't consume a core.

2. What is the Version-Release number of the kernel:

5.3.7-301.fc31.x86_64 installer
5.3.8-300.fc31.x86_64 after updating

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

There are other similar reports for rngd issue in F29 and F30, but can't confirm if related to TPM.

https://bugzilla.redhat.com/show_bug.cgi?id=1739730

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Problem scenario:
1) Enter UEFI config by pressing F1 during boot
2) Enable TPM: Security -> Security Chip -> Security Chip ON
3) Enable Secure Boot: Security -> Secure Boot -> Secure Boot ON

No issues scenario:
1) Enter UEFI config by pressing F1 during boot
2) Enable TPM: Security -> Security Chip -> Security Chip OFF
3) Enable Secure Boot: Security -> Secure Boot -> Secure Boot ON

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

No changes.

6. Are you running any modules that not shipped with directly Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Mikel Olasagasti 2019-11-07 22:35:21 UTC
Created attachment 1633809 [details]
dmesg from installer

Comment 2 Matthew Garrett 2019-11-07 22:50:26 UTC
Updating title - this isn't Secure Boot related.

Comment 3 Mikel Olasagasti 2019-11-08 14:50:59 UTC
Fedora 30 installer, kernel 5.0.9-301.fc30.x86_64, with TPM enabled doesn't show error messages present in F31 but kernel shows the following message:

[    6.194757] tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)
[    6.196482] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead

Comment 4 Jerry Snitselaar 2019-11-12 03:19:09 UTC
My initial guess is the problem is caused by this:

commit 1ea32c83c699df32689d329b2415796b7bfc2f6e
Author: Stefan Berger <stefanb.com>
Date:   Thu Aug 29 20:09:06 2019 -0400

    tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts
    
    The tpm_tis_core has to set the TPM_CHIP_FLAG_IRQ before probing for
    interrupts since there is no other place in the code that would set
    it.
    
    Cc: linux-stable.org
    Fixes: 570a36097f30 ("tpm: drop 'irq' from struct tpm_vendor_specific")
    Signed-off-by: Stefan Berger <stefanb.com>
    Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen.com>

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index ffa9048d8f6c..270f43acbb77 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -981,6 +981,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
                }
 
                tpm_chip_start(chip);
+               chip->flags |= TPM_CHIP_FLAG_IRQ;
                if (irq) {
                        tpm_tis_probe_irq_single(chip, intmask, IRQF_SHARED,
                                                 irq);

Comment 5 Jerry Snitselaar 2019-11-12 03:49:29 UTC
The code isn't the most obvious, but tpm_tis_send was already setting that flag. It also checks that interrupts are working and disables them if they aren't. I believe setting the flag here in tpm_tis_core_init short circuits all of that.

Comment 6 Jerry Snitselaar 2019-11-12 14:30:27 UTC
Disregard that, I was reading the ! in the condition check as wrapping both the flag and priv->irq_tested. Back to looking at code some more.

Comment 7 Jerry Snitselaar 2019-11-12 15:15:29 UTC
I guess it is still a potential possibility for why it is being seen in 5.3.7 since it does change the behavior of the code. Requested a t490s loaner from logistics.

Comment 8 Ben Cotton 2019-11-27 14:20:06 UTC
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 9 Ben Cotton 2019-11-27 15:38:55 UTC
This bug was accidentally closed due to a query error. Reopening.

Comment 10 James 2020-01-11 12:07:59 UTC
Seeing this on a Clevo N151CU-derived notebook. tpm0 generates around 5000 wakeups/s on interrupt 31. The system is usable, but the CPU gets pegged in a high-power state. Worked around temporarily by disabling TPM in UEFI setup.

Comment 11 James 2020-01-11 16:17:32 UTC
For reference, my machine has tpm_tis IFX0785:00: 2.0 TPM (device-id 0x1B, rev-id 22)

Comment 13 Marcin Zajaczkowski 2020-01-27 22:37:05 UTC
I also have encountered that problem with Hyperbook NH5/Clevo NH55RCQ with 5.5-rc5 (kernel-core-5.5.0-0.rc5.git0.1.fc32.x86_64 - installed on Fedora 31 due to some nouveau regression in 5.3/5.4 on my machine). A lot of tpm0-related IRQ10 (~65% of one core usage).

With kernel-core-5.5.0-0.rc6.git0.1.fc32.x86_64 I do not observe it any longer.

As a side effect an error about "Firmware bug" started to occur:
> kernel: tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22)
> kernel: tpm tpm0: tpm_try_transmit: send(): error -5
> kernel: tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead

Comment 14 Marcin Zajaczkowski 2020-01-27 22:37:50 UTC
(kernel-core-5.5.0-0.rc6.git0.1.fc32.x86_64 container the aforementioned commit/revert)

Comment 15 Jerry Snitselaar 2020-01-27 22:57:13 UTC
If you were experiencing the interrupt storm before, that polling message is to be expected with the reverts that went into 5.5.

Comment 16 James 2020-03-03 18:57:20 UTC
Under 5.5.7-200.fc31.x86_64 (possibly earlier) I no longer get the storm on mine and rngd is well-behaved. Unfortunately there's now the spurious interrupt that results in a regular ABRT notification.

[    1.870470] tpm_tis IFX0785:00: 2.0 TPM (device-id 0x1B, rev-id 22)
[    1.870613] tpm tpm0: tpm_try_transmit: send(): error -5
[    1.870615] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead

...

[    5.300584] irq 31: nobody cared (try booting with the "irqpoll" option)
[    5.300586] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.7-200.fc31.x86_64 #1
[    5.300586] Hardware name: Entroware Proteus/Proteus, BIOS 1.07.07TE0 11/15/2019
[    5.300587] Call Trace:
[    5.300588]  <IRQ>
[    5.300592]  dump_stack+0x66/0x90
[    5.300595]  __report_bad_irq+0x35/0xa7
[    5.300596]  note_interrupt.cold+0xb/0x63
[    5.300597]  handle_irq_event_percpu+0x6f/0x80
[    5.300598]  handle_irq_event+0x36/0x53
[    5.300599]  handle_fasteoi_irq+0x8b/0x130
[    5.300601]  do_IRQ+0x50/0xe0
[    5.300603]  common_interrupt+0xf/0xf
[    5.300604]  </IRQ>
[    5.300606] RIP: 0010:cpuidle_enter_state+0xc9/0x3e0
[    5.300607] Code: e8 5c e6 8e ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 4e 3a 95 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 40 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[    5.300607] RSP: 0018:ffffaa32000efe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[    5.300608] RAX: ffff9a94706aae00 RBX: ffff9a94706b6100 RCX: 000000000000001f
[    5.300609] RDX: 0000000000000000 RSI: 000000003c9b2e4f RDI: 0000000000000000
[    5.300609] RBP: ffffffff9774eec0 R08: 000000013bf06824 R09: 000000007fffffff
[    5.300609] R10: 0000000000000005 R11: ffff9a94706a9be4 R12: 000000013bf06824
[    5.300610] R13: 0000000000000001 R14: 0000000000000001 R15: ffff9a946e13a700
[    5.300612]  ? cpuidle_enter_state+0xa4/0x3e0
[    5.300613]  cpuidle_enter+0x29/0x40
[    5.300615]  do_idle+0x1e4/0x280
[    5.300616]  cpu_startup_entry+0x19/0x20
[    5.300617]  start_secondary+0x162/0x1b0
[    5.300619]  secondary_startup_64+0xb6/0xc0
[    5.300620] handlers:
[    5.300622] [<0000000012738fae>] tis_int_handler
[    5.300623] Disabling IRQ #31

Comment 17 Thomas Drake-Brockman 2020-03-09 17:00:25 UTC
Getting what looks to be the same issue as James on a new Thinkpad P53, running latest F31:

Mar 10 08:18:04 localhost.localdomain kernel: tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)
Mar 10 08:18:04 localhost.localdomain kernel: tpm tpm0: tpm_try_transmit: send(): error -5
Mar 10 08:18:04 localhost.localdomain kernel: tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead

...

Mar 10 08:18:04 localhost.localdomain kernel: irq 48: nobody cared (try booting with the "irqpoll" option)
Mar 10 08:18:04 localhost.localdomain kernel: CPU: 11 PID: 0 Comm: swapper/11 Not tainted 5.5.7-200.fc31.x86_64 #1
Mar 10 08:18:04 localhost.localdomain kernel: Hardware name: LENOVO 20QNCTO1WW/20QNCTO1WW, BIOS N2NET34W (1.19 ) 11/28/2019
Mar 10 08:18:04 localhost.localdomain kernel: Call Trace:
Mar 10 08:18:04 localhost.localdomain kernel:  <IRQ>
Mar 10 08:18:04 localhost.localdomain kernel:  dump_stack+0x66/0x90
Mar 10 08:18:04 localhost.localdomain kernel:  __report_bad_irq+0x35/0xa7
Mar 10 08:18:04 localhost.localdomain kernel:  note_interrupt.cold+0xb/0x63
Mar 10 08:18:04 localhost.localdomain kernel:  handle_irq_event_percpu+0x6f/0x80
Mar 10 08:18:04 localhost.localdomain kernel:  handle_irq_event+0x36/0x53
Mar 10 08:18:04 localhost.localdomain kernel:  handle_fasteoi_irq+0x8b/0x130
Mar 10 08:18:04 localhost.localdomain kernel:  do_IRQ+0x50/0xe0
Mar 10 08:18:04 localhost.localdomain kernel:  common_interrupt+0xf/0xf
Mar 10 08:18:04 localhost.localdomain kernel:  </IRQ>
Mar 10 08:18:04 localhost.localdomain kernel: RIP: 0010:cpuidle_enter_state+0xc9/0x3e0
Mar 10 08:18:04 localhost.localdomain kernel: Code: e8 5c e6 8e ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 4e 3a 95 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 40 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
Mar 10 08:18:04 localhost.localdomain kernel: RSP: 0018:ffffa9cf0013fe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
Mar 10 08:18:04 localhost.localdomain kernel: RAX: ffff901f8e6eae00 RBX: ffff901f8e6f6200 RCX: 000000000000001f
Mar 10 08:18:04 localhost.localdomain kernel: RDX: 0000000000000000 RSI: 000000003161fc2d RDI: 0000000000000000
Mar 10 08:18:04 localhost.localdomain kernel: RBP: ffffffffba74eec0 R08: 00000000930fe0de R09: 000000007fffffff
Mar 10 08:18:04 localhost.localdomain kernel: R10: 0000000000000005 R11: ffff901f8e6e9be4 R12: 00000000930fe0de
Mar 10 08:18:04 localhost.localdomain kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff901f8c390000
Mar 10 08:18:04 localhost.localdomain kernel:  ? cpuidle_enter_state+0xa4/0x3e0
Mar 10 08:18:04 localhost.localdomain kernel:  cpuidle_enter+0x29/0x40
Mar 10 08:18:04 localhost.localdomain kernel:  do_idle+0x1e4/0x280
Mar 10 08:18:04 localhost.localdomain kernel:  cpu_startup_entry+0x19/0x20
Mar 10 08:18:04 localhost.localdomain kernel:  start_secondary+0x162/0x1b0
Mar 10 08:18:04 localhost.localdomain kernel:  secondary_startup_64+0xb6/0xc0
Mar 10 08:18:04 localhost.localdomain kernel: handlers:
Mar 10 08:18:04 localhost.localdomain kernel: [<0000000078f5af69>] tis_int_handler
Mar 10 08:18:04 localhost.localdomain kernel: Disabling IRQ #48

Comment 18 Ben Cotton 2020-11-03 16:54:31 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 19 Ben Cotton 2020-11-24 17:00:20 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.