Bug 1598615 - CPU throttled at 45° C, "critical" set to 100°
Summary: CPU throttled at 45° C, "critical" set to 100°
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-06 01:21 UTC by Andy
Modified: 2021-05-20 07:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments:
Linux Thermal Throttling explained by Lenovo (325.41 KB, application/pdf), 2019-09-25 14:01 UTC, Andy

Description Andy 2018-07-06 01:21:16 UTC
Description of problem:
I've just got my brand new ThinkPad T470s with an i7-7600U.

The kernel reports that my CPU is too hot and is being throttled, even though I'm not running anything CPU-hungry.

When booting Fedora I'm seeing the following warnings:

> CPU2: Core temperature above threshold, cpu clock throttled (total events = 2080)
> CPU0: Core temperature above threshold, cpu clock throttled (total events = 2080)
> CPU0: Package temperature above threshold, cpu clock throttled (total events = 2206)
> CPU2: Package temperature above threshold, cpu clock throttled (total events = 2206)
> CPU1: Package temperature above threshold, cpu clock throttled (total events = 2206)
> CPU3: Package temperature above threshold, cpu clock throttled (total events = 2206)

`sensors` shows:
> coretemp-isa-0000
> Adapter: ISA adapter
> Package id 0:  +48.0°C  (high = +100.0°C, crit = +100.0°C)
> Core 0:        +46.0°C  (high = +100.0°C, crit = +100.0°C)
> Core 1:        +48.0°C  (high = +100.0°C, crit = +100.0°C)


Version-Release number of selected component (if applicable):

Kernel 4.17.3-200.fc28.x86_64


How reproducible:
No idea! For me that's the case after every restart.

Actual results:
CPU throttled.

Expected results:
No throttling under 100°C

Additional info:
I'm noticing sluggish behaviour in GNOME; resizing windows leaves black boxes on my external screen, even with an Intel HD 620.

Comment 1 Andy 2018-07-06 01:57:35 UTC
Just checked lscpu, and mine is throttled to 900 MHz.

Comment 2 Andy 2018-07-06 15:49:36 UTC
Is it that the CPU temperature can change so quickly that I cannot observe the critical temperature even a minute later?

I guess I have another issue then: that the CPU actually constantly gets too hot.

Comment 3 Brian Mendenhall 2018-07-13 15:36:01 UTC
I have been having this problem as well. It happens constantly, even without any processes running other than the basic display manager.

At first I noticed that my scaling governor was set to 'powersave', which made the throttling severe and really disrupted my daily work. I have since set the governor to 'performance', which has made the throttling much more manageable (at least I can actually use my laptop now), but it hasn't stopped the issue.
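
For anyone who wants to check which governor each CPU is using, here is a minimal Python sketch. It assumes the usual cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function name `read_governors` is made up for illustration and is not part of any tool mentioned in this bug.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its scaling governor."""
    governors = {}
    # "cpu[0-9]*" skips sibling directories such as cpufreq/ and cpuidle/.
    for path in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(f"{cpu}: {governor}")
```

The root directory is a parameter only so the function can be pointed at a copy of the tree; on a real system the default should be what you want.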

Hardware:
  Lenovo ThinkPad T480
  Intel Core i7-8650U CPU @ 1.90GHz
  32GB RAM
  VGA Adaptor: Intel Corporation UHD Graphics 620
  3D controller: NVIDIA Corporation GP108M [GeForce MX150]
 
Kernel: 4.17.3-200.fc28.x86_64
Fedora 28

I am unfortunately running the nouveau driver, which is really troublesome and might be a huge contributor to this problem (and possibly to the OP's). It has caused quite a few crashes, ranging from insignificant (I didn't even notice the driver had crashed) to completely locking up my workstation.

This has even just happened while typing this comment:
[15625.782074] thinkpad_acpi: EC reports that Thermal Table has changed
[15627.920642] ------------[ cut here ]------------
[15627.920643] nouveau 0000:01:00.0: timeout
[15627.920697] WARNING: CPU: 3 PID: 10772 at drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c:86 nvkm_pmu_reset+0x14c/0x160 [nouveau]
[15627.920698] Modules linked in: fuse ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep vfat fat arc4 intel_rapl iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp iwlmvm coretemp mei_wdt kvm_intel mac80211 snd_hda_codec_hdmi kvm snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_hda_codec_realtek
[15627.920717]  snd_soc_acpi irqbypass snd_soc_core snd_hda_codec_generic intel_cstate iwlwifi snd_compress intel_uncore snd_pcm_dmaengine intel_rapl_perf ac97_bus snd_hda_intel uvcvideo snd_hda_codec btusb cfg80211 btrtl videobuf2_vmalloc btbcm videobuf2_memops btintel videobuf2_v4l2 snd_hda_core thunderbolt videobuf2_common bluetooth snd_hwdep joydev snd_seq wmi_bmof intel_wmi_thunderbolt snd_seq_device videodev snd_pcm idma64 thinkpad_acpi ucsi_acpi snd_timer intel_lpss_pci i2c_i801 media mei_me typec_ucsi mei ecdh_generic intel_lpss intel_pch_thermal snd shpchp typec soundcore int3403_thermal processor_thermal_device int3400_thermal int340x_thermal_zone rfkill acpi_thermal_rel acpi_pad intel_soc_dts_iosf auth_rpcgss sunrpc dm_crypt i915 nouveau mxm_wmi ttm i2c_algo_bit drm_kms_helper crct10dif_pclmul
[15627.920761]  crc32_pclmul crc32c_intel nvme uas e1000e drm usb_storage nvme_core ghash_clmulni_intel serio_raw wmi video
[15627.920782] CPU: 3 PID: 10772 Comm: lspci Tainted: G        W         4.17.3-200.fc28.x86_64 #1
[15627.920783] Hardware name: LENOVO 20L5CTO1WW/20L5CTO1WW, BIOS N24ET37W (1.12 ) 03/14/2018
[15627.920800] RIP: 0010:nvkm_pmu_reset+0x14c/0x160 [nouveau]
[15627.920801] RSP: 0018:ffffa03f4dae7af8 EFLAGS: 00010282
[15627.920802] RAX: 0000000000000000 RBX: ffff9066ece5edc0 RCX: 0000000000000006
[15627.920802] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff9067114d6930
[15627.920803] RBP: ffff9066e6e5b400 R08: 0000000000000030 R09: 0000000000000677
[15627.920803] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9066e7e88c00
[15627.920804] R13: ffff9066e80a1000 R14: 00000e3628c3dbc0 R15: ffff9066eceaa0a0
[15627.920805] FS:  00007f2409e57740(0000) GS:ffff9067114c0000(0000) knlGS:0000000000000000
[15627.920805] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15627.920806] CR2: 000006abbc76a000 CR3: 00000007bb882001 CR4: 00000000003606e0
[15627.920806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15627.920807] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[15627.920807] Call Trace:
[15627.920825]  nvkm_pmu_init+0x16/0x40 [nouveau]
[15627.920834]  nvkm_subdev_init+0xb2/0x200 [nouveau]
[15627.920851]  nvkm_device_init+0x123/0x280 [nouveau]
[15627.920867]  nvkm_udevice_init+0x41/0x60 [nouveau]
[15627.920877]  nvkm_object_init+0x3e/0x100 [nouveau]
[15627.920886]  nvkm_object_init+0x71/0x100 [nouveau]
[15627.920896]  nvkm_object_init+0x71/0x100 [nouveau]
[15627.920899]  ? pci_restore_standard_config+0x40/0x40
[15627.920914]  nouveau_do_resume+0x28/0x150 [nouveau]
[15627.920930]  nouveau_pmops_runtime_resume+0x88/0x150 [nouveau]
[15627.920932]  pci_pm_runtime_resume+0x78/0xb0
[15627.920934]  __rpm_callback+0xca/0x210
[15627.920935]  ? pci_restore_standard_config+0x40/0x40
[15627.920936]  rpm_callback+0x1f/0x70
[15627.920937]  ? pci_restore_standard_config+0x40/0x40
[15627.920938]  rpm_resume+0x560/0x780
[15627.920939]  pm_runtime_barrier+0x96/0xa0
[15627.920940]  pci_config_pm_runtime_get+0x36/0x50
[15627.920942]  pci_read_config+0x95/0x290
[15627.920944]  ? _cond_resched+0x15/0x30
[15627.920946]  ? __kmalloc+0x19a/0x230
[15627.920948]  kernfs_fop_read+0xac/0x180
[15627.920950]  __vfs_read+0x36/0x170
[15627.920951]  vfs_read+0x8a/0x140
[15627.920952]  ksys_pread64+0x61/0xa0
[15627.920954]  do_syscall_64+0x5b/0x160
[15627.920956]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[15627.920958] RIP: 0033:0x7f2409557577
[15627.920958] RSP: 002b:00007ffe92c3b0d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
[15627.920959] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2409557577
[15627.920959] RDX: 0000000000000040 RSI: 000055dd3bce42c0 RDI: 0000000000000003
[15627.920960] RBP: 0000000000000040 R08: 00007f2409a4d6d2 R09: 00007ffe92c3a590
[15627.920960] R10: 0000000000000000 R11: 0000000000000246 R12: 000055dd3bcebfe0
[15627.920961] R13: 000055dd3bce42c0 R14: 0000000000000000 R15: 0000000000000000
[15627.920961] Code: 41 5c 41 5d 41 5e c3 48 8b 7d 10 48 8b 5f 50 48 85 db 74 1e e8 76 03 17 ee 48 89 da 48 c7 c7 1f f5 58 c0 48 89 c6 e8 8e 3d c5 ed <0f> 0b e9 42 ff ff ff 48 8b 5f 10 eb dc 48 8b 5f 10 eb a5 90 0f 
[15627.920978] ---[ end trace ccc3e7c7618ee3b1 ]---
[15633.393937] thinkpad_acpi: EC reports that Thermal Table has changed
[15780.648342] CPU4: Package temperature above threshold, cpu clock throttled (total events = 2123)
[15780.648342] CPU0: Package temperature above threshold, cpu clock throttled (total events = 2123)
[15780.648381] CPU6: Package temperature above threshold, cpu clock throttled (total events = 2123)
[15780.648382] CPU5: Package temperature above threshold, cpu clock throttled (total events = 2123)
[15780.648383] CPU1: Package temperature above threshold, cpu clock throttled (total events = 2123)
[15780.648383] CPU2: Package temperature above threshold, cpu clock throttled (total events = 2123)
[15780.648384] CPU3: Package temperature above threshold, cpu clock throttled (total events = 2123)
[15780.648385] CPU7: Package temperature above threshold, cpu clock throttled (total events = 2123)
[15780.671327] CPU7: Package temperature/speed normal
[15780.671328] CPU3: Package temperature/speed normal
[15780.671391] CPU0: Package temperature/speed normal
[15780.671392] CPU5: Package temperature/speed normal
[15780.671393] CPU4: Package temperature/speed normal
[15780.671393] CPU1: Package temperature/speed normal
[15780.671394] CPU6: Package temperature/speed normal
[15780.671395] CPU2: Package temperature/speed normal

I unfortunately don't have a lot of time to spend on researching this problem but if I can provide any new information that might help solve this bug, PLEASE let me know.

Another VERY weird symptom is the output of 'sensors' ... it shows the NVIDIA card at 511 degrees Celsius?!!
[root@PIQLT501 bin] # sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +50.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +51.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +49.0°C  (high = +100.0°C, crit = +100.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +46.5°C  

acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +128.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +32.0°C  

thinkpad-isa-0000
Adapter: ISA adapter
fan1:           0 RPM

nouveau-pci-0100
Adapter: PCI adapter
temp1:       +511.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

...
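
The coretemp readings above sit roughly 50 °C below the thresholds that `sensors` reports. A minimal Python sketch of that comparison follows. It assumes the one-line layout shown in this report (other lm-sensors versions may format lines differently), and `parse_sensor_line` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   Package id 0:  +51.0°C  (high = +100.0°C, crit = +100.0°C)
LINE_RE = re.compile(
    r"^(?P<label>[^:]+):\s+\+?(?P<temp>-?\d+(?:\.\d+)?)°C"
    r".*?crit = \+?(?P<crit>-?\d+(?:\.\d+)?)°C"
)


def parse_sensor_line(line):
    """Return (label, temperature, critical, margin) or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    temp = float(match.group("temp"))
    crit = float(match.group("crit"))
    return match.group("label"), temp, crit, crit - temp


sample = """\
Package id 0:  +51.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +49.0°C  (high = +100.0°C, crit = +100.0°C)
fan1:           0 RPM
"""

for raw in sample.splitlines():
    parsed = parse_sensor_line(raw)
    if parsed:
        label, temp, crit, margin = parsed
        print(f"{label}: {temp:.1f} °C, {margin:.1f} °C below crit ({crit:.1f} °C)")
```

Lines whose `crit` value is printed on a continuation line (like the nouveau entry) are simply skipped by this sketch.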

While typing this I've just noticed that 4.17.4-200.fc28 has been released ... going to install it and report back if anything changes ...

Regards,
Brian Mendenhall

Comment 4 Laura Abbott 2018-10-01 21:21:54 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.
 
Fedora 28 has now been rebased to 4.18.10-300.fc28.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.
 
If you experience different issues, please open a new bug report for those.

Comment 5 Andy 2018-10-15 13:25:03 UTC
Currently I'm on kernel 4.18.12-200.fc28, and my journal from 3 days ago contains this:

Okt 12 13:39:16 schoko kernel: CPU1: Package temperature above threshold, cpu clock throttled (total events = 70862)
…

I just restarted and the message showed again on boot.

Comment 6 Vít Ondruch 2019-01-11 10:45:33 UTC
This has always been an issue for the 1.5 years I have had my T470s. The throttling begins immediately during boot, right after the laptop is switched on. It seems that wrong temperatures are specified somewhere in the kernel compared to Windows, as suggested here [1] and in various other places. There even exists a tool which works around this [2], but ...

@Hans, would you be so kind as to look into this? Because this completely destroys your flicker-free boot effort.

[1] https://forums.lenovo.com/t5/Linux-Discussion/X1C6-T480s-low-cTDP-and-trip-temperature-in-Linux/td-p/4028489
[2] https://github.com/erpalma/lenovo-throttling-fix

Comment 7 Martijn ten Heuvel 2019-01-22 18:17:14 UTC
My T460s has had this since I got it, basically for 2.5 years now. I don't expect it to get any better anytime soon. 


I just powered the machine up; the disk is encrypted and I didn't start typing right away. I see this on my screen:
[    1.094212] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[    1.094212] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[    1.094214] CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.094256] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.094257] CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.094316] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)


ONE second after powering up? 
Current kernel is 4.19.15-300.fc29.x86_64, running Fedora 29 (but I've had this issue since... Fedora 24/25?).

Comment 8 idhaoui 2019-01-23 07:53:03 UTC
(In reply to Martijn ten Heuvel from comment #7)
> My T460s has had this since I got it, basically for 2.5 years now. I don't
> expect it to get any better anytime soon. 
> 
> 
> I just powered the machine up; the disk is encrypted and I didn't start
> typing right away. I see this on my screen:
> [    1.094212] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
> [    1.094212] CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
> [    1.094214] CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
> [    1.094256] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
> [    1.094257] CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
> [    1.094316] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
> 
> 
> ONE second after powering up? 
> Current kernel is 4.19.15-300.fc29.x86_64, running Fedora 29 (but I've had
> this issue since... Fedora 24/25?).



I am seeing similar behaviour on my T460s, but the throttling message is immediately followed by one saying the opposite. See the logs below:

Jan 23 11:40:23 fireball.redhat.local kernel: CPU0: Core temperature above threshold, cpu clock throttled (total events = 93)
Jan 23 11:40:23 fireball.redhat.local kernel: CPU2: Core temperature above threshold, cpu clock throttled (total events = 93)
Jan 23 11:40:23 fireball.redhat.local kernel: CPU1: Package temperature above threshold, cpu clock throttled (total events = 93)
Jan 23 11:40:23 fireball.redhat.local kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 93)
Jan 23 11:40:23 fireball.redhat.local kernel: CPU2: Package temperature above threshold, cpu clock throttled (total events = 93)
Jan 23 11:40:23 fireball.redhat.local kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 93)              <==== !!
Jan 23 11:40:23 fireball.redhat.local kernel: CPU0: Core temperature/speed normal                                                             <==== !!
Jan 23 11:40:23 fireball.redhat.local kernel: CPU2: Core temperature/speed normal
Jan 23 11:40:23 fireball.redhat.local kernel: CPU1: Package temperature/speed normal
Jan 23 11:40:23 fireball.redhat.local kernel: CPU3: Package temperature/speed normal
Jan 23 11:40:23 fireball.redhat.local kernel: CPU2: Package temperature/speed normal
Jan 23 11:40:23 fireball.redhat.local kernel: CPU0: Package temperature/speed normal

The kernel is also the latest: 4.19.14-300.fc29.x86_64

Regards,

Comment 9 Andy 2019-01-23 14:20:08 UTC
I'll get back to the question I posed before:

> Is it that the CPU temperature can change so quickly that I cannot observe the critical temperature even a minute later?
> I guess I have another issue then: that the CPU actually constantly gets too hot.

How quickly can the temperature rise in a CPU? I have no idea. As Martijn stated:
> ONE second after powering up? 

I guess everybody with the issue has an i7?

Before buying the T470s I read in tests that its design is flawed and not able to cool the i7 efficiently. So apparently even on Windows it's running constantly throttled. I tried to get one with an i5 instead, but ended up with the i7.

So is it plausible that the CPU *actually* gets too hot?

Comment 10 Martijn ten Heuvel 2019-01-23 18:44:37 UTC
(In reply to Andy from comment #9)
> I'll get back to the question I posed before:
> 
> > Is it that the CPU temperature can change so quickly that I cannot observe the critical temperature even a minute later?
> > I guess I have another issue then: that the CPU actually constantly gets too hot.
> 
> How quickly can the temperature rise in a CPU? I have no idea. As Martijn
> stated:
> > ONE second after powering up? 
> 
> I guess everybody with the issue has an i7?
> 
> Before buying the T470s I read in tests that its design is flawed and not
> able to cool the i7 efficiently. So apparently even on Windows it's running
> constantly throttled. I tried to get one with an i5 instead, but ended up with
> the i7.
> 
> So is it plausible that the CPU *actually* gets too hot?

The machine (T460s) I have is company-provided and has an i7-6600U. It ran fine all day, and then just did the following:

2019-01-23T15:45:49,105186+01:00 CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
2019-01-23T15:45:49,105187+01:00 CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
2019-01-23T15:45:49,105189+01:00 CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
2019-01-23T15:45:49,105192+01:00 CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
2019-01-23T15:45:49,105193+01:00 CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
2019-01-23T15:45:49,105195+01:00 CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)  << 
2019-01-23T15:45:49,106179+01:00 CPU3: Core temperature/speed normal                                                <<
2019-01-23T15:45:49,106179+01:00 CPU0: Package temperature/speed normal
2019-01-23T15:45:49,106180+01:00 CPU1: Core temperature/speed normal
2019-01-23T15:45:49,106181+01:00 CPU2: Package temperature/speed normal
2019-01-23T15:45:49,106181+01:00 CPU1: Package temperature/speed normal
2019-01-23T15:45:49,106182+01:00 CPU3: Package temperature/speed normal                                             <<

Now, the sequence is basically: it gets throttled, and then it's instantly back to normal. That is weird, right?
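
To put a number on "instantly", the timestamps of each "above threshold" message can be paired with the matching "normal" message. Below is a minimal Python sketch of that, assuming log lines in the ISO-style timestamp format shown above; `throttle_durations` is a hypothetical helper, and the sample contains only the CPU3 lines from this comment.

```python
import re
from datetime import datetime

EVENT_RE = re.compile(
    r"^(?P<ts>\S+)\s+CPU(?P<cpu>\d+): (?P<level>Core|Package) temperature"
    r"(?P<state> above threshold|/speed normal)"
)


def throttle_durations(lines):
    """Return {(cpu, level): seconds between 'above threshold' and 'normal'}."""
    started = {}
    durations = {}
    for line in lines:
        match = EVENT_RE.match(line)
        if match is None:
            continue
        # The timestamps use a comma before the microseconds.
        when = datetime.fromisoformat(match.group("ts").replace(",", "."))
        key = (int(match.group("cpu")), match.group("level"))
        if match.group("state") == " above threshold":
            started[key] = when
        elif key in started:
            durations[key] = (when - started.pop(key)).total_seconds()
    return durations


sample = """\
2019-01-23T15:45:49,105186+01:00 CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
2019-01-23T15:45:49,105195+01:00 CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
2019-01-23T15:45:49,106179+01:00 CPU3: Core temperature/speed normal
2019-01-23T15:45:49,106182+01:00 CPU3: Package temperature/speed normal
"""

durations = throttle_durations(sample.splitlines())
for (cpu, level), seconds in sorted(durations.items()):
    print(f"CPU{cpu} {level}: throttled for {seconds * 1000:.3f} ms")
```

For these lines the gap comes out at about one millisecond, which only measures the time between the two log messages and not necessarily how long the hardware itself was throttled.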

Comment 11 gbsalinetti 2019-02-06 19:39:14 UTC
Hi,

I noticed the same problem on my Lenovo P51 running Fedora 29 and kernel 4.20.5-200.fc29.x86_64.

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Stepping:            9
CPU MHz:             800.056
CPU max MHz:         3800.0000
CPU min MHz:         800.0000
BogoMIPS:            5616.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
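
The frequency fields in the `lscpu` output above can be compared directly. This is a minimal Python sketch, assuming the field names printed by this version of `lscpu` ("CPU MHz", "CPU max MHz"); `parse_lscpu` is a hypothetical helper name. Note that an idle CPU also reports its minimum frequency, so a single low reading does not by itself show throttling.

```python
def parse_lscpu(text):
    """Split `lscpu` output into a {field: value} dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


sample = """\
CPU MHz:             800.056
CPU max MHz:         3800.0000
CPU min MHz:         800.0000
"""

fields = parse_lscpu(sample)
current = float(fields["CPU MHz"])
maximum = float(fields["CPU max MHz"])
print(f"running at {current:.0f} MHz, {current / maximum:.0%} of the {maximum:.0f} MHz maximum")
```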


This is the output extracted from dmesg, and as stated above the throttling messages are immediately followed by a temperature/speed normal message.

[42581.573364] CPU0: Core temperature above threshold, cpu clock throttled (total events = 427)
[42581.573365] CPU4: Core temperature above threshold, cpu clock throttled (total events = 427)
[42581.573367] CPU3: Package temperature above threshold, cpu clock throttled (total events = 427)
[42581.573367] CPU7: Package temperature above threshold, cpu clock throttled (total events = 427)
[42581.573369] CPU1: Package temperature above threshold, cpu clock throttled (total events = 427)
[42581.573370] CPU5: Package temperature above threshold, cpu clock throttled (total events = 427)
[42581.573371] CPU2: Package temperature above threshold, cpu clock throttled (total events = 427)
[42581.573371] CPU6: Package temperature above threshold, cpu clock throttled (total events = 427)
[42581.573374] CPU4: Package temperature above threshold, cpu clock throttled (total events = 427)
[42581.573380] CPU0: Package temperature above threshold, cpu clock throttled (total events = 427)
[42581.581361] CPU0: Core temperature/speed normal
[42581.581362] CPU4: Core temperature/speed normal
[42581.581363] CPU5: Package temperature/speed normal
[42581.581364] CPU1: Package temperature/speed normal
[42581.581364] CPU7: Package temperature/speed normal
[42581.581365] CPU6: Package temperature/speed normal
[42581.581366] CPU2: Package temperature/speed normal
[42581.581367] CPU3: Package temperature/speed normal
[42581.581367] CPU4: Package temperature/speed normal
[42581.581368] CPU0: Package temperature/speed normal

Comment 12 Hans de Goede 2019-03-13 15:14:55 UTC
(In reply to Vít Ondruch from comment #6)
> @Hans, would you be so kind as to look into this? Because
> this completely destroys your flicker-free boot effort.

I'm sorry, there is nothing I / Fedora can do here. This is a BIOS bug, and the only advice I can give you is to complain to Lenovo; hopefully if enough people complain they will do something about this.

Comment 13 Vít Ondruch 2019-03-13 16:13:17 UTC
I just wonder how it happens that Windows seems to work without throttling while Linux has some issues. Does Windows use some workaround similar to reference [2] above?

Comment 14 Hans de Goede 2019-03-13 16:34:02 UTC
(In reply to Vít Ondruch from comment #13)
> I just wonder how it happens that Windows seems to work without throttling
> while Linux has some issues. Does Windows use some workaround similar to
> reference [2] above?

I wish we knew how this works under Windows. If we knew, we might be able to come up with a fix on the Linux side instead of waiting for a firmware fix, but that too requires cooperation from Lenovo.

Comment 15 Matthew Garrett 2019-03-13 19:46:30 UTC
Intel are the people who need to provide additional details here, this is due to the DPTF thermal framework that many ultrabook-style machines now use.

Comment 16 Vít Ondruch 2019-03-14 08:38:44 UTC
(In reply to Matthew Garrett from comment #15)
> Intel are the people who need to provide additional details here, this is
> due to the DPTF thermal framework that many ultrabook-style machines now use.

Interesting. Should I read that as meaning something like this could help?

https://github.com/intel/dptf

But since it is a daemon, I'm not sure it can run soon enough, because throttling starts right after systemd is executed, probably sooner than any unit is loaded ...

Comment 17 Ben Cotton 2019-05-02 19:47:16 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Martijn ten Heuvel 2019-05-04 14:10:25 UTC
(In reply to Matthew Garrett from comment #15)
> Intel are the people who need to provide additional details here, this is
> due to the DPTF thermal framework that many ultrabook-style machines now use.


Upgraded to f30, issue still the same.

[me@t460s ~]$ cat /etc/fedora-release 
Fedora release 30 (Thirty)
[mtenheuv@t460s ~]$ dmesg  | grep -i cpu | grep thro
[    1.578399] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[    1.578400] mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[    1.578402] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.578434] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.578434] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.578518] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
[me@t460s ~]$ 
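
For longer logs, the same filtering can be done with a short script that also counts the messages by type. This is a minimal Python sketch, assuming the message wording shown in this bug (with or without the `mce:` prefix); `summarize_throttling` is a hypothetical helper, and the sample lines are copied from the output above.

```python
import re
from collections import Counter

THROTTLE_RE = re.compile(
    r"CPU(?P<cpu>\d+): (?P<level>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def summarize_throttling(lines):
    """Count throttle messages per level and keep the highest event total."""
    per_level = Counter()
    max_total = 0
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            continue
        per_level[match.group("level")] += 1
        max_total = max(max_total, int(match.group("total")))
    return per_level, max_total


sample = """\
[    1.578399] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[    1.578400] mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[    1.578402] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.578434] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.578434] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[    1.578518] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
"""

per_level, max_total = summarize_throttling(sample.splitlines())
print(dict(per_level), "highest 'total events' value:", max_total)
```

The input lines could equally come from `dmesg` or `journalctl -k` output saved to a file.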

Weird that this is unresolvable.

Comment 19 Vít Ondruch 2019-06-05 08:12:11 UTC
@Peter, you were involved in bug 1480844 previously; do you think you could help direct this bug towards the right people at Lenovo?

Comment 20 Benjamin Berg 2019-06-05 08:50:54 UTC
There are a few pieces that are relevant here and we are working on making them available in Fedora.

More specifically:
 * thermald (available in Fedora), https://github.com/intel/thermal_daemon
 * dptfxtract (will be available in RPM Fusion soon), https://github.com/intel/dptfxtract

Unfortunately, I don't think that dptfxtract works well enough on the T470s. Also, even with a proper configuration I have seen the CPU being throttled consistently to a power usage of 15W (after an initial peak) which is the reported TDP. Windows on the same machine was able to draw a lot more power.


AFAICT the above threshold warnings are expected. The CPU package will draw a lot of power for a short period of time and the thermal capacity of the heatsink is small in modern laptops. So it will heat up really quickly and the CPU then takes protective actions by throttling.
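
To illustrate why a short power burst can trip the threshold so quickly, here is a toy first-order (single thermal resistance and capacitance) model in Python. It is only a sketch: the model is a simplification, and every number in the example calls is invented for illustration, not measured on any ThinkPad.

```python
import math


def seconds_to_limit(power_w, r_k_per_w, c_j_per_k, start_c, ambient_c, limit_c):
    """Time for a first-order RC thermal model to heat from start_c to limit_c.

    Model: C * dT/dt = P - (T - T_ambient) / R, whose solution is
    T(t) = T_ss + (T_start - T_ss) * exp(-t / (R * C)) with T_ss = T_ambient + P * R.
    Returns math.inf when the steady-state temperature stays at or below the limit.
    """
    steady_state = ambient_c + power_w * r_k_per_w
    if start_c >= limit_c:
        return 0.0
    if steady_state <= limit_c:
        return math.inf
    tau = r_k_per_w * c_j_per_k
    return -tau * math.log((limit_c - steady_state) / (start_c - steady_state))


# Hypothetical values: 2 K/W to ambient, 0.5 J/K heat capacity, 25 °C ambient.
burst = seconds_to_limit(40.0, 2.0, 0.5, 45.0, 25.0, 100.0)
sustained = seconds_to_limit(15.0, 2.0, 0.5, 45.0, 25.0, 100.0)
print(f"40 W burst reaches the limit after {burst:.2f} s")
print(f"15 W sustained: {sustained}")
```

With these made-up parameters a 40 W burst reaches 100 °C within a few seconds, while 15 W never does, which matches the qualitative picture described above. Real packages have several thermal stages, so actual timings will differ.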


By the way, there is "throttled", which is a hack that adjusts certain CPU registers (https://github.com/erpalma/throttled). But I cannot say whether it is entirely safe to do so.

Comment 21 Vít Ondruch 2019-09-24 07:02:31 UTC
@Christian, do you by any chance have some contacts at Lenovo? I think fixing issues like this would improve the user experience.

Comment 24 Andy 2019-09-25 13:59:25 UTC
Lately there has been some movement on Lenovo's side!

According to the forum thread, they have been working on some updates and have started rolling out fixes for the 7th gen X1 Carbon. Other laptops are supposed to follow soon:

* T480/T480s 
* X1 Carbon 6th Gen 
* X1 Yoga 3rd Gen 
* P52/P52s/P53
* T470, T490, L380
* X1 extreme
* Thinkpad 25 Anniversary edition

[X1C6/T480s] low cTDP and trip temperature in Linux
https://forums.lenovo.com/t5/Other-Linux-Discussions/X1C6-T480s-low-cTDP-and-trip-temperature-in-Linux/td-p/4028489/page/11

They published a PDF with explanations of the fixes, which I'll attach here as well.

Thanks everybody!

Comment 25 Andy 2019-09-25 14:01:37 UTC
Created attachment 1619042 [details]
Linux Thermal Throttling explained by Lenovo

In the forum thread on the throttling issue Lenovo posted this document.

Comment 26 Erik M Jacobs 2020-02-04 17:13:58 UTC
I am curious about this as it appears it may also affect the T490s.

It looks like firmware has been released for various devices, however, here's what I see:

[thoraxe:~/Downloads] master* 1 ± sudo fwupdmgr get-approved-firmware 
[sudo] password for thoraxe: 
There is no approved firmware.
[thoraxe:~/Downloads] master* ± sudo fwupdmgr get-updates
No updatable devices
[thoraxe:~/Downloads] master* 2 ± sudo fwupdmgr get-devices 
20NYS7K90F
│
├─Thunderbolt Controller:
│     Device ID:           85cd0f2da1ec523f67160e52b2d10ab70b83161a
│     Summary:             Unmatched performance for high-speed I/O
│     Current version:     20.00
│     Vendor:              Lenovo (TBT:0x0109)
│     GUIDs:               e56d9729-1948-50ec-9a51-bd7448f55816 ← TBT-01091806
│                          6d1b64e3-ebb3-5481-9d55-f334e31f3332 ← TBT-01091806-0000:04:00.0
│     Device Flags:        • Internal device
│                          • Updatable
│                          • Requires AC power
│                          • Device stages updates
│   
├─System Firmware:
│     Device ID:           123fd4143619569d8ddb6ea47d1d3911eb5ef07a
│     Current version:     N2JET81W (1.59 )
│     Vendor:              LENOVO
│     Update Error:        Firmware can not be updated in legacy mode, switch to UEFI mode
│     GUID:                230c8b18-8d9b-53ec-838b-6cfc0383493a ← main-system-firmware
│     Device Flags:        • Internal device
│                          • Requires AC power
│                          • Needs a reboot after installation
│   
└─WDC PC SN730 SDBQNTY-256G-1001:
      Device ID:           f2759da7fe8e0388c5f3601cb072f837b1070b03
      Summary:             NVM Express Solid State Drive
      Current version:     11110101
      Vendor:              Sandisk Corp (NVME:0x15B7)
      Serial Number:       19385E801939
      GUIDs:               a39943dd-3afb-54f8-b110-c5a21f071200 ← NVME\VEN_15B7&DEV_5006&REV_00
                           fccbb6ea-e20e-58ad-bf8a-7fb7d43ff4c2 ← NVME\VEN_15B7&DEV_5006
                           1836f81c-3a3b-52b2-bb89-e5dc480ca9ec ← WDC PC SN730 SDBQNTY-256G-1001
      Device Flags:        • Internal device
                           • Updatable
                           • Requires AC power
                           • Needs a reboot after installation
                           • Device is usable for the duration of the update


It seems that there are some BIOS updates available for this device, but I'm not sure whether I should also see a *firmware* update via fwupdmgr?

Comment 27 Pino Toscano 2020-02-05 09:50:46 UTC
(In reply to Erik M Jacobs from comment #26)
> [thoraxe:~/Downloads] master* 2 ± sudo fwupdmgr get-devices 
> [...]
> ├─System Firmware:
> │     Device ID:           123fd4143619569d8ddb6ea47d1d3911eb5ef07a
> │     Current version:     N2JET81W (1.59 )
> │     Vendor:              LENOVO
> │     Update Error:        Firmware can not be updated in legacy mode, switch to UEFI mode
> │     GUID:                230c8b18-8d9b-53ec-838b-6cfc0383493a ← main-system-firmware
> │     Device Flags:        • Internal device
> │                          • Requires AC power
> │                          • Needs a reboot after installation


> It seems that there are some BIOS updates available for this device, but I'm
> not sure whether or not I should also see a *firmware* update via fwupdmgr ?

There is a firmware update, however it can be installed only if the system is installed in UEFI mode.
Since your system is installed in legacy BIOS mode, it cannot be applied -- fwupd says that, see the above snippet of its output.
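
For anyone unsure which mode their installation boots in, here is a minimal sketch. It relies on the kernel exposing /sys/firmware/efi only when the system was booted via UEFI. The function name boot_mode and its optional directory argument are hypothetical; the argument exists only so the check can be pointed at a test directory.

```shell
#!/usr/bin/sh
# Report whether the running system was booted via UEFI or legacy BIOS.
# The optional first argument (a hypothetical helper parameter) replaces
# the real /sys/firmware directory, which is handy for testing.
boot_mode() {
    firmware_dir="${1:-/sys/firmware}"
    if [ -d "$firmware_dir/efi" ]; then
        echo "UEFI"
    else
        echo "legacy BIOS"
    fi
}

boot_mode
```

If this prints "legacy BIOS", fwupd will keep refusing the System Firmware update as shown in the quoted output.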

Comment 28 Erik M Jacobs 2020-02-05 14:27:45 UTC
Yeah I realized that shortly after I posted. Unfortunately "converting" to UEFI mode is a little more than I want to tackle due to the partitioning scheme I've got right now. So it'll wait for F32 to come out...

Comment 29 Andy 2020-02-12 22:17:00 UTC
I'm afraid that the only system that received a fix is the X1 Carbon Gen 7, and the release notes for the T490 BIOS updates don't mention any throttling fixes either.

So even if you switch to UEFI mode and update your BIOS, you probably won't get a fix for the throttling issue.

Red Hat still certified the T470s and the T490s for RHEL 7, despite the throttling issue and the massive lack of performance that comes with it.

- https://access.redhat.com/ecosystem/hardware/2951231
- https://access.redhat.com/ecosystem/hardware/4000481

Comment 30 Vít Ondruch 2020-03-16 12:39:25 UTC
(In reply to Andy from comment #24)
> [X1C6/T480s] low cTDP and trip temperature in Linux
> https://forums.lenovo.com/t5/Other-Linux-Discussions/X1C6-T480s-low-cTDP-and-trip-temperature-in-Linux/td-p/4028489/page/11

It seems Lenovo restructured their web site. This should be the updated link to the same thread:

https://forums.lenovo.com/topic/view/27/4028489?page=6

Comment 31 Vít Ondruch 2020-03-16 13:15:06 UTC
And this seems to be a document listing affected and fixed laptops:

https://docs.google.com/document/d/1MsqSYt0f_vU72pGGTWueeEgTBzd-HIz4ODHLmNOv16o/edit

Comment 32 Vitaly 2020-04-25 15:00:49 UTC
Temporary workaround:

#!/usr/bin/sh
set -e

echo 63BE270F-1C11-48FD-A6F7-3AF253FF3E2D > /sys/devices/platform/INT3400:00/uuids/current_uuid
echo enabled > /sys/class/thermal/thermal_zone1/mode
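
A slightly more defensive variant of the workaround above might look like the sketch below. It writes the UUID only if the firmware lists it in available_uuids, and fails cleanly on machines without the INT3400 device. The function name select_uuid and its optional directory argument are hypothetical, not part of any existing tool.

```shell
#!/usr/bin/sh
# Sketch: select a thermal policy UUID only if the firmware advertises it.
# select_uuid and its second argument are hypothetical helper names; the
# second argument overrides the sysfs directory, mainly for testing.
select_uuid() {
    uuid="$1"
    uuids_dir="${2:-/sys/devices/platform/INT3400:00/uuids}"
    if [ ! -r "$uuids_dir/available_uuids" ]; then
        echo "INT3400 device not present" >&2
        return 1
    fi
    # -F fixed string, -x whole line, -i ignore case, -q quiet
    if ! grep -qixF "$uuid" "$uuids_dir/available_uuids"; then
        echo "UUID $uuid not offered by this firmware" >&2
        return 1
    fi
    echo "$uuid" > "$uuids_dir/current_uuid"
}

# Example use as root, mirroring the workaround above:
#   select_uuid 63BE270F-1C11-48FD-A6F7-3AF253FF3E2D &&
#       echo enabled > /sys/class/thermal/thermal_zone1/mode
```

The thermal_zone1 index is taken from the workaround as posted and may differ on other machines.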

Comment 33 Erik M Jacobs 2020-04-28 14:23:21 UTC
Vitalie where did you get the 63B.... string from?

Does this work on all systems?

Comment 34 Erik M Jacobs 2020-04-28 14:24:53 UTC
[root@t490s-festive-local ~]# cat /sys/devices/platform/INT3400\:00/uuids/current_uuid 
INVALID
[root@t490s-festive-local ~]# cat /sys/devices/platform/INT3400\:00/uuids/available_uuids 
63BE270F-1C11-48FD-A6F7-3AF253FF3E2D
9E04115A-AE87-4D1C-9500-0F3E340BFE75

Comment 35 Vitaly 2020-04-28 14:56:27 UTC
> Vitalie where did you get the 63B.... string from?

From Linux kernel sources: https://github.com/torvalds/linux/blob/master/drivers/thermal/intel/int340x_thermal/int3400_thermal.c#L35

63BE270F-1C11-48FD-A6F7-3AF253FF3E2D is the GUID for the THERMAL_ADAPTIVE_PERFORMANCE scheme.

Lenovo Intelligent Thermal Service on Windows sets this GUID on boot. On modern Linux kernels it works fine too. No more throttling for me.

> Does this work on all systems?

Tested on T480, T580.

I created a small systemd-unit for myself: https://github.com/xvitaly/throttling-fix

Comment 36 Erik M Jacobs 2020-04-28 14:59:04 UTC
I've enabled your systemd unit. I have a T490. How would I validate that it's "fixed"?

Comment 37 Vitaly 2020-04-28 15:02:39 UTC
> I've enabled your systemd unit. I have a T490. How would I validate that it's "fixed"?

cat /sys/devices/platform/INT3400:00/uuids/available_uuids

It should return the correct GUID instead of the INVALID value.

Comment 38 Vitaly 2020-04-28 15:03:06 UTC
Oops, the correct command is:

cat /sys/devices/platform/INT3400:00/uuids/current_uuid
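
The same check can be wrapped in a small script that distinguishes the three cases seen in this thread: the file is missing (no INT3400 device), the value is INVALID, or a policy is active. This is only a sketch; check_policy and its optional path argument are hypothetical names.

```shell
#!/usr/bin/sh
# Sketch: report whether a thermal policy UUID is currently selected.
# check_policy is a hypothetical helper; its optional argument overrides
# the sysfs file to read, mainly for testing.
check_policy() {
    current_file="${1:-/sys/devices/platform/INT3400:00/uuids/current_uuid}"
    current=$(cat "$current_file" 2>/dev/null) || {
        echo "cannot read $current_file"
        return 2
    }
    if [ "$current" = "INVALID" ]; then
        echo "no policy selected"
        return 1
    fi
    echo "active policy: $current"
}
```

Note that this only confirms which policy is selected. Whether throttling actually stops still has to be observed under load.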

Comment 39 Peter Robinson 2020-05-01 11:20:19 UTC
Matthew has been working on reverse engineering this; he posted a blog post here: https://mjg59.dreamwidth.org/54923.html

Comment 40 Andy 2020-05-01 13:21:55 UTC
I realised that it has been a while since I last saw the error "package temperature above threshold" on my T470s.

So I queried my journal, and the last occurrence seems to have been February 22 for me. I've been gaming for quite a while, and even with Windows 10 running in Boxes I saw my CPU go up to 3600 MHz, unthrottled.

I did not apply any workarounds like thermald, and I have been running Fedora 31 with TLP, now Fedora 32. My BIOS version is 1.35 from August 2019.

Is it possible that some kernel update fixed the issue for my platform?
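
For others who want to run the same journal check, a query along these lines should work. It is a sketch, assuming a persistent journal so that messages from earlier boots are still available. The helper name last_throttle_event is hypothetical.

```shell
#!/usr/bin/sh
# Sketch: print the most recent throttling message found in the kernel
# log lines supplied on standard input.
# last_throttle_event is a hypothetical helper name.
last_throttle_event() {
    grep -F "cpu clock throttled" | tail -n 1
}

# Example use across all stored boots ("journalctl -k" would limit the
# search to the current boot):
#   journalctl --no-pager _TRANSPORT=kernel | last_throttle_event
```

An empty result means no such message is stored in the journal. As the later comments point out, that alone does not prove the throttling itself is gone.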

Comment 41 Vít Ondruch 2020-05-04 09:24:02 UTC
(In reply to Andy from comment #40)
This is an interesting discovery. I have checked my journal and the last 'cpu clock throttled' message is from March 23rd, when I upgraded to kernel-5.5.10-200.fc31.x86_64. Not sure if the issue was fixed or just the message disabled ;) But it seems that my CPU can reach its maximum frequency.

(In reply to Peter Robinson from comment #39)
> Matthew has been working on reverse engineering this, he posted a blog post
> here https://mjg59.dreamwidth.org/54923.html

This was originally reported against the T470s, and I for one don't have the INT3400 device available:

~~~
$ sudo ls /sys/devices/platform/ | grep INT3400
~~~

Comment 42 Christian Kellner 2020-05-04 09:36:06 UTC
(In reply to Vít Ondruch from comment #41)
> (In reply to Andy from comment #40)
> This is interesting discovery. I have checked my journal and the last 'cpu
> clock throttled` message is from March 23rd, when I upgraded to
> kernel-5.5.10-200.fc31.x86_64. Not sure if the issue was fixed or just the
> message disabled ;) But it seems that my CPU can reach the CPU max.

This is mostly due to our (Benjamin's) and Intel's work to avoid critical messages for expected thermal events. See kernel commits 9c3bafaa1fd88e4dd2dba3735a1f1abb0f2c7bb7[1] and f6656208f04e5b3804054008eba4bf7170f4c841[2]

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/mce/therm_throt.c?id=9c3bafaa1fd88e4dd2dba3735a1f1abb0f2c7bb7
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/mce/therm_throt.c?id=f6656208f04e5b3804054008eba4bf7170f4c841

Comment 43 Benjamin Berg 2020-05-04 09:44:26 UTC
See https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/VHQCIGJBKRDLTRRDKJBKCLE7BATFCBME/ for a good explanation of some of the issues involved.

Thermald is unlikely to be of high relevance on the Lenovo laptops in the future. The reverse engineering work might help on certain models.

