Bug 1360705

Summary: [abrt] WARNING: CPU: 0 PID: 5075 at drivers/cpufreq/cpufreq.c:2173 cpufreq_update_policy+0x102/0x150
Product: [Fedora] Fedora
Reporter: Alan Jenkins <alan.christopher.jenkins>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 23
CC: arapov, gansalmon, itamar, jonathan, kernel-maint, lpancescu, madhu.chinakonda, mchehab, sergio
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/74da2a292d35269de81f8a1d7c8830bb8d9ea029
Whiteboard: abrt_hash:ee8c1895b19ab952d4e5627d8ad59bb71022c87a;VARIANT_ID=workstation;
Last Closed: 2016-09-20 04:36:47 UTC
Attachments: File: dmesg (flags: none)

Description Alan Jenkins 2016-07-27 11:19:55 UTC
Description of problem:
This happens during resume from sleep.

It happened at least twice since an upgrade two days ago to kernel 4.6.4-201.fc23.x86_64.

Apart from the warnings (several per resume?), I don't see anything go wrong.  (I made this report before rebooting :).

CONFESSION: The multiple backtraces appear to confuse ABRT. The first backtrace is untainted, but obviously later ones are not, and ABRT refused to let me report the bug. Therefore I hacked the report by just removing the taint files. I haven't touched dmesg; it still shows the backtraces after the first one as tainted.
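The "Tainted: G        W" field in the trace below comes from the kernel's taint mask, which can also be read as a decimal number from /proc/sys/kernel/tainted. The W flag (bit 9) is added once a WARNING fires and stays set until reboot, which would explain why only the first backtrace is untainted. A minimal sketch of decoding the mask, assuming only the small subset of bits in the table below matters here (the full list is in the kernel's tainted-kernels documentation):

```python
"""Decode the kernel taint mask (as read from /proc/sys/kernel/tainted).

Sketch only: TAINT_FLAGS covers a subset of the documented taint bits.
"""

# (bit number, letter shown in "Tainted:", meaning); subset of documented bits
TAINT_FLAGS = [
    (0, "P", "proprietary module was loaded"),
    (4, "M", "machine check exception occurred"),
    (7, "D", "kernel died recently (OOPS or BUG)"),
    (9, "W", "kernel issued a warning"),
    (12, "O", "externally-built (out-of-tree) module was loaded"),
    (13, "E", "unsigned module was loaded"),
]


def decode_taint(mask: int) -> list[str]:
    """Return descriptions of the known taint bits that are set in mask."""
    return [f"{letter}: {meaning}"
            for bit, letter, meaning in TAINT_FLAGS
            if mask & (1 << bit)]


def warned_since_boot(mask: int) -> bool:
    """True once any WARNING has fired; taint is sticky until reboot."""
    return bool(mask & (1 << 9))


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint mask from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    # 512 == 1 << 9: only the W bit is set, i.e. "Tainted: G        W"
    print(decode_taint(512))  # ['W: kernel issued a warning']
```

On a live system, decode_taint(read_taint()) shows whether a WARNING has already fired since boot, before deciding whether a report will be treated as tainted.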

Additional info:
reporter:       libreport-2.6.4
WARNING: CPU: 0 PID: 5075 at drivers/cpufreq/cpufreq.c:2173 cpufreq_update_policy+0x102/0x150
Modules linked in: vfat fat fuse ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw coretemp kvm_intel kvm iTCO_wdt arc4 iTCO_vendor_support mei_wdt iwldvm irqbypass mac80211 crct10dif_pclmul crc32_pclmul crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel snd_hda_codec_conexant snd_hda_codec_generic iwlwifi uvcvideo snd_hda_intel snd_hda_codec videobuf2_vmalloc
 snd_hda_core cfg80211 videobuf2_memops videobuf2_v4l2 videobuf2_core snd_hwdep snd_seq videodev joydev snd_seq_device media snd_pcm intel_ips i2c_i801 lpc_ich mei_me snd_timer mei shpchp acpi_cpufreq thinkpad_acpi wmi snd soundcore rfkill tpm_tis tpm i915 i2c_algo_bit drm_kms_helper serio_raw ums_realtek uas usb_storage drm e1000e ptp pps_core fjes video
CPU: 0 PID: 5075 Comm: kworker/0:2 Tainted: G        W       4.6.4-201.fc23.x86_64 #1
Hardware name: LENOVO 3680FZ2/3680FZ2, BIOS 6QET70WW (1.40 ) 10/11/2012
Workqueue: kacpi_notify acpi_os_execute_deferred
 0000000000000286 0000000051767350 ffff880094d37b98 ffffffff813d90de
 0000000000000000 0000000000000000 ffff880094d37bd8 ffffffff810a732b
 0000087d00000001 ffff880131846000 00000000fffffffb ffff880131846110
Call Trace:
 [<ffffffff813d90de>] dump_stack+0x63/0x85
 [<ffffffff810a732b>] __warn+0xcb/0xf0
 [<ffffffff810a745d>] warn_slowpath_null+0x1d/0x20
 [<ffffffff8166aaa2>] cpufreq_update_policy+0x102/0x150
 [<ffffffff8166aaf0>] ? cpufreq_update_policy+0x150/0x150
 [<ffffffff81496397>] acpi_processor_ppc_has_changed+0x74/0x7c
 [<ffffffff81492feb>] acpi_processor_notify+0x56/0xe1
 [<ffffffff81478297>] acpi_ev_notify_dispatch+0x44/0x5c
 [<ffffffff8146021e>] acpi_os_execute_deferred+0x14/0x20
 [<ffffffff810c0b3c>] process_one_work+0x15c/0x430
 [<ffffffff810c0e5e>] worker_thread+0x4e/0x480
 [<ffffffff817d5dbd>] ? __schedule+0x2ed/0x790
 [<ffffffff810c0e10>] ? process_one_work+0x430/0x430
 [<ffffffff810c0e10>] ? process_one_work+0x430/0x430
 [<ffffffff810c6c98>] kthread+0xd8/0xf0
 [<ffffffff810cf0fa>] ? finish_task_switch+0x7a/0x260
 [<ffffffff817da5c2>] ret_from_fork+0x22/0x40
 [<ffffffff810c6bc0>] ? kthread_worker_fn+0x170/0x170

Potential duplicate: bug 1357917

Comment 1 Alan Jenkins 2016-07-27 11:20:00 UTC
Created attachment 1184601 [details]
File: dmesg

Comment 2 Sergio Basto 2016-07-28 02:50:04 UTC
Same here; it happens on every resume from sleep. I also found what I think is a similar problem on Ask Fedora: https://ask.fedoraproject.org/en/question/29705/how-to-fix-temperature-threshold-cpu-throttled-errors/?answer=91650#post-id-91650

Comment 3 Sergio Basto 2016-07-28 02:50:56 UTC

Comment 4 Sergio Basto 2016-07-30 15:07:51 UTC
After downgrading to microcode_ctl-2.1-10.fc23.x86_64, running "dracut --kver 4.6.5-200.fc23.x86_64 --force" for all kernels and rebooting again, I was able to fix this kernel oops.

Comment 5 Alan Jenkins 2016-07-30 18:26:20 UTC
Suggested fix/workaround does not work for me.[1]

My working hypothesis was that the WARNING happened due to the recent cpufreq rewrite, and hence would not be related to microcode. So it's quite frustrating to have the issue I reported reassigned to microcode, without an explanation of why that was considered a candidate cause :-(.

Note that the cpufreq changes would be consistent with the WARNING appearing in kernel 4.6 but not in 4.5. See https://lwn.net/Articles/682391/

Note that, for me, the WARNING does _not_ happen on every suspend. I have to follow the sequence below to get it. _Then_ it triggers reliably, I believe. (I have... some experience with reproducing annoying ACPI bugs :). My hope is that something similar (not necessarily exactly the same) led you to think microcode was responsible. It would be great if you could think about your testing, and whether that's a possibility in your case.

- thinkpad running on AC power
- enter suspend *specifically* by closing the lid
- unplug from AC power
- re-open lid, triggering resume from suspend

[1] Downgraded to microcode_ctl-2:2.1-9.1.fc23.x86_64, ran "dracut --force" to regenerate current initrd, which mentions my current kernel version 4.6.4-201.fc23.x86_64.  Rebooted.  WARNING message still occurred.

Versions seem slightly different on Sergio's machine; proposed updates?

Comment 6 Sergio Basto 2016-07-30 18:47:40 UTC
To use the new microcode, the initrd for the kernel you boot needs to be (re)created after the new microcode_ctl package is installed. This is done automatically when a new kernel package is installed, or you can do it manually using dracut (see "man dracut"). For example, for the currently running kernel, run

Comment 7 Alan Jenkins 2016-07-30 20:28:18 UTC
But that's what I did?

"Downgraded to microcode_ctl-2:2.1-9.1.fc23.x86_64, ran "dracut --force" to regenerate current initrd, which mentions my current kernel version 4.6.4-201.fc23.x86_64.  Rebooted.  WARNING message still occurred."

> For example, for the current running kernel, run

It looks like you were trying to suggest a specific command, but it got eaten somewhere?

Comment 8 Alan Jenkins 2016-07-30 20:30:32 UTC
Also, my system boots reliably. Symptoms like the boot failure in #1353103 are more like what I would expect from CPU microcode issues.

Comment 9 Sergio Basto 2016-07-30 21:00:12 UTC
(In reply to Alan Jenkins from comment #7)
> But that's what I did?
> "Downgraded to microcode_ctl-2:2.1-9.1.fc23.x86_64, ran "dracut --force" to
> regenerate current initrd, which mentions my current kernel version
> 4.6.4-201.fc23.x86_64.  Rebooted.  WARNING message still occurred."

No: downgrade to microcode_ctl-2:2.1-9.1.fc23.x86_64, reboot, preferably into an old kernel that doesn't have these issues, and after booting with the old microcode_ctl, run "dracut --force".

> > For example, for the current running kernel, run
> It looks like you were trying to suggest a specific command, but it got
> eaten somewhere?

Comment 10 Alan Jenkins 2016-07-30 21:35:15 UTC
Sorry.  I don't believe that's needed.

Please, can you write and/or copy-paste your own issue against microcode_ctl.

If a third party decides we have the same issue, that's not a problem.  Issues can always be marked as duplicates.

cpufreq was changed in kernel version 4.6. WARNING messages from cpufreq that start appearing in kernel 4.6 are most likely caused by this.

CPU microcode issues are unusual, and expected to have much more impact (like complete failure during boot).

Comment 11 Sergio Basto 2016-07-30 21:45:35 UTC
Well, line 1001 [1] of https://paste.fedoraproject.org/396654/69672341/ doesn't happen to me anymore; you have several bugs opened. TBH, I'm now using kernel 4.6.5-200.fc23.x86_64, not 4.6.4...

[  836.906123] WARNING: CPU: 0 PID: 53 at drivers/cpufreq/cpufreq.c:2173 cpufreq_update_policy+0x102/0x150

Comment 12 Alan Jenkins 2016-07-30 22:00:46 UTC
> you have several bugs opened

I'm still struggling to understand you.  But I guess I'm in the process of upgrading to Fedora 24.  ABRT shows us a separate bug for Fedora 24, which is still assigned to the kernel package.

So actually I shouldn't mind if you switch this bug back again to the microcode_ctl package. (It will at least make the assignee consistent again; my bad.)

I can subscribe to the Fedora 24 bug.  The reason I wanted to report this to the kernel team, was that ABRT was failing to report it automatically.  It looks like ABRT has been reporting it correctly for Fedora 24, so I can be confident the message will get through to the kernel developer(s) who are rewriting cpufreq.

Comment 13 Sergio Basto 2016-07-30 22:32:46 UTC
(In reply to Alan Jenkins from comment #12)
> > you have several bugs opened

https://bugzilla.redhat.com/show_bug.cgi?id=1353103 and 1351943, 1352700, 1353061, 1353586, 1357317, 1357862 and 1361183, and this one, at least.
> I'm still struggling to understand you.  

Sorry for my weak English. What can I say? After downgrading microcode_ctl, I booted with kernel-4.5.7-202.fc23.x86_64 and updated the initramfs, and this problem was gone; that is what I can tell you. Whether kernel-4.6.5 makes a difference compared to 4.6.4, etc., I don't know; I just followed the instructions (https://www.happyassassin.net/2016/07/07/psa-failure-to-boot-after-kernel-update-on-skylake-systems/).

The bug is yours; I'm not going to change it.
I lost more than a day to very stupid problems just after regenerating the initramfs, and never really understood what happened, but now that things are running well, I will not do more tests.

This was one hell of a week: they updated the kernel, microcode_ctl and KDE Plasma 5.7.1 in F23, all buggy, without fully testing them in F24 or Rawhide. That is what makes me mad, and I'm also tired.

Sorry for any inconvenience.

Comment 14 Sergio Basto 2016-07-31 05:30:11 UTC
I spoke too early (even with microcode_ctl-2.1-10): on the second resume from sleep I got this again:
[12530.888028] WARNING: CPU: 0 PID: 15865 at drivers/cpufreq/cpufreq.c:2173 cpufreq_update_policy+0x102/0x150

:/ So it is the kernel; you are right.

Comment 15 Laurentiu Pancescu 2016-08-09 11:33:56 UTC
*** Bug 1364983 has been marked as a duplicate of this bug. ***

Comment 16 Sergio Basto 2016-08-09 22:49:27 UTC
Kernel 4.7.0-300.fc24 from the kernel package maintainers [1] fixes this issue for me.

[1] https://copr.fedorainfracloud.org/coprs/jforbes/kernel-stabilization/

Comment 17 Laurentiu Pancescu 2016-08-25 08:01:28 UTC
This used to happen on every single wakeup with kernels 4.6.3 to 4.6.5; since updating to kernel 4.6.6-200.fc23, I counted 26 wakeups in the logs, without any WARNING at all.  I upgraded yesterday to kernel 4.6.7-200.fc23: 4 wakeups without any issues.  I assume the problem was fixed?
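Counting wakeups against WARNINGs in a saved kernel log, as described above, could be scripted roughly like this. It is a sketch, not a tested tool: the resume marker string is an assumption (the exact text logged on resume differs between kernel versions), so check your own log and pass the right marker. The WARNING pattern matches the file and function from this report rather than the exact line number, since the line in cpufreq.c shifts between kernel releases.

```python
"""Count resume events and cpufreq_update_policy() WARNINGs in a kernel log.

Sketch only. RESUME_MARKER is an assumption: check what your kernel actually
logs on wakeup and adjust it (or pass resume_marker explicitly).
"""
import re
import sys

# Signature of the warning in this report; match file and function, not the
# exact line number (2173 in 4.6.4), which changes between kernel versions.
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at drivers/cpufreq/cpufreq\.c:\d+ "
    r"cpufreq_update_policy"
)

# Assumed resume marker (hypothetical default); adjust for your kernel.
RESUME_MARKER = "ACPI: Waking up from system sleep state S3"


def count_events(lines, resume_marker=RESUME_MARKER):
    """Return (resumes, warnings) counted over an iterable of log lines."""
    resumes = warnings = 0
    for line in lines:
        if resume_marker in line:
            resumes += 1
        if WARNING_RE.search(line):
            warnings += 1
    return resumes, warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: journalctl -k > kernel.log; python3 count_warnings.py kernel.log
    with open(sys.argv[1], errors="replace") as f:
        resumes, warnings = count_events(f)
    print(f"{resumes} resumes, {warnings} cpufreq_update_policy WARNINGs")
```

Since the warning can fire several times per resume, the WARNING count may exceed the resume count; what matters is a count of zero WARNINGs over many resumes.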

Comment 18 Alan Jenkins 2016-08-25 08:09:34 UTC
Neat, it looks like the fix for cpufreq_update_policy() got into 4.6.6-stable.


ABRT hasn't been bugging me either lately.  I think this bug could be closed.

Comment 19 Sergio Basto 2016-09-20 04:36:47 UTC
Yes, I think so; closing this bug.
FYI, I'm running kernels from the "official" repo "Rawhide kernels built without debugging turned on", and since kernel-4.8.0-0.rc6.git0.1.fc26.x86_64 the performance of my DELL Latitude E6410, especially thermals and fans, looks great again...