Bug 853179 - ondemand scaling governor cannot be activated on Atom D525. Computer overheating. (p4-clockmod vs. acpi-cpufreq)
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: i686 Linux
Priority: unspecified
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-08-30 12:38 EDT by Daniel K
Modified: 2013-07-31 22:16 EDT
Last Closed: 2013-07-31 22:16:51 EDT
Description Daniel K 2012-08-30 12:38:40 EDT
Description of problem:
Cannot activate ondemand scaling governor with Atom D525 chip.  Instead, netbook is always using performance scaling governor, even after attempts to set it manually to ondemand.  Netbook overheating seems to be due to this (have had to keep the lid off).  Research seems to indicate that this may be related to p4-clockmod being used for this CPU model instead of acpi-cpufreq.

Version-Release number of selected component (if applicable):
3.5.2-3.fc17.i686.PAE  (although it's been happening for quite a few releases and I only tried to figure it out now).

How reproducible:
Always.

Steps to Reproduce:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative userspace powersave ondemand performance 
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
performance
# echo ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
performance
# dmesg |tail -1
[ 1950.907273] ondemand governor failed, too long transition latency of HW, fallback to performance governor


  
Actual results:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
performance


Expected results:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
ondemand


Additional info:
From research on this issue, it seems that the Intel Atom D525 does not have SpeedStep technology and this leads to p4-clockmod being used. Because of the long transition latencies reported by p4-clockmod ("cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency" reports 10000001), ondemand doesn't allow itself to be enabled.  This is all well and good, except that it seems that acpi-cpufreq may be compatible with this chip and would probably allow for using ondemand.  (I have not been able to confirm this unfortunately because both p4-clockmod and acpi-cpufreq were built into the kernel instead of as modules, so it was impossible to blacklist p4-clockmod and modprobe acpi-cpufreq to test this.  I also attempted to build a kernel from source rpm with these two components built as modules, but I was not successful at getting it to boot.)
It seems that the following needs to be done:
1) Confirm that acpi-cpufreq would allow ondemand scaling governor on Atom D525s, and would not be worse than p4-clockmod for any other features/aspects.
2) Either patch p4-clockmod or acpi-cpufreq so that Atom D525s select acpi-cpufreq over p4-clockmod when both are available, OR build these scaling drivers as modules so that p4-clockmod can be manually disabled or blacklisted in some configurations.
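
The latency check described above can be sketched as a small script. It assumes that ondemand's upper bound on transition latency is 10 ms (10,000,000 ns), which would make the 10000001 reported here exactly one nanosecond too high; that limit is an assumption of this sketch, not something confirmed in this report. The function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: ondemand refuses drivers whose transition latency exceeds
# 10 ms (10,000,000 ns). This limit is not confirmed by the bug report.
ONDEMAND_MAX_LATENCY_NS=10000000

# Succeeds (exit 0) if the given latency in ns is within the assumed limit.
latency_ok() {
    [ "$1" -le "$ONDEMAND_MAX_LATENCY_NS" ]
}

# Print driver, latency and a verdict for one cpufreq sysfs directory.
check_cpufreq_dir() {
    dir=$1
    if [ ! -r "$dir/cpuinfo_transition_latency" ]; then
        echo "no cpufreq data in $dir"
        return 1
    fi
    driver=$(cat "$dir/scaling_driver" 2>/dev/null || echo unknown)
    latency=$(cat "$dir/cpuinfo_transition_latency")
    if latency_ok "$latency"; then
        echo "$driver: latency $latency ns, ondemand should be accepted"
    else
        echo "$driver: latency $latency ns, ondemand will likely be refused"
    fi
}

# Check every CPU that exposes a cpufreq directory.
for d in /sys/devices/system/cpu/cpu[0-9]*/cpufreq; do
    if [ -d "$d" ]; then
        check_cpufreq_dir "$d" || true
    fi
done
```

On the machine in this report, each CPU would be expected to show p4-clockmod with a latency of 10000001 ns and the "refused" verdict.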

# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 28
model name	: Intel(R) Atom(TM) CPU D525   @ 1.80GHz
stepping	: 10
microcode	: 0x107
cpu MHz		: 1800.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
bogomips	: 3591.10
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 28
model name	: Intel(R) Atom(TM) CPU D525   @ 1.80GHz
stepping	: 10
microcode	: 0x107
cpu MHz		: 1800.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 2
initial apicid	: 2
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
bogomips	: 3591.10
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 28
model name	: Intel(R) Atom(TM) CPU D525   @ 1.80GHz
stepping	: 10
microcode	: 0x107
cpu MHz		: 1800.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
bogomips	: 3591.10
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 28
model name	: Intel(R) Atom(TM) CPU D525   @ 1.80GHz
stepping	: 10
microcode	: 0x107
cpu MHz		: 1800.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 3
initial apicid	: 3
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
bogomips	: 3591.10
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:




Here are additional links of background information and possibly related bugs:
http://notes.benv.junerules.com/tag/p4-clockmod/
https://bbs.archlinux.org/viewtopic.php?id=101464
http://www.codon.org.uk/~mjg59/power/good_practices.html
https://bugs.gentoo.org/show_bug.cgi?id=287463
https://bugzilla.redhat.com/show_bug.cgi?id=474499
https://bugzilla.redhat.com/show_bug.cgi?id=697273
https://bugzilla.redhat.com/show_bug.cgi?id=832179
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/355232
Comment 1 Dave Jones 2012-11-01 15:04:25 EDT
The flags line of your CPU doesn't list 'est'. Without this, acpi-cpufreq won't do anything.

Having to rely on p4-clockmod just to not overheat signifies a serious problem.
It shouldn't be necessary at all, and when thermal events occur, the ACPI layer itself should be throttling.
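
A quick way to repeat the 'est' check on another machine, as a minimal sketch. The helper name has_cpu_flag is hypothetical. It matches whole words only, so 'est' is not confused with a flag such as 'dtes64', and the printed verdicts simply restate the comment above rather than any independent test.

```shell
#!/bin/sh
# Succeeds if FLAG appears as a whole word in the first "flags" line of
# CPUINFO (default: /proc/cpuinfo). has_cpu_flag is a hypothetical helper.
has_cpu_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    awk -v f="$flag" '
        /^flags/ {
            # Fields 1 and 2 are "flags" and ":", the flags start at field 3.
            for (i = 3; i <= NF; i++) if ($i == f) found = 1
            exit
        }
        END { exit !found }
    ' "$cpuinfo"
}

if has_cpu_flag est; then
    echo "est present: acpi-cpufreq may be able to drive this CPU"
else
    echo "est absent: acpi-cpufreq is unlikely to do anything on this CPU"
fi
```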
Comment 2 Daniel K 2012-11-01 19:36:37 EDT
Thanks for the response.  Is there anything you would like me to provide, investigate, or test?

Also, as an example, my overheating computer (which is a *nettop*, not *netbook* as I had described) is presently at 73.5 C with the lid off and pretty much idle.  This seems high to me, as my actual netbook regularly is around 43 C even when I'm working heavily on it.
Comment 3 Dave Jones 2012-11-01 23:31:38 EDT
The contents of /sys/class/thermal/thermal_zone0/ would probably be a good place to start.

Might be worth checking if there's a BIOS update available too.
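
For collecting that information in one go, something like the following may help. It is only a sketch: the function name dump_thermal is made up, and it assumes the usual sysfs layout in which each thermal zone has type and temp files (temp in millidegrees Celsius) and each cooling device has type, cur_state and max_state files.

```shell
#!/bin/sh
# Print each thermal zone's type and temperature, then each cooling
# device's type and state. dump_thermal is a hypothetical helper; the
# sysfs root can be overridden for testing.
dump_thermal() {
    root=${1:-/sys/class/thermal}
    for z in "$root"/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        # temp is assumed to be in millidegrees Celsius.
        milli=$(cat "$z/temp" 2>/dev/null) || continue
        echo "${z##*/} $(cat "$z/type"): $((milli / 1000)) C"
    done
    for c in "$root"/cooling_device*; do
        [ -r "$c/type" ] || continue
        echo "${c##*/} $(cat "$c/type"): state $(cat "$c/cur_state")/$(cat "$c/max_state")"
    done
}

dump_thermal
```

If no thermal_zone directories exist, only the cooling device lines are printed.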
Comment 4 Daniel K 2012-11-02 00:16:33 EDT
No such directory as thermal_zone0.  The following is all I have.

# find /sys/class |grep thermal
/sys/class/thermal
/sys/class/thermal/cooling_device0
/sys/class/thermal/cooling_device1
/sys/class/thermal/cooling_device2
/sys/class/thermal/cooling_device3

As for the BIOS update, the two updated BIOSes (A02 and A03) do not seem to indicate any fix related to this issue.  The website is here:
http://www.jetwaycomputer.com/ITX-JBC600C99-52W.html
I'm not even sure how I'd go about flashing the BIOS, as their download seems to be Windows-based and uses an executable named AFUD431.EXE.
Comment 5 Daniel K 2012-11-15 12:47:03 EST
The automatic shutdowns due to overheating have become increasingly frequent now that it's winter and my heat is on.  If there's anything I can do in terms of providing information to help speed this along, please let me know.  Another point I'd like to bring up is that I never had this problem in Fedora 11, so perhaps something notable has changed between then and now.  I did a fresh install without the in-between versions of Fedora, so unfortunately I can't tell you which version started having the problem.
Comment 6 Daniel K 2012-12-09 23:58:40 EST
May I request that the Severity on this bug be raised?  I have a hard drive which smartctl says has only 122 hours LifeTime on it and has just now reported Current_Pending_Sectors.  I am almost certain it is due to the heat caused by this bug... I've done what I can in my room to open windows and turn off radiators--it is now rather cool--but the computer's normal operating temperature remains about 77-83 deg C.  Could this bug please be escalated, as it now seems to have a fair chance of destroying actual data-containing hardware?

I hope that if this bug cannot be resolved in a forward approach, it can at least be considered a substantial regression (older Fedoras never caused this problem on the same hardware).  I'm sure the code which causes "fallback to performance governor" is the true culprit.  Here's another example from my /var/log/messages:

Dec  9 15:04:06 jetway kernel: [    1.084949] ondemand governor failed, too long transition latency of HW, fallback to performance governor

At this point I do not care about transition latencies *AT ALL*, I just want my computer to not be able to fry an egg (or my hard drive)...


Appreciated,

Dan
Comment 7 Fedora End Of Life 2013-07-03 19:31:05 EDT
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 8 Fedora End Of Life 2013-07-31 22:16:58 EDT
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
