Created attachment 678514 [details]
lspci -vv

Description of problem:
When I run kernel-3.7.2, the fan runs high non-stop after resume from suspend. This is an improvement over kernel-3.6.10, where there was a 50% chance of a kernel panic.

Version-Release number of selected component (if applicable):
kernel-3.7.2-201.fc18

How reproducible:
Always

Steps to Reproduce:
1. Boot with 3.7.2
2. Suspend
3. Resume

Actual results:
Fan is running high constantly

Expected results:
Fan should run only when necessary to regulate temperature.

Additional info:
This sounds similar to 873027 but that targets radeon chip sets.
Created attachment 678515 [details] sensors
Created attachment 678516 [details] /sys/class/thermal/*/*
http://lkml.indiana.edu/hypermail/linux/kernel/1212.0/01262.html

This mailing list discussion also sees the behaviour start with 3.7.0. lmsensors doesn't seem to know about my fans. I'm handy with code and the command line, so please point me to anything I can do to help resolve this and I can figure stuff out. :)
After investigating a bit, I found that I can write to /sys/class/thermal/cooling_device*/cur_state. I've tested setting it to 0 for the fans after resuming from suspend and that causes them to spin down. I've been running for about half an hour since, and I see that at least cooling_device3 (a fan) will spin back up when it gets warm enough. As long as all fans are willing to spin again, a temporary solution for me can be to just reset cur_state, I suppose, unless someone knows if that's a bad idea.
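Before writing to any cur_state file it can help to see what each cooling device is. The following is a minimal sketch, not a tested fix: it lists each device under /sys/class/thermal with its type and current/maximum state. The function name list_cooling_devices and the THERMAL_ROOT override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# List every thermal cooling device with its type and current/max state.
# THERMAL_ROOT is a hypothetical override for dry runs; it defaults to the
# real sysfs location mentioned in this report.
list_cooling_devices() {
    root="${THERMAL_ROOT:-/sys/class/thermal}"
    for dev in "$root"/cooling_device*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -r "$dev/cur_state" ] || continue
        kind=$(cat "$dev/type" 2>/dev/null || echo unknown)
        cur=$(cat "$dev/cur_state")
        max=$(cat "$dev/max_state" 2>/dev/null || echo "?")
        printf '%s\t%s\t%s/%s\n' "${dev##*/}" "$kind" "$cur" "$max"
    done
}

# Usage: list_cooling_devices
```

Devices that show a non-zero state right after resume while the machine is idle are the candidates for resetting.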
(In reply to comment #4)
> After investigating a bit, I found that I can write to
> /sys/class/thermal/cooling_device*/cur_state. I've tested setting it to 0
> for the fans after resuming from suspend and that causes them to spin down.

Same behaviour on my HP laptop:

$ dmesg | grep Hewlett-Packard
DMI: Hewlett-Packard HP Compaq nx9420 (RU483EA#AKR)/309F, BIOS 68YAF Ver. F.1D 07/11/2008

running F17 with the latest kernel, 3.7.3-101.fc17.x86_64.

The sensors are similar to yours, but my "100°C" is at temp6:

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +44.0°C (crit = +256.0°C)
temp2: +44.0°C (crit = +102.0°C)
temp3: +57.0°C (crit = +110.0°C)
temp4: +43.0°C (crit = +105.0°C)
temp5: +32.7°C (crit = +102.0°C)
temp6: +100.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +46.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +46.0°C (high = +100.0°C, crit = +100.0°C)

While echoing "0" to cooling_device[0-8]/cur_state, I get temp6 down to 70°C, then to 60°C, 50°C, 40°C and finally to 25°C, and the fan spins down accordingly.

> I've been running for about half an hour since, and I see that at
> least cooling_device3 (a fan) will spin back up when it gets warm
> enough.

The problem is that while running a CPU-intensive test (2 instances of "openssl speed"), the CPU temperature rises to

temp1: +68.0°C (crit = +256.0°C)
temp2: +72.0°C (crit = +102.0°C)
temp3: +60.0°C (crit = +110.0°C)
temp4: +53.0°C (crit = +105.0°C)
temp5: +33.1°C (crit = +102.0°C)
temp6: +40.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +72.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +73.0°C (high = +100.0°C, crit = +100.0°C)

but the fans are still off.

The funny thing is that after echoing "1" to cooling_device[01]/cur_state, temp6 grows to +100°C and the fans start to spin like mad, but the core[01] temperature stays at 68-72°C.
(In reply to comment #4)
> After investigating a bit, I found that I can write to
> /sys/class/thermal/cooling_device*/cur_state. I've tested setting it to 0
> for the fans after resuming from suspend and that causes them to spin down.
> I've been running for about half an hour since, and I see that at least
> cooling_device3 (a fan) will spin back up when it gets warm enough.

Similar behavior on my laptop:

Hewlett-Packard HP 550/3618, BIOS 68MVU Ver. F.05 01/19/2009
F18 with kernel 3.7.2-204.fc18.x86_64 #1 SMP Wed Jan 16 16:22:52 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

After resume the fan runs high while the system is idle.

Sensors after resume:

acpitz-virtual-0
Adapter: Virtual device
temp1: +39.0°C (crit = +105.0°C)
temp2: +23.1°C (crit = +108.0°C)
temp3: +100.0°C (crit = +110.0°C)
temp4: +45.0°C (crit = +256.0°C)
temp5: +36.0°C (crit = +108.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +37.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +37.0°C (high = +100.0°C, crit = +100.0°C)

cat /sys/class/thermal/cooling_device*/cur_state after resume:

/sys/class/thermal/cooling_device0/cur_state 1
/sys/class/thermal/cooling_device1/cur_state 1
/sys/class/thermal/cooling_device2/cur_state 1
/sys/class/thermal/cooling_device3/cur_state 0
/sys/class/thermal/cooling_device4/cur_state 1
/sys/class/thermal/cooling_device5/cur_state 0
/sys/class/thermal/cooling_device6/cur_state 0
/sys/class/thermal/cooling_device7/cur_state 0

After echo "0" > /sys/class/thermal/cooling_device0/cur_state
temp3 remains at 100°C, fan spins rapidly

After echo "0" > /sys/class/thermal/cooling_device1/cur_state
temp3 shows 65°C and fan spins slower

After echo "0" > /sys/class/thermal/cooling_device2/cur_state
temp3 shows 0°C when idle and fan spins down, but after cpu load temp3 shows corresponding temp and fan spins accordingly.
Created attachment 688886 [details]
spin fan down by zeroing cooling device state

Maybe my machine is dying, but today I had to extend the zeroing to /sys/class/thermal/cooling_device9/cur_state. Here is the script ftw.

After zeroing, temp6 and the fan work as expected: they go up and down according to the CPU temperature measured by temp1/2 and core*.

I am curious whether it's possible to hook this script into some after_suspend script so that it works automagically...
This also happens on a Hewlett-Packard HP Compaq 6710b (GR685EA#AKR)/30C0, BIOS 68DDU Ver. F.10 01/11/2008 running the 3.7.6-102.fc17.x86_64 kernel.

sensors output:

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +41.0°C (crit = +105.0°C)
temp2: +31.4°C (crit = +108.0°C)
temp3: +100.0°C (crit = +110.0°C)
temp4: +45.0°C (crit = +256.0°C)
temp5: +42.0°C (crit = +108.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +44.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +44.0°C (high = +100.0°C, crit = +100.0°C)
Hi all,

Same problem on "Hewlett-Packard HP Compaq nc6400 (EH522AV)/30AD, BIOS 68YCU Ver. F.0B 09/05/2007" on the x86 platform. As stated above, the issue can be fixed by writing 0 to cur_state of /sys/class/thermal/cooling_device0, 1 and 2 after each resume.

I have had this issue since the fresh install of F18, but never on my previous F16. The current kernel is 3.7.9-201.

I am far from being an expert: does this come from changes in the kernel or in the Fedora architecture around it?
(In reply to comment #9)
> Hi all,
> Same problem on "Hewlett-Packard HP Compaq nc6400 (EH522AV)/30AD, BIOS 68YCU
> Ver. F.0B 09/05/2007" on x86 platform.
> As stated above, the issue can be fixed with 0 in
> /sys/class/thermal/cooling_device0 and 1 and 2/cur_state after each resume.
>
> I have this issue since the new install of F18, but never on my previous
> F16.
> The current kernel is 3.7.9-201.
> I am far from being an expert, is that coming modifications in the kernel or
> fedora architecture around ?

I can confirm that this works, but worry that this is not the best solution to the problem.

System Information
Manufacturer: Hewlett-Packard
Product Name: HP Compaq nc6400 (RM100AW#ABA)
Version: F.08
Wake-up Type: Power Switch
Family: 103C_5336AN

Is there a way to execute this every time after suspend?
Here's my workaround and ugly hack that fixes the fan issue on an HP 2510p running Fedora 18:

Make a script: /usr/lib/systemd/system-sleep/cooldown.sh

#!/bin/sh
# Stop fan in HP2510p after resuming from sleep
/bin/echo "0" >/sys/class/thermal/cooling_device0/cur_state
/bin/echo "0" >/sys/class/thermal/cooling_device1/cur_state
/bin/echo "0" >/sys/class/thermal/cooling_device2/cur_state
/bin/echo "0" >/sys/class/thermal/cooling_device3/cur_state
/bin/echo "0" >/sys/class/thermal/cooling_device4/cur_state
/bin/echo "0" >/sys/class/thermal/cooling_device5/cur_state
/bin/echo "0" >/sys/class/thermal/cooling_device6/cur_state
exit 0

Make it executable: chmod 755 cooldown.sh

On other computer models it may be necessary to experiment first to find out which cooling_device numbers are needed in the script.

I first tried /etc/pm/sleep.d/ but later found out that the way sleep states are handled has changed in Fedora 18. There is something like a systemd daemon that executes scripts in /usr/lib/systemd/system-sleep/
(In reply to comment #11)
> Here's my workaround and ugly-hack that fixes the fan-issue in HP 2510p
> running Fedora 18:
>
> Make a script:
> /usr/lib/systemd/system-sleep/cooldown.sh
>
> #!/bin/sh
> # Stop fan in HP2510p after resuming from sleep
> /bin/echo "0" >/sys/class/thermal/cooling_device0/cur_state
> /bin/echo "0" >/sys/class/thermal/cooling_device1/cur_state
> /bin/echo "0" >/sys/class/thermal/cooling_device2/cur_state
> /bin/echo "0" >/sys/class/thermal/cooling_device3/cur_state
> /bin/echo "0" >/sys/class/thermal/cooling_device4/cur_state
> /bin/echo "0" >/sys/class/thermal/cooling_device5/cur_state
> /bin/echo "0" >/sys/class/thermal/cooling_device6/cur_state
> exit 0
>
> Make it executable: chmod 755 cooldown.sh
>
>
> In other computer models it may be necessary to experiment first which
> cooling_device-numbers are needed in the script.
>
> I tried first /etc/pm/sleep.d/ but later found out that in Fedora 18 the way
> sleep states are handled has been changed. There is something like systemd
> daemon that executes scripts in /usr/lib/systemd/system-sleep/

Thanks a lot! This workaround works perfectly on my HP550 with F18. It is sufficient to do:

echo "0" >/sys/class/thermal/cooling_device0/cur_state
echo "0" >/sys/class/thermal/cooling_device1/cur_state
echo "0" >/sys/class/thermal/cooling_device2/cur_state

I first tried putting a similar script into /lib64/pm-utils/sleep.d/, but that didn't work either.
This is still an issue with 3.8.1-201.fc18.x86_64 on a HP Compaq 6730b.
Still an issue on HP Probook 4515s (AMD RS880 chipset). The echo 0 > script helps.
Sorry. I mean that this is still an issue with latest kernel (3.8.2)
Same here. Hewlett Packard as well, model 6720s.

Right after the suspend I had

acpitz-virtual-0
Adapter: Virtual device
temp1: +37.0°C (crit = +105.0°C)
temp2: +24.8°C (crit = +108.0°C)
temp3: +100.0°C (crit = +110.0°C)
temp4: +45.0°C (crit = +256.0°C)
temp5: +42.0°C (crit = +108.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +45.0°C (high = +100.0°C, crit = +100.0°C)

The echo "0" script made temp3 go to zero:

temp3: +0.0°C (crit = +110.0°C)

Makes me uncomfortable, though, as I have to check the temperatures by hand.
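For checking temperatures without running sensors each time, the kernel's thermal zones can be read directly from sysfs. This is a minimal sketch, assuming that the acpitz readings correspond to /sys/class/thermal/thermal_zone*/temp (reported in millidegrees Celsius) and ignoring sub-zero values. print_zone_temps and THERMAL_ROOT are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Print each thermal zone with its temperature in degrees Celsius.
# THERMAL_ROOT is a hypothetical override for dry runs.
print_zone_temps() {
    root="${THERMAL_ROOT:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        # Integer arithmetic only: whole degrees plus one decimal digit.
        printf '%s %s.%s°C\n' "${zone##*/}" \
            "$((milli / 1000))" "$(( (milli % 1000) / 100 ))"
    done
}

# Usage, refreshing every 5 seconds:
#   while :; do print_zone_temps; sleep 5; done
```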
This issue has been reported on LKML in https://lkml.org/lkml/2012/12/4/428

quote: "Note that, currently, the cooling device is set to the deepest cooling state required." - maybe that's the problem?

On my HP Compaq 2510p, temp6 seems to report what I've chosen to believe is the percentage of max fan speed (instead of RPM), as I was only able to reach a max value of 100. After echoing 1 to only one cooling device at a time (and setting the previous one back to 0), I came up with these values:

coolingdevice   path          temp6_value
Fan 0:          \_TZ_.C3B1    100
Fan 1:          \_TZ_.C3B2    70
Fan 2:          \_TZ_.C3C8    100
Fan 3:          \_TZ_.C3C9    90
Fan 4:          \_TZ_.C3CA    70
Fan 5:          \_TZ_.C3CB    50
Fan 6:          \_TZ_.C3CC    30

The 'path' value is what's in the /sys/devices/virtual/thermal/cooling_deviceX/device/path file. For temp6 it's \_TZ_.TZ5_

On the properly working 3.6.11 kernel, cooling devices 0 and 1 are never used, even under 100% CPU load with the laptop stuffed under a pillow; only 2-6 are used.
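The device-to-path mapping above can be collected in one pass rather than file by file. This is a minimal sketch that reads the device/path file named in this comment for every cooling device. show_cooling_paths and the COOLING_ROOT override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Print each cooling device together with its ACPI path.
# COOLING_ROOT is a hypothetical override for dry runs; the default is the
# sysfs directory named in this report.
show_cooling_paths() {
    root="${COOLING_ROOT:-/sys/devices/virtual/thermal}"
    for dev in "$root"/cooling_device*; do
        # Not every cooling device necessarily has an ACPI path file.
        [ -r "$dev/device/path" ] || continue
        printf '%s\t%s\n' "${dev##*/}" "$(cat "$dev/device/path")"
    done
}

# Usage: show_cooling_paths
```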
Automagic workaround for F17 (may work on other Red Hat based distros):

Make the worker script /bin/fan.sh

#!/bin/bash
# fan spindown for hp nx9420
# @see https://bugzilla.redhat.com/show_bug.cgi?id=895276#c18
cd /sys/class/thermal/
echo 0 > cooling_device0/cur_state
echo 0 > cooling_device1/cur_state
echo 0 > cooling_device2/cur_state
echo 0 > cooling_device3/cur_state
echo 0 > cooling_device4/cur_state
echo 0 > cooling_device5/cur_state
echo 0 > cooling_device6/cur_state
echo 0 > cooling_device7/cur_state
echo 0 > cooling_device8/cur_state
# optional
# echo 0 > cooling_device9/cur_state

Make the controller script /etc/pm/sleep.d/99_fan

#!/bin/sh
# fan spindown controller script
# @see https://bugzilla.redhat.com/show_bug.cgi?id=895276#c18
case "$1" in
    hibernate|suspend)
        ;;
    thaw|resume)
        /bin/fan.sh
        ;;
    *)
        exit $NA
        ;;
esac

Make both scripts runnable:

chmod +x /bin/fan.sh
chmod +x /etc/pm/sleep.d/99_fan

When the computer wakes from sleep, power management runs all /etc/pm/sleep.d/ scripts and the fan will spin down. The script is run as root, so we don't need any sudo magic here. The spindown takes a few seconds, so one can hopefully hear when this bug is fixed and the fan no longer goes crazy right after resume.
Andrej, this does not work in Fedora 18 on an HP 6730b with kernel 3.8.4-202 PAE :(
@prohol: "not work" means either
a) the script is not run upon resume from sleep: I've got no F18 machine here, maybe there is no /etc/pm/sleep.d/ at all, or
b) fan.sh is not silencing the fans: check how many /sys/class/thermal/cooling_device* entries you have and fix fan.sh to match.
@Andrej I think that fan.sh is not being run after resume. When I run fan.sh manually it works, so /etc/pm/sleep.d/99_fan is what does not work after resume. Both scripts are runnable. How can I debug resume to check why 99_fan does not work? Thanks.
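One way to check whether a hook directory is consulted at all on resume is a hook that only logs its own invocation. This is a minimal sketch, not a Fedora-provided tool: the log location /tmp/sleep-hook-debug.log, the LOG_FILE override and the function name log_hook_call are all hypothetical. A copy could be placed, executable, in both /etc/pm/sleep.d/ and /usr/lib/systemd/system-sleep/ (the two directories mentioned in this report), followed by one suspend/resume cycle.

```shell
#!/bin/sh
# Debug aid: record each invocation of this hook and the arguments it got.
# LOG_FILE is a hypothetical location; any path writable by root will do.
LOG_FILE="${LOG_FILE:-/tmp/sleep-hook-debug.log}"

log_hook_call() {
    # One line per call: timestamp, script name, arguments as received.
    printf '%s %s args: %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "${0##*/}" "$*" >> "$LOG_FILE"
}

log_hook_call "$@"
```

If the log gains lines from only one of the two copies, that shows which mechanism is handling suspend on the machine, and the arguments show which case branch a real hook would need to match.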
Same problem on HP Mini 5103 kernel 3.8.3-103.fc17.i686. Workaround works, but what is the
@prohol: according to this http://lists.fedoraproject.org/pipermail/test/2013-January/113578.html it seems that there is something broken in f18's power manager.
Andrej: This issue is also present in later updates of F17, so I think that is a different problem. It could perhaps be something with the scripts not being run, but it's probably not the switch to logind.
@simon: I've got a fully updated F17 here (OK, the machine has not been rebooted for a while, so I am running the 3.7.9-104.fc17.x86_64 kernel), and the pm script is run automagically every time the machine wakes from sleep. I am using it because my HP nx9420 machine is affected by this bug, 895276.
Also affected: HP Compaq 6910p
Linux gnome3 3.8.4-102.fc17.x86_64

The /etc/pm/sleep.d/99_fan workaround seems ok. (I modified it like this, though:

...
    thaw|resume)
        for x in /sys/class/thermal/cooling_device*/cur_state; do
            echo 0 > "$x"
        done
...)
The same error on a Compaq NC 4200, Fedora 8. The following script fixed the bug for me:

/lib/systemd/system-sleep/fanfix.sh

#!/bin/sh
case $1 in
    post)
        for x in /sys/class/thermal/cooling_device*/cur_state; do
            echo 0 > "$x"
        done
        ;;
esac
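Looping over every cooling_device* also resets any devices that are not fans. A narrower variant is sketched below under a stated assumption: that ACPI fans report the string "Fan" in their sysfs type file, which is worth confirming on the affected machine first. reset_fans and THERMAL_ROOT are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch of a systemd system-sleep hook that resets only fan devices.
# Assumption: ACPI fans expose the type string "Fan" in sysfs.
# THERMAL_ROOT is a hypothetical override for dry runs.
reset_fans() {
    root="${THERMAL_ROOT:-/sys/class/thermal}"
    for dev in "$root"/cooling_device*; do
        # Skip unmatched globs and devices we cannot write to.
        [ -w "$dev/cur_state" ] || continue
        # Leave non-fan cooling devices untouched.
        [ "$(cat "$dev/type" 2>/dev/null)" = "Fan" ] || continue
        echo 0 > "$dev/cur_state"
    done
}

# systemd-sleep passes "pre" or "post" as $1; act only after resume.
case "${1:-}" in
    post) reset_fans ;;
esac
```

Running it by hand as root with the argument post right after a resume shows whether the fan-only subset is enough on a given model.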
Sorry, I meant Fedora 18. The problem also occurs with Bodhi Linux, kernel 3.8.0-12-generic (Ubuntu), so it seems to be related to the kernel. Thank you for your hints; they helped me a lot.
Vanilla 3.9.1 kernel is still affected. I hoped commit 0252cb3cc34d02ffb9ff835488a805030d3ef435 would finally fix it (as it fixes more or less similar problem), but it didn't. And it surely is kernel related, I'm using Slackware.
Neither Micha's nor Mikhel's solution works in F18 on an HP 6730b with kernel 3.8.8-202.fc18.i686.PAE
@prohol: Have you checked that the script is executable? What happens if you start it manually after resume?
I see this problem on the Samsung 550 Chromebook (on the Pixel we either have the EC auto-set fan speed, or have user-space communicate directly with the EC to control fan speeds).

It looks like this was introduced somewhere in this merge:

Merge: 125c4c7 c072fed
Author: Len Brown <len.brown>
AuthorDate: Tue Oct 9 01:35:52 2012 -0400
Commit: Len Brown <len.brown>
CommitDate: Tue Oct 9 01:35:52 2012 -0400

    Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux into thermal

    Conflicts:
    drivers/staging/omap-thermal/omap-thermal-common.

    OMAP supplied dummy TC1 and TC2,
    at the same time that the thermal tree removed them
    from thermal_zone_device_register()

    drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c

    propogate the upstream MAX_IDR_LEVEL re-name
    to prevent a build failure

    Previously-fixed-by: Stephen Rothwell <sfr.org.au>
    Signed-off-by: Len Brown <len.brown>

Unfortunately, the patches within that merge branch don't complete a suspend-resume cycle for me, so I can't bisect down to a specific patch, but it's probably one of the changes to the generic thermal framework. I will try to rebase those patches later and see which one is the culprit.
If this can be reproduced with an upstream kernel, then it would be ideal to open a bug at bugzilla.kernel.org against ACPI/power-fan and reference this report from there for context. If the issue is easily reproduced, then the chances of a prompt solution are high.
Please check if this commit fixes the problem for you.

commit 94a409319561ec1847fd9bf996a2d5843ad00932
Author: Zhang Rui <rui.zhang>
Date: Fri Apr 26 09:19:53 2013 +0000

    ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points

    Commit 4ae46be "Thermal: Introduce thermal_zone_trip_update()"
    introduced a regression causing the fan to be always on even when
    the system is idle.

    My original idea in that commit is that:
    - when the current temperature is above the trip point,
      keep the fan on, even if the temperature is dropping.
    - when the current temperature is below the trip point,
      turn on the fan when the temperature is raising,
      turn off the fan when the temperature is dropping.

    But this is what the code actually does:
    - when the current temperature is above the trip point,
      the fan keeps on.
    - when the current temperature is below the trip point,
      the fan is always on because thermal_get_trend()
      in driver/acpi/thermal.c returns THERMAL_TREND_RAISING.
      Thus the fan keeps running even if the system is idle.

    Fix this in drivers/acpi/thermal.c.

    [rjw: Changelog]
    References: https://bugzilla.kernel.org/show_bug.cgi?id=56591
    References: https://bugzilla.kernel.org/show_bug.cgi?id=56601
    References: https://bugzilla.kernel.org/show_bug.cgi?id=50041#c45
    Signed-off-by: Zhang Rui <rui.zhang>
    Tested-by: Matthias <morpheusxyz123>
    Tested-by: Ville Syrjälä <syrjala>
    Cc: 3.7+ <stable.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki>
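The difference between the intended and the actual behaviour described in that changelog can be modelled in a few lines. This is an illustrative model only, not kernel code: the function names are hypothetical, temperatures are plain integers, and treating a temperature equal to the trip point as "above" is an assumption.

```shell
#!/bin/sh
# Illustrative model of the fan decision described in the changelog.
# Arguments: current temperature, trip point, trend (raising or dropping).
intended_fan_state() {
    temp=$1
    trip=$2
    trend=$3
    if [ "$temp" -ge "$trip" ]; then
        echo on            # above the trip point: keep the fan on
    elif [ "$trend" = raising ]; then
        echo on            # below the trip point, but heating up
    else
        echo off           # below the trip point and cooling down
    fi
}

# The regression as described: the trend was always reported as raising,
# so the "off" branch could never be reached.
regressed_fan_state() {
    intended_fan_state "$1" "$2" raising
}

# Example: an idle system at 40 with a trip point of 50 that is cooling down
#   intended_fan_state 40 50 dropping
#   regressed_fan_state 40 50 dropping
```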
(In reply to comment #34)
> please check if this commit fixes the problem for you.
>
> commit 94a409319561ec1847fd9bf996a2d5843ad00932
> Author: Zhang Rui <rui.zhang>
> Date: Fri Apr 26 09:19:53 2013 +0000
>
> ACPI / thermal: do not always return THERMAL_TREND_RAISING for active
> trip points
>
> Commit 4ae46be "Thermal: Introduce thermal_zone_trip_update()"
> introduced a regression causing the fan to be always on even when
> the system is idle.
>
> My original idea in that commit is that:
> - when the current temperature is above the trip point,
> keep the fan on, even if the temperature is dropping.
> - when the current temperature is below the trip point,
> turn on the fan when the temperature is raising,
> turn off the fan when the temperature is dropping.
>
> But this is what the code actually does:
> - when the current temperature is above the trip point,
> the fan keeps on.
> - when the current temperature is below the trip point,
> the fan is always on because thermal_get_trend()
> in driver/acpi/thermal.c returns THERMAL_TREND_RAISING.
> Thus the fan keeps running even if the system is idle.
>
> Fix this in drivers/acpi/thermal.c.
>
> [rjw: Changelog]
> References: https://bugzilla.kernel.org/show_bug.cgi?id=56591
> References: https://bugzilla.kernel.org/show_bug.cgi?id=56601
> References: https://bugzilla.kernel.org/show_bug.cgi?id=50041#c45
> Signed-off-by: Zhang Rui <rui.zhang>
> Tested-by: Matthias <morpheusxyz123>
> Tested-by: Ville Syrjälä <syrjala>
> Cc: 3.7+ <stable.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki>

I tried that patch, which I believe is in 3.8.13, and it doesn't seem to help. I'll also open a kernel.org bugzilla entry.
Opened another bug on kernel.org bug tracker here: https://bugzilla.kernel.org/show_bug.cgi?id=58311
3.9.2-200.fc18.x86_64 (and maybe some earlier ones too) fixes this for me.
Is this fixed with 3.9.y kernels for anyone else?
A quick test suggests resuming from suspend is fine here. =D
OK, thanks for confirming. I'm going to close this out for now. If people are still seeing this issue with identical hardware, please reopen. If the hardware is different, please open a new bug.