Bug 197174 - Dual processor G5 fan control not clever enough
Summary: Dual processor G5 fan control not clever enough
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: ppc64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-06-29 02:06 UTC by Dennis Brylow
Modified: 2009-06-24 13:14 UTC

Fixed In Version: 2.6.18-1.2200.fc5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-10-21 02:08:33 UTC
Type: ---
Embargoed:



Description Dennis Brylow 2006-06-29 02:06:48 UTC
Description of problem:
Fresh install of FC5 on Dual processor 2.3GHz PowerMac G5.  Fans idle normally,
ramp up accordingly when machine is under light load.  As soon as any real load
comes along, fans ramp up to maximum, kernel emits a message about the
CPU being "way over" temperature, and shuts down automatically.

Version-Release number of selected component (if applicable):
kernel 2.6.15-1.2054_FC5
and
kernel 2.6.17-1.2139_FC5 (latest update)

How reproducible:
Always.

Steps to Reproduce:
1. Install FC5 on Dual Processor 2.3GHz G5
2. Run anything sizeable, like "yum update"
3. Count to 20.
  
Actual results:
Fans rev up to maximum, kernel gasps error message about CPUs being over temp,
machine shuts down.

Expected results:
Fans rev to maximum sooner, CPUs are refreshed and continue their job at a
pleasant temperature.

Additional info:

Comment 1 Dennis Brylow 2006-07-03 23:01:52 UTC
Since posting this original (and granted, somewhat sarcastic) bug report, I've
learned a great deal more about the thermal control system on the new G5s.  Had
I appreciated how experimental and new the Linux kernel support was for this
particular machine's cooling system, I probably would have opted for a different
machine.  Nevertheless, now I am stuck with it.

I don't think there is anything wrong with my hardware, as I have now been able
to stress test the system under OS X.  The fans rev up appropriately, but even
with both CPUs under a heavy load over a prolonged period of time, the machine
does not unexpectedly overheat and shut down.

I've also tried the latest edition of Yellowdog Linux, kernel version
2.6.15-rc5.ydl.1g5-smp, and I have similar unfortunate behavior.

I get roughly four errors in /var/log/messages a second apart:

kernel: Warning ! CPU 1 temperature way above maximum (84) !

Followed immediately by

kernel: Temperature control detected a critical condition
kernel: Attempting to shut down...

while I can hear that the fans are still physically ramping up to speed, and
have not yet reached full bore.


Having read a lot over the past few days, I realize that some other folks have
worked very hard to get their G5s not to run their fans full blast when not
needed.  But the PID controller seems to be a little too slow on the uptake for
my system.  I need to know how to turn off the clever fan control algorithm, and
just run my fans full blast.  And I cannot recompile the kernel to do it,
because my machine overheats and shuts down within the first minute of compilation.
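
For readers unfamiliar with the term, here is a minimal sketch of a generic PID loop producing a fan command. It is not the kernel's fan control code; the class name, gains, and units are made up for illustration. It only shows why a controller with modest gains raises its output step by step while the temperature error persists, rather than jumping straight to full speed.

```python
from dataclasses import dataclass


@dataclass
class FanPID:
    """Toy PID controller; gains are illustrative, not taken from any kernel driver."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, temperature: float, target: float, dt: float = 1.0) -> float:
        """Return a fan command (arbitrary units) for one control tick."""
        error = temperature - target            # positive when too hot
        self.integral += error * dt             # accumulated error over time
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def clamp(value: float, low: float, high: float) -> float:
    """Limit a fan command to the range the hardware accepts."""
    return max(low, min(high, value))
```

With a steady 2 degree error, each call to step() returns a slightly larger command than the last, so how fast the fans reach full speed depends entirely on the chosen gains.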

I've tried adjusting process priorities, using other tactics to get the fans
revved up (but without overheating the processor), and interleaving I/O-bound
processes to give the CPUs some slack -- to no avail.  I'm not making much
headway here.

Does anyone know anything more about this kernel subsystem?

Comment 2 Dave Jones 2006-10-16 20:19:15 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 3 Dennis Brylow 2006-10-21 02:08:33 UTC
Kernel update 2.6.18-1.2200.fc5 has indeed fixed this most vexing problem.

Comment 4 crispy.beef 2009-06-18 15:51:39 UTC
Even though this is quite an old bug report, I would like to add my comments about this same issue on FC10. I have had the exact same issue: under load the machine will without fail spin up the fans and then shut down due to overheating. Monitoring the temperature in /sys/devices/temperature/cpu0_temperature (and cpu1) shows that the CPUs go well above the 88 degree safe limit, sometimes reaching well into the 90s before what I assume is the hardware shutdown kicks in.
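
The monitoring described above could be scripted roughly as follows. This is only a sketch: the sysfs path and the 88 degree limit come from this comment, while the assumption that the file holds a single decimal number, and all function names, are mine.

```python
from pathlib import Path

# Path quoted in this comment; cpu1 is assumed to follow the same naming.
CPU0_TEMP = Path("/sys/devices/temperature/cpu0_temperature")
LIMIT_C = 88.0  # safe limit mentioned in this comment


def parse_temperature(text: str) -> float:
    """Parse the sysfs file contents, assuming a single decimal number."""
    return float(text.strip())


def read_temperature(path: Path = CPU0_TEMP) -> float:
    """Read the current temperature from the given sysfs file."""
    return parse_temperature(path.read_text())


def is_over_limit(temperature: float, limit: float = LIMIT_C) -> bool:
    """True when the reading is strictly above the limit (an assumption)."""
    return temperature > limit
```

Calling read_temperature() in a loop with a short sleep would give a simple log of how fast the CPUs heat up under load.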

The kernel version is:

# uname -r
2.6.27.24-170.2.68.fc10.ppc64

Here is the output from /var/log/messages while putting the machine under heavy load (ffmpeg):

Jun 16 16:31:50 g5 kernel: Warning ! Temperature way above maximum (96) !
Jun 16 16:31:51 g5 kernel: Temperature 88 above max 88. overtemp 72
Jun 16 16:31:52 g5 kernel: Temperature back down to 85
Jun 16 16:34:41 g5 kernel: Temperature 88 above max 88. overtemp 1
Jun 16 16:34:42 g5 kernel: Temperature 88 above max 88. overtemp 2
Jun 16 16:34:43 g5 kernel: Temperature back down to 86
Jun 16 16:38:02 g5 kernel: Temperature 88 above max 88. overtemp 1
Jun 16 16:38:03 g5 kernel: Temperature back down to 86
Jun 16 16:41:35 g5 kernel: Temperature 88 above max 88. overtemp 1
Jun 16 16:41:36 g5 kernel: Temperature back down to 87
Jun 16 16:44:05 g5 kernel: Temperature 94 above max 88. overtemp 1
Jun 16 16:44:06 g5 kernel: Temperature back down to 87
Jun 16 16:44:17 g5 kernel: Temperature 88 above max 88. overtemp 1
Jun 16 16:44:18 g5 kernel: Temperature 94 above max 88. overtemp 2
Jun 16 16:44:19 g5 kernel: Temperature 95 above max 88. overtemp 3
Jun 16 16:44:20 g5 kernel: Warning ! Temperature way above maximum (96) !
Jun 16 16:44:21 g5 kernel: Warning ! Temperature way above maximum (97) !
Jun 16 16:44:22 g5 kernel: Warning ! Temperature way above maximum (97) !
Jun 16 16:44:23 g5 kernel: Warning ! Temperature way above maximum (97) !
Jun 16 16:44:23 g5 kernel: Temperature control detected a critical condition
Jun 16 16:44:23 g5 kernel: Attempting to shut down...
Jun 16 16:44:23 g5 kernel: Can't call /sbin/critical_overtemp, power off now!
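
To summarise a log like the one above, a small script can pull out the temperature from each line. This is a sketch that assumes the three message formats shown here are the only ones of interest; the function names and event labels are hypothetical.

```python
import re

# Patterns based only on the message formats quoted in this bug report.
CRITICAL = re.compile(r"[Tt]emperature way above maximum \((\d+)\)")
OVER = re.compile(r"Temperature (\d+) above max (\d+)\. overtemp (\d+)")
RECOVERED = re.compile(r"Temperature back down to (\d+)")


def classify(line):
    """Return (event, temperature) for a recognised line, else None."""
    m = CRITICAL.search(line)
    if m:
        return ("critical", int(m.group(1)))
    m = OVER.search(line)
    if m:
        return ("over", int(m.group(1)))
    m = RECOVERED.search(line)
    if m:
        return ("recovered", int(m.group(1)))
    return None


def peak_temperature(lines):
    """Highest temperature mentioned in any recognised line, or None."""
    temps = [event[1] for event in map(classify, lines) if event is not None]
    return max(temps) if temps else None
```

Running peak_temperature() over the relevant lines of /var/log/messages gives a quick figure to compare between kernel versions.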

It would seem this bug has reared its head again?

Comment 5 Dennis Brylow 2009-06-18 16:45:05 UTC
> It would seem this bug has reared its head again?

Yes, I'm afraid that I have struggled (silently) with regressions of this bug for the past three years, through Fedoras 6, 7, 8, and most recently 10.  Some kernel versions seem to agree with my Dual-proc G5 fans, others do not.  With F10, I realized at some point that it was now overheating and shutting down at least once a week.  I couldn't find any F10 kernels that would address the problem, so I gave up at long last and put OS X back on it.  Now the machine is happy as a clam, and hasn't overheated in weeks.

It is a perplexing bug.  Obviously, not everyone running Fedora on ppc64 is seeing this, or it would be a much bigger deal.  I dutifully tracked bug reports and revisions to the fan control code in the kernel, and there didn't seem to be any obvious explanation for why some kernel versions worked, and others were disastrous for me.  The fan control code for the G5 series has not been an area of active kernel development for some time, so it is fairly mystifying to me why results seemed to be so inconsistent between various versions.

Comment 6 crispy.beef 2009-06-24 13:14:29 UTC
I've since switched from Fedora to Debian testing, as I'm more familiar with the distro and with compiling custom kernels. After removing the windfarm modules - I have a PowerMac7,3, which apparently uses therm_pm72 - and also putting in cpufreqd to control CPU frequency, I've managed to at least stop the overheating. cpufreqd has a policy that keeps the CPUs (dual 2.5GHz PPC970FX) at 2.0GHz. With this I've compiled kernels and done the video processing. However, this of course does not cure the underlying problem, and I'd much prefer having the maximum performance from the processors.

