With Fedora Core 4 Test 2 (also with FC4T1, but I forgot about that problem after I had customised /proc/acpi/thermal_zone/THRM/trip_points as a work-around) I experience the following kernel log messages and a quick emergency shutdown:

Critical temperature reached (65 C), shutting down.
Critical temperature reached (64 C), shutting down.

After a fresh FC4T2 installation I didn't customise /proc/.../trip_points and hence ran into this multiple times today while transferring pictures from a digital camera. The kernel killed the machine already after a few minutes of uptime in three consecutive attempts until I realised the reason for the shutdown.

With Fedora Core 3 and older on the same hardware, I've never (!) seen this before, and I have not customised a different critical temperature there.

In /proc/acpi/thermal_zone/THRM/ with Fedora Core 3 I see:

cooling_mode  polling_frequency  state  temperature  trip_points

$ cat cooling_mode
<setting not supported>
cooling mode: passive
$ cat polling_frequency
<polling disabled>
$ cat state
state: ok
$ cat temperature
temperature: 63 C
$ cat trip_points
critical (S5): 65 C
passive: 55 C: tc1=2 tc2=4 tsp=50 devices=0xeffee600

With Fedora Core 4 Test 2, the only differences are:

$ cat state
state: passive
$ cat trip_points
critical (S5): 65 C
passive: 55 C: tc1=2 tc2=4 tsp=50 devices=0xeffeeb1c

As a first observation, the critical temperature of 65 degrees Celsius doesn't match the "ACPI Shut Down Temp." setting in the BIOS. I have "90 C" there since last summer.

As a second observation, 65 C looks like a bad default for an AMD Athlon/Duron chip, which are infamous for their high operating temperature. Although this is a low-end desktop machine not used for any high load, 60-63 C is reached easily according to /proc/.../temperature, and the BIOS reports an even higher temperature (no sensors configured and running).
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 7
model name      : AMD Duron(tm)
stepping        : 1
cpu MHz         : 1303.148
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr sse pni syscall mp mmxext 3dnowext 3dnow
bogomips        : 2580.48

The BIOS firmware is the latest release and is a few years old by now. :)
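To make the headroom in the report above concrete, here is a minimal sketch that parses the quoted trip_points and temperature text and computes the margin to the critical trip point. The function names and the regular expressions are illustrative assumptions, not part of any kernel tool; the sample strings are the FC3 readings quoted in this report.

```python
import re

def parse_trip_points(text):
    """Return {name: degrees_C} for each 'name ...: NN C' line (hypothetical helper)."""
    points = {}
    for line in text.splitlines():
        # e.g. "critical (S5): 65 C" or "passive: 55 C: tc1=2 ..."
        m = re.match(r"\s*(\w+)[^:]*:\s*(\d+)\s*C", line)
        if m:
            points[m.group(1)] = int(m.group(2))
    return points

def parse_temperature(text):
    """Extract the integer from a 'temperature: NN C' line (hypothetical helper)."""
    m = re.search(r"temperature:\s*(\d+)\s*C", text)
    if m is None:
        raise ValueError("no temperature line found")
    return int(m.group(1))

# Sample text copied from the report, not read from a live system.
TRIP_POINTS = (
    "critical (S5): 65 C\n"
    "passive: 55 C: tc1=2 tc2=4 tsp=50 devices=0xeffee600\n"
)
TEMPERATURE = "temperature: 63 C\n"

points = parse_trip_points(TRIP_POINTS)
current = parse_temperature(TEMPERATURE)
margin = points["critical"] - current          # degrees left before S5 shutdown
above_passive = current >= points["passive"]   # passive cooling should be active

print(f"critical margin: {margin} C, above passive trip point: {above_passive}")
```

With these readings the machine idles only 2 C below the critical trip point and is already above the passive one, which is consistent with the quick shutdowns described.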
Created attachment 113437 [details] DMI data for GA-7ZX BIOS release fg
Any luck with the latest errata kernel?
Mass update to all FC4 bugs: An update has been released (2.6.14-1.1637_FC4) which rebases to a new upstream kernel (2.6.13.2). As there were ~3500 changes upstream between this and the previous kernel, it's possible your bug has been fixed already. Please retest with this update, and update this bug if necessary. Thanks.
Unchanged behaviour. Kernel still defaults to shutting down the machine at 65 C unless I customise the trip_points. This max. value of 65 C doesn't match the temperature threshold configured in the BIOS.
Could you please attach the output from acpidump, available in pmtools here: http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
Created attachment 123385 [details] acpidump output (pmtools-20050926)
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

Thank you.
Closing per previous comment.
# uname -a
Linux faldor.intranet 2.6.16-1.2107_FC4 #1 Tue May 2 19:15:13 EDT 2006 i686 athlon i386 GNU/Linux
# cat /proc/acpi/thermal_zone/THRM/trip_points
critical (S5): 65 C
passive: 55 C: tc1=2 tc2=4 tsp=50 devices=0xeffec760

By contrast, the BIOS setting is:

ACPI Shutdown Temperature: 90 C
[This comment added as part of a mass-update to all open FC4 kernel bugs] FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem.

Thank you.
This bug has been mass-closed along with all other bugs that have been in NEEDINFO state for several months. Due to the large volume of inactive bugs in bugzilla, this is the only method we have of cleaning out stale bug reports where the reporter has disappeared. If you can reproduce this bug after installing all the current updates, please reopen this bug. If you are not the reporter, you can add a comment requesting it be reopened, and someone will get to it asap. Thank you.
$ rpm -q kernel
kernel-2.6.19-1.2911.6.5.fc6
Reproducible in Fedora 7 and Rawhide:

$ rpm -q kernel
kernel-2.6.21-1.3240.fc8

Additionally, in Rawhide I'm unable to set the trip points:

# echo -n "85:0:80:60:0" > /proc/acpi/thermal_zone/THRM/trip_points
-bash: echo: write error: Invalid argument

Because the machine keeps getting shut down, I cannot use or test Rawhide.
The command is:

# echo 85:0:80:60:0 > /proc/acpi/thermal_zone/THRM/trip_points
That is no different except for the linefeed, but it doesn't work either.
http://article.gmane.org/gmane.linux.acpi.devel/22750

The trip_points are read-only now. :-(

[...]

Where does the kernel take the wrong "65 C" value from?
critical (S5): 65 C
passive: 55 C: ...

mystery #1: 65C != BIOS SETUP ACPI critical shutdown temperature.

Can you actually change this field in the BIOS SETUP?
If yes, do the changes there have any effect at all on what you see in the trip_points file? (or any differences in the inb/outb command results requested below?)

mystery #2: FC3 didn't shut down, FC4 and later do shut down.

This was the 2.6.9 -> 2.6.12 period. It is quite possible that something started working here that was broken in FC3. Indeed, the fact that the FC3 output shows state=ok when the temperature is 63 -- clearly above the 55C passive trip-point -- suggests that it was FC3 that was actually broken.

BTW. Does this system have a fan?

Yes, 65C seems very low for a critical shutdown. 55C also seems quite low to throttle your processor.

Please attach the output from dmesg -s64000 running the latest kernel you've got on hand.

Please paste the output from lspci.

Here is the ThermalZone in the DSDT:

OperationRegion (FNOR, SystemIO, 0x084C, 0x04)
Scope (\_TZ)
{
    Name (THBF, Buffer (0x04)
    {
        0x00, 0x00, 0x00, 0x00
    })
    Method (KELV, 1, NotSerialized)
    {
        And (Arg0, 0xFF, Local0)
        Multiply (Local0, 0x0A, Local0)
        Add (Local0, 0x0AAC, Local0)
        Return (Local0)
    }

    Name (PLCY, 0x00)
    // unclear why PLCY is a variable, as AML doesn't write it
    // it is used below to select the passive trip-point

    OperationRegion (THOR, SystemIO, 0x72, 0x02)
    Field (THOR, ByteAcc, NoLock, Preserve)
    {
        ECMI, 8,
        ECMD, 8
    }

    IndexField (ECMI, ECMD, ByteAcc, NoLock, Preserve)
    {
        Offset (0xF0),
        TMIN, 8,
        TMAX, 8,
        TCRT, 8
    }
    // These are the fields we want to see

    Name (TSP, 0x05)
    Name (TC1, 0x02)
    Name (TC2, 0x04)
    OperationRegion (TSN1, SystemIO, 0x0C20, 0x01)
    Field (TSN1, ByteAcc, NoLock, Preserve)
    {
        CURT, 8
    }

    Method (TCHG, 0, NotSerialized)
    {
        Noop
    }

    Method (RTMP, 0, NotSerialized)
    {
        Not (CURT, Local0)
        Subtract (Local0, 0xB3, Local0)
        Not (Local0, Local0)
        Add (Local0, 0x01, Local0)
        And (Local0, 0xFF, Local0)
        Store (Local0, Local1)
        Divide (Local0, 0x0A, Local0, Local2)
        Subtract (Local1, Local2, Local0)
        ShiftRight (Local0, 0x01, Local0)
        Store (Local0, DBG8)
        Return (Local0)
    }

    ThermalZone (THRM)
    {
        Method (_CRT, 0, NotSerialized)
        {
            Return (KELV (TCRT))
        }

        Method (_TMP, 0, NotSerialized)
        {
            If (LEqual (TCRT, 0x4F))
            {
                Return (KELV (0x1E))
            }
            Else
            {
                Return (KELV (RTMP ()))
            }
        }

        Name (_PSL, Package (0x01)
        {
            \_PR.CPU1
        })
        Method (_TSP, 0, NotSerialized)
        {
            Multiply (TSP, 0x0A, Local0)
            Return (Local0)
        }

        Method (_TC1, 0, NotSerialized)
        {
            Return (TC1)
        }

        Method (_TC2, 0, NotSerialized)
        {
            Return (TC2)
        }

        Method (_PSV, 0, NotSerialized)
        {
            If (PLCY)
            {
                Return (KELV (TMIN))
            }
            Else
            {
                Return (KELV (TMAX))
            }
        }
    }
}

Please paste the output from these commands:

# outb 0xF0 0x72
# inb 0x73
this should give us TMIN

# outb 0xF1 0x72
# inb 0x73
this should give us TMAX

# outb 0xF2 0x72
# inb 0x73
this should give us TCRT

If you repeat this, you should get the same answers. It would be good to verify also that you get the same answers with "acpi=off".

BTW. Booting with "acpi=off" should work around the symptom of this bug.
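The arithmetic in the DSDT above is easy to check by hand. The following is a minimal sketch that mirrors KELV, _CRT, _PSV and the special case in _TMP in plain Python. The function names are hypothetical mirrors of the AML methods, the conversion back to Celsius is a simplification (the kernel's own rounding may differ), and the raw values 55/55/65 are taken as the decimal register contents reported later in this thread.

```python
KELVIN_OFFSET_DK = 0x0AAC  # 2732 tenths of a kelvin, i.e. 273.2 K

def kelv(raw):
    """Mirror of the AML KELV method: degrees C byte -> tenths of a kelvin."""
    return (raw & 0xFF) * 10 + KELVIN_OFFSET_DK

def to_celsius(decikelvin):
    """Simplified inverse of kelv(); assumption: no special rounding."""
    return (decikelvin - KELVIN_OFFSET_DK) / 10

def crt(tcrt):
    """Mirror of _CRT: critical trip point from the TCRT register."""
    return kelv(tcrt)

def psv(plcy, tmin, tmax):
    """Mirror of _PSV: PLCY selects TMIN (non-zero) or TMAX (zero)."""
    return kelv(tmin) if plcy else kelv(tmax)

def tmp(tcrt, rtmp_value):
    """Mirror of _TMP: reports a fixed 30 C when TCRT == 0x4F."""
    if tcrt == 0x4F:
        return kelv(0x1E)
    return kelv(rtmp_value)

# Register contents as reported in this thread (TMIN, TMAX, TCRT), PLCY = 0.
TMIN, TMAX, TCRT = 55, 55, 65

critical_c = to_celsius(crt(TCRT))
passive_c = to_celsius(psv(0, TMIN, TMAX))
print(f"critical: {critical_c} C, passive: {passive_c} C")
```

Under these assumptions the AML simply passes the register bytes through as degrees Celsius, so a 65 C critical trip point means the byte behind TCRT holds 65.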
Created attachment 159636 [details] dmesg
Created attachment 159637 [details] lspci
> mystery #1: 65C != BIOS SETUP ACPI critical shutdown temperature.
>
> Can you actually change this field in the BIOS SETUP?

Yes, I can choose from "disabled, 70 C, 80 C and 90 C".

> If yes, do the changes there have any effect at all
> on what you see in the trip_points file?

Seems so.

90 C maps to 65 C critical, 55 C passive
80 C maps to 60 C critical, 50 C passive

Looks like factor /2 is involved somewhere.

> BTW. Does this system have a fan?

CPU fan and power fan.

> please paste the output from these commands:
> # outb 0xF0 0x72
> inb 0x73

Where do I get the commands? I've done it in C as a work-around:

# ~misc/files/source/inb_outb
55
55
65

> If you repeat this, you should get the same answers.

Yes.

> It would be good to verify also that you get the same
> answers with "acpi=off".

I do.

> BTW. booting with "acpi=off" should work-around the symptom of this bug.

Been doing that with Rawhide since it came into my mind, too.
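The "factor /2" observation can be made explicit with a straight-line fit through the two reported pairs. This is only a sketch: two data points cannot confirm that the BIOS mapping is linear, and the value computed for the 70 C setting is an extrapolation, not a measurement. The helper names are hypothetical.

```python
def fit_line(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def predict(line, bios_setting):
    """Evaluate a fitted line at a BIOS setting in degrees C."""
    slope, intercept = line
    return slope * bios_setting + intercept

# (BIOS setting, observed trip point) pairs from the comment above.
critical_line = fit_line((90, 65), (80, 60))
passive_line = fit_line((90, 55), (80, 50))

print("critical = %.1f * bios + %.1f" % critical_line)
print("passive  = %.1f * bios + %.1f" % passive_line)
print("extrapolated for 70 C:",
      predict(critical_line, 70), predict(passive_line, 70))
```

Both fits have a slope of one half, which matches the reporter's impression, with offsets of 20 C and 10 C respectively.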
> # ~misc/files/source/inb_outb
> 55
> 55
> 65

Assuming this is the case with 65 critical and 55 passive, this confirms that Linux/ACPI/AML are correctly reading and acting on the underlying memory locations where the BIOS is storing these trip points.

> Yes, I can choose from "disabled, 70 C, 80 C and 90 C".
> 90 C maps to 65 C critical, 55 C passive
> 80 C maps to 60 C critical, 50 C passive

If you request 70 I assume you get a critical shutdown during boot?
What if you request 70, boot "acpi=off" and run inb_outb?
Let me guess, we get 55 critical and 45 passive?

What do you see if you request "disabled"?

What is the default setting for this parameter if you globally reset the BIOS to SETUP defaults?

What do you see if you modify inb_outb to do this:

# outb 0xF0 0x72
# outb 0x55 0x73 // set TMIN to 85C = 0x55
# outb 0xF0 0x72
# inb 0x73 // this should give us TMIN

# outb 0xF1 0x72
# outb 0x56 0x73 // set TMAX to 86C = 0x56
# outb 0xF1 0x72
# inb 0x73 // this should give us TMAX

# outb 0xF2 0x72
# outb 0x5A 0x73 // set TCRT to 90C = 0x5A
# outb 0xF2 0x72
# inb 0x73 // this should give us TCRT

You might need a temperature event to coax Linux/ACPI to re-evaluate these trip-points, which should re-read them from memory. Possibly setting /proc/acpi/thermal_zone/.../polling_frequency to a non-zero value for a bit would be enough to make this happen.

However, the real question is where the EC/temperature-sensors on this box are going to trip. Are they tripping at the BIOS SETUP points, the points in TMIN, TCRT, or some other values that we don't see?

You should be able to determine this by disabling the critical shutdown:

# mv /sbin/poweroff /sbin/poweroff.orig

and enabling monitoring of ACPI events:

kill acpid
# cat /proc/acpi/event

Then poll the temperature in /proc/acpi/thermal_zone, run something to heat up the system, and see if an ACPI critical trip point event actually occurs at the trip point specified or not.
It is possible that your original critical shutdown issue was not actually triggered by a critical shutdown event, but a mundane temperature change event that caused Linux to compare current temperature vs the (bogus) critical shutdown point. The other mystery, of course, is what Windows does. It looks like Linux/ACPI are reading the hardware properly, so it might be that Windows has a platform-specific workaround that applies to this chipset or this BIOS. I wonder if Windows has a mechanism where they display the trip points -- if so, it would be interesting to see what they display for each of the BIOS SETUP selections...
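The index/data port sequence requested in this comment (select a register through port 0x72, then read or write it through port 0x73) can be sketched as follows. The helper names and the FakeIndexedPort class are hypothetical and exist only so the sequence can be dry-run without touching hardware; the values written are the ones named in the comments (85, 86 and 90 C). On a real Linux machine the same helpers could in principle be handed a file object opened as root on /dev/port in unbuffered binary mode, but writing to these registers on real hardware is at your own risk.

```python
INDEX_PORT = 0x72  # ECMI in the DSDT: selects the register
DATA_PORT = 0x73   # ECMD in the DSDT: reads/writes the selected register

def read_indexed(port, index):
    """Select `index` via the index port, then read one byte from the data port."""
    port.seek(INDEX_PORT)
    port.write(bytes([index]))
    port.seek(DATA_PORT)
    return port.read(1)[0]

def write_indexed(port, index, value):
    """Select `index` via the index port, then write `value` to the data port."""
    port.seek(INDEX_PORT)
    port.write(bytes([index]))
    port.seek(DATA_PORT)
    port.write(bytes([value]))

class FakeIndexedPort:
    """In-memory stand-in for the I/O ports, for dry runs only (hypothetical)."""
    def __init__(self, registers):
        self.registers = dict(registers)
        self.position = 0
        self.index = 0

    def seek(self, position):
        self.position = position

    def write(self, data):
        if self.position == INDEX_PORT:
            self.index = data[0]
        elif self.position == DATA_PORT:
            self.registers[self.index] = data[0]
        return len(data)

    def read(self, count):
        return bytes([self.registers.get(self.index, 0xFF)])

TMIN, TMAX, TCRT = 0xF0, 0xF1, 0xF2  # offsets from the IndexField in the DSDT

# Start from the register contents reported earlier in this thread.
port = FakeIndexedPort({TMIN: 55, TMAX: 55, TCRT: 65})
before = [read_indexed(port, reg) for reg in (TMIN, TMAX, TCRT)]

write_indexed(port, TMIN, 0x55)  # 85 C
write_indexed(port, TMAX, 0x56)  # 86 C
write_indexed(port, TCRT, 0x5A)  # 90 C
after = [read_indexed(port, reg) for reg in (TMIN, TMAX, TCRT)]

print("before:", before, "after:", after)
```

Re-selecting the index before every data access, as the sketch does, mirrors the way the requested command sequence repeats the write to port 0x72 before each read.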
Closing as CANTFIX, since if it's a mainboard/BIOS bug, it is beyond my time to deal with it.
Michael, As we're not quite at the bottom of this sighting... if you file a new sighting at bugzilla.kernel.org vs ACPI/thermal, drop in the URL of this report, we can resume poking at it there...
Created attachment 161932 [details]
2.6.23-rc3+ patch to disable critical trip points on GA-7ZX

I've added the attached DMI entry to the acpi-test tree to disable ACPI critical trip point actions on this board, and I expect it to ship upstream in 2.6.23.
The patch in comment #25 shipped in Linux-2.6.23-rc3-git9
Works for me. Thank you!

$ uname -a
Linux rawhide.intranet 2.6.23-0.139.rc3.git10.fc8 #1 SMP Sun Aug 26 19:53:26 EDT 2007 i686 athlon i386 GNU/Linux
$ grep Giga /var/log/dmesg
ACPI: Gigabyte GA-7ZX detected: disabling all critical thermal trip point actions.
$ cat /proc/acpi/thermal_zone/THRM/trip_points
critical (S5): 65 C <disabled>
passive: 55 C: tc1=2 tc2=4 tsp=50 devices=CPU1