Description of problem:
Sometimes, Fedora shuts down suddenly. When I look in /var/log/messages to see what has triggered the shutdown, I find this kind of message:
Critical temperature reached (861929 C), shutting down.
This temperature is obviously wrong.

Version-Release number of selected component (if applicable):
Linux pc999.iihe.ac.be 2.6.30.5-43.fc11.x86_64

How reproducible:
It seems to occur randomly.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Excerpt from /var/log/messages:
----
Sep 25 10:02:52 pc999 kernel: ACPI: EC: missing confirmations, switch off interrupt mode.
Sep 25 10:02:52 pc999 kernel: ACPI Exception (evregion-0422): AE_TIME, Returned by Handler for [EmbeddedControl] [20090320]
Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.SBRG.EC0_.RCTP] (Node ffff88013fa4e280), AE_TIME
Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.RTMP] (Node ffff88013fa4ef60), AE_TIME
Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.TZ00._TMP] (Node ffff88013fa4f040), AE_TIME
Sep 25 10:02:52 pc999 kernel: Critical temperature reached (861929 C), shutting down.
----
Looks like it can't read the temperature and is returning a random value. Does it make a difference whether you start from a powered-off state vs. rebooting a running system? And did this ever happen with the 2.6.29 kernel?

Sep 25 10:02:52 pc999 kernel: ACPI: EC: missing confirmations, switch off interrupt mode.
Sep 25 10:02:52 pc999 kernel: ACPI Exception (evregion-0422): AE_TIME, Returned by Handler for [EmbeddedControl] [20090320]
Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.SBRG.EC0_.RCTP] (Node ffff88013fa4e280), AE_TIME
Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.RTMP] (Node ffff88013fa4ef60), AE_TIME
Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.TZ00._TMP] (Node ffff88013fa4f040), AE_TIME
Sep 25 10:02:52 pc999 kernel: Critical temperature reached (861929 C), shutting down.
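
To illustrate why a failed read can show up as an absurd Celsius figure: ACPI `_TMP` reports temperature in tenths of a kelvin, so whatever number comes back gets converted whether or not the read actually succeeded. A minimal sketch in Python, assuming that unit convention; the function names and the plausibility bounds are hypothetical and this is not the kernel's code:

```python
def decikelvin_to_celsius(raw: int) -> float:
    """Convert a reading in tenths of a kelvin to degrees Celsius."""
    return (raw - 2732) / 10.0


def is_plausible(celsius: float, low: float = -40.0, high: float = 150.0) -> bool:
    """Hypothetical sanity bounds; real limits depend on the hardware."""
    return low <= celsius <= high


# A raw value of 3122 deci-kelvin corresponds to an ordinary CPU temperature.
normal = decikelvin_to_celsius(3122)
print(normal, is_plausible(normal))

# The value reported in the log is far outside any physical range.
print(is_plausible(861929.0))
```

A check like this would only flag the bad value; it would not address the underlying failed read that the ACPI errors in the log point to.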
Do you have any temperature sensor modules loaded?
(In reply to comment #1)
> Looks like it can't read the temperature and is returning a random value. Does
> it make a difference whether you start from a powered-off state vs. rebooting a
> running system? And did this ever happen with the 2.6.29 kernel?
>
> Sep 25 10:02:52 pc999 kernel: ACPI: EC: missing confirmations, switch off
> interrupt mode.
> Sep 25 10:02:52 pc999 kernel: ACPI Exception (evregion-0422): AE_TIME, Returned
> by Handler for [EmbeddedControl] [20090320]
> Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution
> failed [\_SB_.PCI0.SBRG.EC0_.RCTP] (Node ffff88013fa4e280), AE_TIME
> Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution
> failed [\_TZ_.RTMP] (Node ffff88013fa4ef60), AE_TIME
> Sep 25 10:02:52 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution
> failed [\_TZ_.TZ00._TMP] (Node ffff88013fa4f040), AE_TIME
> Sep 25 10:02:52 pc999 kernel: Critical temperature reached (861929 C), shutting
> down.

It can happen after a cold start and it can happen after a reboot. Sometimes it doesn't happen at all. It never happened with the 2.6.29 kernel.
(In reply to comment #2)
> Do you have any temperature sensor modules loaded?

Yes, the temperature sensor modules must be loaded, since I can read the CPU temperature with this command:

[root@pc999 ~]# cat /proc/acpi/thermal_zone/TZ00/temperature
temperature: 39 C
Created attachment 362825
excerpt from /var/log/messages from power on to now
(In reply to comment #4)

> > Do you have any temperature sensor modules loaded?
>
> Yes, the temperature sensor modules must be loaded, since I can have CPU
> temperature with this command :
>
> [root@pc999 ~]# cat /proc/acpi/thermal_zone/TZ00/temperature
> temperature: 39 C

I think he means hwmon / lm-sensors drivers. Can you attach the output of 'lsmod' after getting a successful boot with 2.6.30?
Can you attach the complete log from a session where it shut down because of the invalid temperature reading?
Created attachment 362974
result of lsmod

Result of the command lsmod after a normal boot
(In reply to comment #6)
> (In reply to comment #4)
>
> > > Do you have any temperature sensor modules loaded?
> >
> > Yes, the temperature sensor modules must be loaded, since I can have CPU
> > temperature with this command :
> >
> > [root@pc999 ~]# cat /proc/acpi/thermal_zone/TZ00/temperature
> > temperature: 39 C
>
> I think he means hwmon / lm-sensors drivers. Can you attach the output of
> 'lsmod' after getting a successful boot with 2.6.30?

I've just posted it in the attachments.
Created attachment 362975
excerpt from /var/log/messages with the problem
(In reply to comment #7)
> Can you attach the complete log from a session where it shut down because of
> the invalid temperature reading?

See my third attachment.
The EC starts in poll mode, switches to interrupt mode, then stops generating interrupts 17 minutes later:

Sep 29 10:11:12 pc999 kernel: ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62
Sep 29 10:11:12 pc999 kernel: ACPI: EC: driver started in poll mode
Sep 29 10:11:12 pc999 kernel: ACPI: EC: non-query interrupt received, switching to interrupt mode
Sep 29 10:28:25 pc999 kernel: ACPI: EC: missing confirmations, switch off interrupt mode.

Then the handlers time out trying to read the temperature, I think:

Sep 29 10:28:26 pc999 kernel: ACPI Exception (evregion-0422): AE_TIME, Returned by Handler for [EmbeddedControl] [20090320]
Sep 29 10:28:26 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.SBRG.EC0_.RCTP] (Node ffff88013fa4e280), AE_TIME
Sep 29 10:28:26 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.RTMP] (Node ffff88013fa4ef60), AE_TIME
Sep 29 10:28:26 pc999 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.TZ00._TMP] (Node ffff88013fa4f040), AE_TIME
Sep 29 10:28:26 pc999 kernel: Critical temperature reached (1359203 C), shutting down.

It looks like a random temperature is returned...
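
The interval between the two EC mode switches can be checked directly from the log timestamps. A minimal sketch in Python; the helper names are hypothetical, and the year is an assumption (syslog lines omit it, and it does not affect the difference):

```python
from datetime import datetime, timedelta
from typing import Optional

LOG = """\
Sep 29 10:11:12 pc999 kernel: ACPI: EC: non-query interrupt received, switching to interrupt mode
Sep 29 10:28:25 pc999 kernel: ACPI: EC: missing confirmations, switch off interrupt mode.
"""


def parse_timestamp(line: str, year: int = 2009) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp (year is assumed)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def interrupt_mode_duration(log: str) -> Optional[timedelta]:
    """Return the time between entering and leaving EC interrupt mode, if both appear."""
    start = end = None
    for line in log.splitlines():
        if "switching to interrupt mode" in line:
            start = parse_timestamp(line)
        elif "switch off interrupt mode" in line:
            end = parse_timestamp(line)
    if start is None or end is None:
        return None
    return end - start


print(interrupt_mode_duration(LOG))
```

Running the same function over the full attachment would show whether the gap is similar on every affected boot.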
Could this be fixed by:

http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=e12ac3d018dd8f20a075f552020

The patch causes commands to be retried when the EC is in poll mode.
Added these patches in 2.6.30.9-74:

acpi-ec-merge-irq-and-poll-modes.patch
acpi-ec-restart-command-even-if-no-interrupts-from-ec.patch
acpi-ec-use-burst-mode-only-for-msi-notebooks.patch

This should fix the problem.
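
As a rough illustration of the general idea of restarting a command after a timeout instead of giving up, here is a minimal Python sketch. It is not the kernel patch (which is C code in the ACPI EC driver), and `ECTimeout`, `run_with_retry`, and `make_flaky_read` are hypothetical names invented for the example:

```python
class ECTimeout(Exception):
    """Hypothetical stand-in for an embedded-controller transaction timeout."""


def run_with_retry(transaction, attempts: int = 3):
    """Call transaction() up to `attempts` times; re-raise the last timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return transaction()
        except ECTimeout as err:
            last_error = err  # restart the command rather than failing at once
    raise last_error


def make_flaky_read(failures: int, value: int):
    """Simulated read that times out `failures` times before returning `value`."""
    state = {"left": failures}

    def read():
        if state["left"] > 0:
            state["left"] -= 1
            raise ECTimeout("no response from EC")
        return value

    return read


print(run_with_retry(make_flaky_read(failures=2, value=39)))
```

With a retry in place, a transient timeout no longer propagates to the caller as a failed read, which is the behaviour the AE_TIME errors in the logs suggest was missing.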
Can you test the build from koji? http://koji.fedoraproject.org/koji/buildinfo?buildID=135601
(In reply to comment #15)
> Can you test the build from koji?
>
> http://koji.fedoraproject.org/koji/buildinfo?buildID=135601

I have been running this build from koji since Friday and the problem has disappeared. So the problem seems to be fixed. Thank you very much.
kernel-2.6.30.9-90.fc11 has been submitted as an update for Fedora 11. http://admin.fedoraproject.org/updates/kernel-2.6.30.9-90.fc11
I have a Dell Studio 1555 on an x86 kernel with the same problem. After it comes back from a suspended state, all ACPI temperatures read 0° C, which is dangerous because the fans are never turned on (I had a couple of instant shutdowns due to overheating). The latest kernel from updates-testing still has the problem; downgrading to 2.6.30.8-64.fc11.i686.PAE solves it.
kernel-2.6.30.9-90.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.
Yup, the problem persists:

[abarto@roadrunner ~]$ uname -r
2.6.30.9-90.fc11.i686.PAE
[abarto@roadrunner ~]$ sudo cat /proc/acpi/thermal_zone/TZ01/temperature
temperature: 43 C

(after suspend/resume)

[abarto@roadrunner ~]$ sudo cat /proc/acpi/thermal_zone/TZ01/temperature
temperature: 0 C
(In reply to comment #20)
> Yup, the problem persists:
>
> [abarto@roadrunner ~]$ uname -r
> 2.6.30.9-90.fc11.i686.PAE
> [abarto@roadrunner ~]$ sudo cat /proc/acpi/thermal_zone/TZ01/temperature
> temperature: 43 C
>
> (after suspend/resume)
>
> [abarto@roadrunner ~]$ sudo cat /proc/acpi/thermal_zone/TZ01/temperature
> temperature: 0 C

That is not the original bug that was reported.