Bug 703187

Summary: kernel sees extra thermal diode running hot and triggers intermittent thermal shut down
Product: [Fedora] Fedora
Reporter: Alan Anderson <andersonas>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 14
CC: andersonas, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-05-16 13:49:11 UTC

Description Alan Anderson 2011-05-09 15:17:39 UTC
Description of problem:
Fresh FC14 install on an EliteGroup ECS A790GXM-AD3 motherboard. On the
first boot the system shut down before I could enter the password.
Eventually I noticed the following in the /var/log/dmesg file.

May 6 09:20:52 pvr1 kernel: [ 71.081080] Critical temperature reached (127 C), shutting down.


Version-Release number of selected component (if applicable):
FC14 kernel version is 2.6.35.12-90.fc14.x86_64

How reproducible:
Install FC14 on this motherboard. Booting the FC15 live beta shows the same issue: FC15 never completed the first boot because of the thermal shutdown.

Steps to Reproduce:
1. None necessary. Install lm_sensors to monitor the third temperature sensor.
  
Actual results:
[root@pvr1 ~]# service lm_sensors status
acpitz-virtual-0
Adapter: Virtual device
temp1: +30.0°C (crit = +110.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1: +11.0°C (high = +70.0°C)

it8716-isa-0e80
Adapter: ISA adapter
in0: +1.04 V (min = +0.00 V, max = +4.08 V) ALARM
in1: +2.14 V (min = +0.00 V, max = +4.08 V) ALARM
in2: +1.55 V (min = +0.00 V, max = +4.08 V) ALARM
in3: +2.99 V (min = +0.00 V, max = +4.08 V) ALARM
in4: +1.22 V (min = +0.00 V, max = +4.08 V) ALARM
in5: +2.48 V (min = +0.00 V, max = +4.08 V) ALARM
in6: +2.50 V (min = +0.00 V, max = +4.08 V) ALARM
in7: +3.30 V (min = +0.00 V, max = +4.08 V) ALARM
Vbat: +3.04 V
fan1: 1885 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
temp1: +18.0°C (low = +127.0°C, high = -53.0°C) ALARM sensor = thermal diode
temp2: +23.0°C (low = -1.0°C, high = +127.0°C) sensor = thermistor
temp3: +111.0°C (low = -1.0°C, high = +127.0°C) sensor = thermal diode <--

This is on a cold boot after the machine sat powered off for 10 hours at room temperature. Within six minutes temp3 was at 124C. As you can see, the CPU (temp2) is only at 23C; the 18C reading (temp1) would be the case temperature.

The Health screen in the BIOS lists only two temperatures, which correspond to the lm_sensors temp1 and temp2. I do not see any reference to a third temperature in the BIOS. So either this is extra hardware that the BIOS does not see but the Linux kernel (it87) does, or the Linux kernel is detecting some other hardware as a thermal diode that is running hot.
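
To check how the it87 driver currently classifies each channel, the tempN_type files in sysfs can be read directly. The following is only a sketch: the helper name sensor_type_name is made up for illustration, the device directory is assumed to be the same one used in the workaround under "Additional info", and only the two type codes that appear in this report (3 and 4) are decoded.

```shell
# Hypothetical helper: translate a tempN_type value into the label
# that lm_sensors prints. Only the codes seen in this report are decoded.
sensor_type_name() {
    case "$1" in
        3) echo "thermal diode" ;;
        4) echo "thermistor" ;;
        *) echo "other ($1)" ;;
    esac
}

# Walk the three temperature channels (device path is an assumption;
# adjust it to match your system).
dev=/sys/devices/platform/it87.3712
for n in 1 2 3; do
    f="$dev/temp${n}_type"
    if [ -r "$f" ]; then
        echo "temp$n: $(sensor_type_name "$(cat "$f")")"
    fi
done
```

On a machine without the it87 device the loop prints nothing.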

Expected results:
Either do not detect this thermal diode if it does not really exist, or provide an offset so that the reading is a realistic temperature and does not sit on the ragged edge of a thermal shutdown.

Additional info:
I have a workaround that keeps the system from shutting down. Read as a thermal diode, the sensor is at 124C, which is three degrees from a thermal shutdown, and the reading can easily exceed 127C. But if I change /sys/devices/platform/it87.3712/temp3_type from 3 to 4, the sensor is read as a thermistor and now sits at 90C, which is 37C away from a thermal shutdown.

In the script /etc/rc.d/rc.local add:
/bin/echo 4 > /sys/devices/platform/it87.3712/temp3_type
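
A slightly more defensive variant of the same line might look like the sketch below. It is not something I tested on this board: the function name set_thermistor is made up, and the idea is simply to skip the write when the file is missing (for example if the driver no longer exposes the sensor) or when the channel is not currently reported as a thermal diode.

```shell
# Hypothetical helper: switch one it87 temperature channel to thermistor mode.
# $1 is the path to a tempN_type file.
set_thermistor() {
    type_file="$1"
    # Do nothing if the file is absent or not writable.
    [ -w "$type_file" ] || return 0
    # Only change the channel if it is currently typed as a thermal diode (3).
    if [ "$(cat "$type_file")" = "3" ]; then
        echo 4 > "$type_file"
    fi
}

# In /etc/rc.d/rc.local this would be called as:
# set_thermistor /sys/devices/platform/it87.3712/temp3_type
```

The call is left commented out here so the fragment can be pasted and reviewed before it changes anything.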

This changes the sensor to a thermistor.
[root@pvr1 ~]# service lm_sensors status
acpitz-virtual-0
Adapter: Virtual device
temp1: +30.0°C (crit = +110.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1: +16.5°C (high = +70.0°C)

it8716-isa-0e80
Adapter: ISA adapter
in0: +1.04 V (min = +0.00 V, max = +4.08 V) ALARM
in1: +2.14 V (min = +0.00 V, max = +4.08 V) ALARM
in2: +1.55 V (min = +0.00 V, max = +4.08 V) ALARM
in3: +2.99 V (min = +0.00 V, max = +4.08 V) ALARM
in4: +1.22 V (min = +0.00 V, max = +4.08 V) ALARM
in5: +2.48 V (min = +0.00 V, max = +4.08 V) ALARM
in6: +2.50 V (min = +0.00 V, max = +4.08 V) ALARM
in7: +3.30 V (min = +0.00 V, max = +4.08 V) ALARM
Vbat: +3.06 V
fan1: 1896 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 3260 RPM (min = 0 RPM)
temp1: +24.0°C (low = +127.0°C, high = -53.0°C) ALARM sensor = thermal diode
temp2: +27.0°C (low = -1.0°C, high = +127.0°C) sensor = thermistor
temp3: +90.0°C (low = -1.0°C, high = +127.0°C) sensor = thermistor

Now temp3 is at 90C instead of 124C, so it is 37C away from a thermal shutdown.
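
The headroom figures above are just the 127C shutdown threshold minus the reading. As a rough sketch, assuming readings are given in millidegrees Celsius (the unit used by hwmon tempN_input files) and using a made-up function name headroom_c:

```shell
# headroom_c READING_MILLIC [LIMIT_C]
# Prints how many whole degrees C remain before the limit (default 127,
# the critical temperature from the kernel log in this report).
headroom_c() {
    reading_millic="$1"
    limit_c="${2:-127}"
    echo $(( limit_c - reading_millic / 1000 ))
}

headroom_c 124000   # thermal diode reading: prints 3
headroom_c 90000    # thermistor reading after the workaround: prints 37
```

On a live system the reading could presumably be taken from the device's temp3_input file, e.g. headroom_c "$(cat /sys/devices/platform/it87.3712/temp3_input)", if the driver exposes it at that path.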

Comment 1 Alan Anderson 2011-05-15 23:15:57 UTC
Well, this ended up being a BIOS issue. I updated the BIOS to the latest version ECS has available for this motherboard (V7/21/10), and after a reboot all of the it8716 sensors that lm_sensors was reporting are gone.


[root@pvr1 ~]# service lm_sensors status
acpitz-virtual-0
Adapter: Virtual device
temp1:        +30.0°C  (crit = +110.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +15.0°C  (high = +70.0°C)