Bug 1050106 - Fedora-20, Kernel 3.15.7-200.fc20.x86_64: HIGH temps and anomalous CPU temp & speed behavior [Re-OPENED]...
Summary: Fedora-20, Kernel 3.15.7-200.fc20.x86_64: HIGH temps and anomalous CPU temp &...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-08 17:03 UTC by nmvega
Modified: 2014-09-13 18:27 UTC
CC List: 7 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-09-13 18:27:14 UTC
Type: Bug
Embargoed:


Attachments
dmesg output, followed by /proc/cpuinfo output ... (83.20 KB, text/plain)
2014-01-08 17:06 UTC, nmvega

Description nmvega 2014-01-08 17:03:59 UTC
Description of problem:
Various O/S commands and dmesg(1) output indicate problems with high CPU temperatures and clock speeds. This is urgently concerning; I can't run the
laptop this way. Please help.

=============
THE SETUP:
=============
- O/S: Fedora 19. Kernel version: 3.12.6-200.fc19.x86_64

- The system is a "Clevo P170HM" laptop with a quad-core (8-thread) Intel CPU.

- I only SSH into the laptop to do things, so no windowing system is ever
  started (i.e. the laptop display always shows the TTY console login prompt,
  and the lid is always closed). Again, I only SSH into it remotely; there is
  no attached monitor either.

- The laptop is *very* well ventilated (lots of room to breathe) and also
  sits on top of a dedicated external cooling fan I purchased for it.


============================
THE PROBLEMS / INDICATORS:
============================

############################################
(1) *Relatively* high temperatures reported *immediately* after booting up.
    And they vary non-trivially on consecutive probes. Here are two
    consecutive runs of the sensors(1) command:

user@linux$ sensors -f
acpitz-virtual-0
Adapter: Virtual device
temp1:       +107.6°F  (crit = +309.2°F)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +107.6°F  (high = +186.8°F, crit = +212.0°F)
Core 0:        +107.6°F  (high = +186.8°F, crit = +212.0°F)
Core 1:         +96.8°F  (high = +186.8°F, crit = +212.0°F)
Core 2:         +95.0°F  (high = +186.8°F, crit = +212.0°F)
Core 3:         +87.8°F  (high = +186.8°F, crit = +212.0°F)

pkg-temp-0-virtual-0
Adapter: Virtual device
temp1:       +107.6°F  

===

user@linux$ sensors -f
acpitz-virtual-0
Adapter: Virtual device
temp1:       +107.6°F  (crit = +309.2°F)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +107.6°F  (high = +186.8°F, crit = +212.0°F)
Core 0:        +105.8°F  (high = +186.8°F, crit = +212.0°F)
Core 1:        +105.8°F  (high = +186.8°F, crit = +212.0°F) <==== Example.
Core 2:        +103.0°F  (high = +186.8°F, crit = +212.0°F) <==== Example.
Core 3:         +87.8°F  (high = +186.8°F, crit = +212.0°F)

pkg-temp-0-virtual-0
Adapter: Virtual device
temp1:       +109.4°F
############################################
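The `sensors -f` listings in this report are in Fahrenheit. Converting with the standard formula C = (F - 32) × 5/9 gives these values: the coretemp limits are high = 86 °C and crit = 100 °C, the acpitz crit limit is 154 °C, and the idle package reading of +107.6 °F is 42 °C. Below is a minimal Python sketch of that arithmetic. The function and dictionary names are illustrative only.

```python
def f_to_c(fahrenheit: float) -> float:
    """Convert an absolute temperature from degrees Fahrenheit to Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


# Values copied from the `sensors -f` listings above (all in degrees F).
READINGS_F = {
    "idle Physical id 0": 107.6,
    "coretemp high": 186.8,
    "coretemp crit": 212.0,
    "acpitz crit": 309.2,
}

for label, value in READINGS_F.items():
    print(f"{label}: {value:.1f} F = {f_to_c(value):.1f} C")
```

Running `sensors` without `-f` prints Celsius directly.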


############################################
(2) Each core is running at (scaled to) a different CPU frequency. I'm not
    certain, but I don't believe this is correct.

user@linux$ cat /proc/cpuinfo | grep MHz
cpu MHz		: 894.628
cpu MHz		: 1188.867
cpu MHz 	: 1148.828
cpu MHz		: 1127.539
cpu MHz 	: 1259.082
cpu MHz		: 1216.601
cpu MHz		: 1270.898
cpu MHz		: 1248.535
############################################
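The following Python sketch quantifies the spread in item (2). It extracts the `cpu MHz` values from /proc/cpuinfo-style text and reports the count, minimum, maximum, and spread. The listing mixes tab and space separators, and the regex accepts both. On the eight values above, it finds a spread of about 376 MHz. The sketch only summarizes the numbers; it does not decide whether unequal per-CPU frequencies are abnormal.

```python
import re

# Matches lines such as "cpu MHz\t\t: 894.628" or "cpu MHz \t: 1148.828";
# the separator between "MHz" and ":" mixes tabs and spaces in the listing.
MHZ_LINE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)\s*$", re.MULTILINE)

# The eight values from item (2), one per logical CPU.
SAMPLE = (
    "cpu MHz\t\t: 894.628\n"
    "cpu MHz\t\t: 1188.867\n"
    "cpu MHz \t: 1148.828\n"
    "cpu MHz\t\t: 1127.539\n"
    "cpu MHz \t: 1259.082\n"
    "cpu MHz\t\t: 1216.601\n"
    "cpu MHz\t\t: 1270.898\n"
    "cpu MHz\t\t: 1248.535\n"
)


def cpu_mhz_values(cpuinfo_text: str) -> list[float]:
    """Return every 'cpu MHz' value found in /proc/cpuinfo-style text."""
    return [float(value) for value in MHZ_LINE.findall(cpuinfo_text)]


def summarize(values: list[float]) -> dict[str, float]:
    """Return the CPU count, lowest, highest, and spread (max - min) in MHz."""
    if not values:
        raise ValueError("no 'cpu MHz' lines found")
    return {
        "cpus": len(values),
        "min": min(values),
        "max": max(values),
        "spread": round(max(values) - min(values), 3),
    }


print(summarize(cpu_mhz_values(SAMPLE)))
```

On a live system, pass the contents of /proc/cpuinfo in place of SAMPLE.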


############################################
(3) As the O/S is booting up, I see the following messages scroll by, and
    they also appear constantly in dmesg(1) and "/var/log/messages" output.
    Why is this being reported when I just turned the laptop on?

user@plinux$ sudo grep -i temperature /var/log/messages
Jan  8 11:08:12 p170hm-nic kernel: [  967.652909] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652910] CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652913] CPU4: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652960] CPU5: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652961] CPU1: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652962] CPU6: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652963] CPU2: Package temperature above threshold, cpu clock throttled (total events = 10)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652965] CPU7: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652966] CPU3: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.653115] CPU0: Package temperature above threshold, cpu clock throttled (total events = 11)
############################################
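The throttling messages in item (3) are easier to compare when grouped. The following Python sketch records, for each logical CPU and each event kind (Core or Package), the highest `total events` counter seen. Its regex is written against the message text quoted above; other kernel versions may word the message differently. On these lines, the Core-level counters are 1 on CPU0 and CPU4, and the Package-level counters are 10 or 11 on every CPU.

```python
import re
from collections import defaultdict

# Written against lines such as "... CPU0: Core temperature above threshold,
# cpu clock throttled (total events = 1)" from /var/log/messages.
THROTTLE = re.compile(
    r"CPU(\d+): (Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (\d+)\)"
)

SAMPLE = """\
Jan  8 11:08:12 p170hm-nic kernel: [  967.652909] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652910] CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652913] CPU4: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652960] CPU5: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652961] CPU1: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652962] CPU6: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652963] CPU2: Package temperature above threshold, cpu clock throttled (total events = 10)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652965] CPU7: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.652966] CPU3: Package temperature above threshold, cpu clock throttled (total events = 11)
Jan  8 11:08:12 p170hm-nic kernel: [  967.653115] CPU0: Package temperature above threshold, cpu clock throttled (total events = 11)
"""


def throttle_counts(log_text: str) -> dict[str, dict[int, int]]:
    """Map event kind -> {cpu number: highest 'total events' value seen}."""
    counts: dict[str, dict[int, int]] = defaultdict(dict)
    for cpu, kind, total in THROTTLE.findall(log_text):
        cpu_num, total_num = int(cpu), int(total)
        counts[kind][cpu_num] = max(counts[kind].get(cpu_num, 0), total_num)
    return dict(counts)


for kind, per_cpu in sorted(throttle_counts(SAMPLE).items()):
    print(kind, dict(sorted(per_cpu.items())))
```

For a live system, you could feed it the output of `sudo grep -i temperature /var/log/messages` instead of SAMPLE.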


############################################
(4) Without my doing anything substantial, the CPU temperature skyrockets.
    In the example below, I run the sensors(1) command, then start a
    lightweight LXC container (running FC19 and the same kernel as the
    host laptop), then run sensors(1) again immediately after the container
    finishes booting. Watch the CPU temperatures skyrocket. But why? The LXC
    O/S is not even configured to do anything (it was just installed with
    minimal packages).

    After I start the container, the internal fans *immediately* spin
    up to high speeds. And it's not just when I run containers; it also
    happens when I do other things (for example, when I start "dropbox").

user@linux$ sensors -f   <--- laptop is idle here. Only my SSH session.
acpitz-virtual-0
Adapter: Virtual device
temp1:       +107.6°F  (crit = +309.2°F)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +107.6°F  (high = +186.8°F, crit = +212.0°F)
Core 0:        +105.8°F  (high = +186.8°F, crit = +212.0°F)
Core 1:        +105.8°F  (high = +186.8°F, crit = +212.0°F)
Core 2:        +105.8°F  (high = +186.8°F, crit = +212.0°F)
Core 3:         +86.0°F  (high = +186.8°F, crit = +212.0°F)

pkg-temp-0-virtual-0
Adapter: Virtual device
temp1:       +107.6°F  

user@plinux$ sudo lxc-start -n vps0  <--- start a lightweight LXC container.
[ ... snip ... ]
Fedora release 19 (Schrödinger’s Cat)
Kernel 3.12.6-200.fc19.x86_64 on an x86_64 (console)
vps1 login:

  NOTE: Again, the container does nothing. It's a minimal install and hasn't
  been set up to do anything at all. Yet 1 SECOND after starting it,
  sensors(1) shows temperatures that completely SKYROCKET and REMAIN
  THERE, and the fans immediately spin up.

user@plinux$ sensors -f
acpitz-virtual-0
Adapter: Virtual device
temp1:       +194.0°F  (crit = +309.2°F)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +194.0°F  (high = +186.8°F, crit = +212.0°F)
Core 0:        +150.8°F  (high = +186.8°F, crit = +212.0°F)
Core 1:        +145.4°F  (high = +186.8°F, crit = +212.0°F)
Core 2:        +156.2°F  (high = +186.8°F, crit = +212.0°F)
Core 3:        +194.0°F  (high = +186.8°F, crit = +212.0°F)

pkg-temp-0-virtual-0
Adapter: Virtual device
temp1:       +194.0°F


user@plinux$ sensors -f   <--- Again 5 secs later. I'm not even doing anything.
acpitz-virtual-0
Adapter: Virtual device
temp1:       +206.6°F  (crit = +309.2°F)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +206.6°F  (high = +186.8°F, crit = +212.0°F)
Core 0:        +181.4°F  (high = +186.8°F, crit = +212.0°F)
Core 1:        +206.6°F  (high = +186.8°F, crit = +212.0°F)
Core 2:        +163.4°F  (high = +186.8°F, crit = +212.0°F)
Core 3:        +131.0°F  (high = +186.8°F, crit = +212.0°F)

pkg-temp-0-virtual-0
Adapter: Virtual device
temp1:       +206.6°F 
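The following Python sketch parses `sensors -f` output and flags every reading at or above its own `high` limit. The line format is taken from the listings in this report; other chips may print differently. Readings without a `high` limit, such as the acpitz `temp1`, are skipped. Applied to the coretemp block printed right after the container started, it flags Physical id 0 and Core 3, both at +194.0 °F against high = +186.8 °F.

```python
import re

# Written against lines such as
# "Core 3:        +194.0°F  (high = +186.8°F, crit = +212.0°F)";
# readings without a "high" limit (e.g. acpitz temp1) are skipped.
READING = re.compile(
    r"^(?P<label>[^:\n]+):\s+\+(?P<value>[0-9.]+)°F\s+"
    r"\(high = \+(?P<high>[0-9.]+)°F, crit = \+(?P<crit>[0-9.]+)°F\)",
    re.MULTILINE,
)

# The coretemp block printed right after the container started.
SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +194.0°F  (high = +186.8°F, crit = +212.0°F)
Core 0:        +150.8°F  (high = +186.8°F, crit = +212.0°F)
Core 1:        +145.4°F  (high = +186.8°F, crit = +212.0°F)
Core 2:        +156.2°F  (high = +186.8°F, crit = +212.0°F)
Core 3:        +194.0°F  (high = +186.8°F, crit = +212.0°F)
"""


def over_high(sensors_text: str) -> list[tuple[str, float, float]]:
    """Return (label, value, high) for each reading at or above its 'high' limit."""
    flagged = []
    for m in READING.finditer(sensors_text):
        value, high = float(m["value"]), float(m["high"])
        if value >= high:
            flagged.append((m["label"].strip(), value, high))
    return flagged


for label, value, high in over_high(SAMPLE):
    print(f"{label}: +{value}°F is at or above high = +{high}°F")
```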


Then, when I stop the container, the temperatures immediately drop
significantly:

user@plinux$ lxc-shutdown -n vps0
user@plinux$ sensors -f
acpitz-virtual-0
Adapter: Virtual device
temp1:       +123.8°F  (crit = +309.2°F)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +123.8°F  (high = +186.8°F, crit = +212.0°F)
Core 0:        +122.0°F  (high = +186.8°F, crit = +212.0°F)
Core 1:        +122.0°F  (high = +186.8°F, crit = +212.0°F)
Core 2:        +122.0°F  (high = +186.8°F, crit = +212.0°F)
Core 3:        +122.0°F  (high = +186.8°F, crit = +212.0°F)

pkg-temp-0-virtual-0
Adapter: Virtual device
temp1:       +123.8°F 


Full logs are attached.

I wanted to try upgrading to FC20, but cannot due to this
(unrelated) Bug ID: 1048404.

Something is very wrong here. Please help. I'm concerned about ruining
the laptop and reducing its MTBF. Status: I can't run it like this.

Thank you!

Comment 1 nmvega 2014-01-08 17:06:20 UTC
Created attachment 847264
dmesg output, followed by /proc/cpuinfo output ...

Comment 2 nmvega 2014-01-08 18:47:36 UTC
Additional information...

The reason I'm trying LXC containers is that I currently run multiple fully virtualized KVM virtual machines (typically 4 to 6 CentOS VMs, depending on the development work I'm doing). Again, there are no graphics running on the host Fedora-19 laptop or the CentOS 6 KVMs, just daemons: ssh, Hadoop, Storm (by Twitter), Cassandra, etc. Not all at once, but things like that.

But since this is my own environment, I don't require all the security and isolation that KVM VMs provide. All I need is a separate IP for each guest, so LXC containers with different IPs would be more lightweight and efficient.

But I ran into the CPU TEMP/CPU CLOCK SPEED/FAN SPEED issue previously described. And that was just running one (qty. 1) LXC Container.

Ironically, when I run two (qty. 2) full KVMs the issue doesn't appear.
Have a look:

root@p170hm# virsh start centOS6-vm0
root@p170hm# virsh start centOS6-vm1
root@p170hm# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     centOS6-vm0                    running  <--- Here
 3     centOS6-vm1                    running  <--- and Here
 -     centOS6-vm2                    shut off
 -     centOS6-vm3                    shut off
 -     centOS6-vm4                    shut off
 -     centOS6-vm5                    shut off
 -     centOS6-vm6                    shut off
 -     centOS6-vm7                    shut off


root@p170hm# sensors -f  <--- Reasonable temps in this scenario. No spikes.
                              And fans aren't blowing hard as they did with LXC.
======================================
acpitz-virtual-0
Adapter: Virtual device
temp1:       +107.6°F  (crit = +309.2°F)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +107.6°F  (high = +186.8°F, crit = +212.0°F)
Core 0:        +107.6°F  (high = +186.8°F, crit = +212.0°F)
Core 1:        +104.0°F  (high = +186.8°F, crit = +212.0°F)
Core 2:        +107.6°F  (high = +186.8°F, crit = +212.0°F)
Core 3:         +91.4°F  (high = +186.8°F, crit = +212.0°F)

pkg-temp-0-virtual-0
Adapter: Virtual device
temp1:       +109.4°F
======================================


IMPORTANT NOTE: This is just more information for sleuths.
It doesn't explain why (as shown previously) the CPU core speeds differ
(their speeds are never in sync, even in this KVM scenario), or why
running "dropbox" (in CLI mode) or other lightweight things causes
the CPU temperatures to skyrocket within a second or two.

Comment 3 markusN 2014-02-23 23:58:28 UTC
See also bug #924570

Comment 4 Justin M. Forbes 2014-03-10 14:51:53 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.13.5-100.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 5 Justin M. Forbes 2014-06-23 14:41:22 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 6 nmvega 2014-08-04 17:50:39 UTC
Hello Again:

RE-OPENING THIS BUG.

This bug was closed citing insufficient data, but I didn't get an alert email requesting it.

This bug, originally opened for Fedora 19 and closed without resolution, is being re-opened for Fedora-20 (since that is the version I'm now running), with the latest yum updates and the following kernel:

Kernel: 3.15.7-200.fc20.x86_64

Several months ago, I abandoned my plan to use LXC and reverted to KVM because of this issue, hoping that it would be resolved by Fedora-20, but it isn't. This needs looking into because LXC containers are much more efficient to use, maintain, deploy, and run.

This issue happens consistently on all of my Fedora-20-based computers (just as it did when they were on FC19):
(a) the original Clevo P170HM laptop with which this bug was filed, (b) a heavy-duty "Digital Storm" computer with 64GB RAM, SSD, RAID, and an i7 CPU (6 cores, two threads each), and (c) a third computer as well.

============================================================
It's exactly the same issue as documented above:
============================================================
(1) Starting a basic LXC container, which is not configured to do anything at all, *immediately* and *substantially* raises the temperature of one of the cores.

(2) Starting a second LXC container (also not configured to do anything) does the same as (1), but on a different core.
============================================================



===========================================================
Demonstration Output:
===========================================================
dstorm$ # No LXCs running.
dstorm$ sensors -f  (All is normal).
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +77.0°F  (high = +176.0°F, crit = +194.0°F)
Core 0:         +71.6°F  (high = +176.0°F, crit = +194.0°F)
Core 1:         +73.4°F  (high = +176.0°F, crit = +194.0°F)
Core 2:         +75.2°F  (high = +176.0°F, crit = +194.0°F)
Core 3:         +69.8°F  (high = +176.0°F, crit = +194.0°F)
Core 4:         +73.4°F  (high = +176.0°F, crit = +194.0°F)
Core 5:         +73.4°F  (high = +176.0°F, crit = +194.0°F)

dstorm$ sensors -f (All is normal).
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +80.6°F  (high = +176.0°F, crit = +194.0°F)
Core 0:         +73.4°F  (high = +176.0°F, crit = +194.0°F)
Core 1:         +73.4°F  (high = +176.0°F, crit = +194.0°F)
Core 2:         +75.2°F  (high = +176.0°F, crit = +194.0°F)
Core 3:         +66.2°F  (high = +176.0°F, crit = +194.0°F)
Core 4:         +71.6°F  (high = +176.0°F, crit = +194.0°F)
Core 5:         +73.4°F  (high = +176.0°F, crit = +194.0°F)

dstorm$ sudo lxc-start -d -n vps00 (Start a container).
dstorm$ sensors -f (**Immediate 27-degree jump for Core-1**).
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +100.4°F  (high = +176.0°F, crit = +194.0°F)  <-- spike
Core 0:         +84.2°F  (high = +176.0°F, crit = +194.0°F)
Core 1:        +100.4°F  (high = +176.0°F, crit = +194.0°F)  <-- spike
Core 2:         +82.4°F  (high = +176.0°F, crit = +194.0°F)
Core 3:         +71.6°F  (high = +176.0°F, crit = +194.0°F)
Core 4:         +75.2°F  (high = +176.0°F, crit = +194.0°F)
Core 5:         +80.6°F  (high = +176.0°F, crit = +194.0°F)

dstorm$ sensors -f            
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +100.4°F  (high = +176.0°F, crit = +194.0°F)  <-- spike
Core 0:         +86.0°F  (high = +176.0°F, crit = +194.0°F)
Core 1:        +100.4°F  (high = +176.0°F, crit = +194.0°F)  <-- spike
Core 2:         +84.2°F  (high = +176.0°F, crit = +194.0°F)
Core 3:         +71.6°F  (high = +176.0°F, crit = +194.0°F)
Core 4:         +77.0°F  (high = +176.0°F, crit = +194.0°F)
Core 5:         +80.6°F  (high = +176.0°F, crit = +194.0°F)

dstorm$ sudo lxc-start -d -n vps01 (Start a second container).
dstorm$ sensors -f  (Temperatures are even higher now).
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +109.4°F  (high = +176.0°F, crit = +194.0°F) <-- spike
Core 0:         +89.6°F  (high = +176.0°F, crit = +194.0°F)
Core 1:        +111.2°F  (high = +176.0°F, crit = +194.0°F) <-- spike
Core 2:        +107.6°F  (high = +176.0°F, crit = +194.0°F) <-- spike
Core 3:         +75.2°F  (high = +176.0°F, crit = +194.0°F)
Core 4:         +80.6°F  (high = +176.0°F, crit = +194.0°F)
Core 5:         +84.2°F  (high = +176.0°F, crit = +194.0°F)

dstorm$ sensors -f
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +111.2°F  (high = +176.0°F, crit = +194.0°F) <-- spike
Core 0:         +91.4°F  (high = +176.0°F, crit = +194.0°F)
Core 1:        +109.4°F  (high = +176.0°F, crit = +194.0°F) <-- spike
Core 2:        +111.2°F  (high = +176.0°F, crit = +194.0°F) <-- spike
Core 3:         +75.2°F  (high = +176.0°F, crit = +194.0°F)
Core 4:         +78.8°F  (high = +176.0°F, crit = +194.0°F)
Core 5:         +84.2°F  (high = +176.0°F, crit = +194.0°F)
=====================
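The 27-degree jump annotated for Core 1 is in °F. A temperature *difference* converts as ΔC = ΔF × 5/9, with no 32° offset. Core 1's rise from 73.4 to 100.4 °F is therefore 15 °C. The following Python sketch diffs two `sensors -f` snapshots per core and reports each change in both units. It assumes the `Core N: +X°F` line format shown above. The snapshots are the second "All is normal" listing and the first listing after `lxc-start -d -n vps00`.

```python
import re

# Per-core lines such as "Core 1:        +100.4°F  (high = +176.0°F, ...)".
CORE = re.compile(r"^(Core \d+):\s+\+([0-9.]+)°F", re.MULTILINE)

BEFORE = """\
Core 0:         +73.4°F  (high = +176.0°F, crit = +194.0°F)
Core 1:         +73.4°F  (high = +176.0°F, crit = +194.0°F)
Core 2:         +75.2°F  (high = +176.0°F, crit = +194.0°F)
Core 3:         +66.2°F  (high = +176.0°F, crit = +194.0°F)
Core 4:         +71.6°F  (high = +176.0°F, crit = +194.0°F)
Core 5:         +73.4°F  (high = +176.0°F, crit = +194.0°F)
"""

AFTER = """\
Core 0:         +84.2°F  (high = +176.0°F, crit = +194.0°F)
Core 1:        +100.4°F  (high = +176.0°F, crit = +194.0°F)
Core 2:         +82.4°F  (high = +176.0°F, crit = +194.0°F)
Core 3:         +71.6°F  (high = +176.0°F, crit = +194.0°F)
Core 4:         +75.2°F  (high = +176.0°F, crit = +194.0°F)
Core 5:         +80.6°F  (high = +176.0°F, crit = +194.0°F)
"""


def core_temps(sensors_text: str) -> dict[str, float]:
    """Parse per-core readings (degrees F) from `sensors -f` output."""
    return {label: float(value) for label, value in CORE.findall(sensors_text)}


def core_deltas(before: str, after: str) -> dict[str, tuple[float, float]]:
    """Return {core: (change in F, change in C)} for cores in both snapshots."""
    t0, t1 = core_temps(before), core_temps(after)
    deltas = {}
    for core in sorted(t0.keys() & t1.keys()):
        delta_f = t1[core] - t0[core]
        # A temperature *difference* scales by 5/9 with no 32-degree offset.
        deltas[core] = (round(delta_f, 1), round(delta_f * 5 / 9, 1))
    return deltas


for core, (d_f, d_c) in core_deltas(BEFORE, AFTER).items():
    print(f"{core}: {d_f:+.1f} F ({d_c:+.1f} C)")
```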

At this point the fans are noticeably faster, and the temperature LED read-out on the "Digital Storm" computer reads ~410 degrees, where it normally reads ~320.

Here is what each LXC container is doing (not much); by the way, they are also running Fedora-20 with the same kernel:


dstorm$ lxc-ps --lxc
CONTAINER   PID TTY          TIME CMD
vps00      9616 ?        00:00:00 systemd
vps00      9646 ?        00:35:36 systemd-journal
vps00      9654 ?        00:00:00 systemd-udevd
vps00      9976 ?        00:00:00 firewalld
vps00      9979 ?        00:00:00 rsyslogd
vps00      9983 ?        00:00:00 dbus-daemon
vps00      9986 ?        00:00:00 systemd-logind
vps00      9993 pts/4    00:00:00 agetty
vps00      9995 pts/2    00:00:00 agetty
vps00      9998 pts/5    00:00:00 agetty
vps00      9999 pts/3    00:00:00 agetty
vps00     10006 pts/6    00:00:00 agetty
vps00     10012 ?        00:00:00 sshd

vps01     10754 ?        00:00:00 systemd
vps01     10784 ?        00:35:05 systemd-journal
vps01     10789 ?        00:00:00 systemd-udevd
vps01     11204 ?        00:00:00 firewalld
vps01     11206 ?        00:00:00 rsyslogd
vps01     11207 ?        00:00:00 dbus-daemon
vps01     11211 ?        00:00:00 systemd-logind
vps01     11232 pts/10   00:00:00 agetty
vps01     11233 pts/8    00:00:00 agetty
vps01     11234 pts/11   00:00:00 agetty
vps01     11235 pts/9    00:00:00 agetty
vps01     11236 pts/12   00:00:00 agetty
vps01     11264 ?        00:00:00 sshd
vps00     11908 ?        00:00:00 systemd
vps00     11910 ?        00:00:00 (sd-pam)
vps01     11965 ?        00:00:00 systemd
vps01     11967 ?        00:00:00 (sd-pam)
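The following Python sketch ranks container processes by accumulated CPU time, using an excerpt of the listing above. It parses the `TIME` column, which `ps` prints in `[DD-]hh:mm:ss` format, and sorts processes busiest first. In this listing, `systemd-journal` comes out on top in both containers (00:35:36 and 00:35:05). Every other process shows 00:00:00. The ranking only points at where the CPU time went; by itself, it does not explain the temperature rise.

```python
import re

# Matches rows like "vps00      9646 ?        00:35:36 systemd-journal";
# the header row fails because "PID" is not a number.
ROW = re.compile(
    r"^(?P<container>\S+)\s+(?P<pid>\d+)\s+\S+\s+"
    r"(?:(?P<days>\d+)-)?(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\s+(?P<cmd>.+)$",
    re.MULTILINE,
)

SAMPLE = """\
CONTAINER   PID TTY          TIME CMD
vps00      9616 ?        00:00:00 systemd
vps00      9646 ?        00:35:36 systemd-journal
vps00      9654 ?        00:00:00 systemd-udevd
vps00      9993 pts/4    00:00:00 agetty
vps00     10012 ?        00:00:00 sshd
vps01     10754 ?        00:00:00 systemd
vps01     10784 ?        00:35:05 systemd-journal
vps01     10789 ?        00:00:00 systemd-udevd
vps01     11264 ?        00:00:00 sshd
vps00     11910 ?        00:00:00 (sd-pam)
"""


def cpu_seconds(match: re.Match) -> int:
    """Convert the [DD-]hh:mm:ss TIME column to seconds."""
    days = int(match["days"] or 0)
    return ((days * 24 + int(match["h"])) * 60 + int(match["m"])) * 60 + int(match["s"])


def top_cpu(lxc_ps_text: str, n: int = 3) -> list[tuple[str, int, str, int]]:
    """Return (container, pid, command, cpu seconds) rows, busiest first."""
    rows = [
        (m["container"], int(m["pid"]), m["cmd"].strip(), cpu_seconds(m))
        for m in ROW.finditer(lxc_ps_text)
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)[:n]


for row in top_cpu(SAMPLE):
    print(row)
```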



[Final side note]: Although this will not solve the issue (it will only shift the issue around), I plan on setting the affinity of each LXC instance to a different core. Above, both instances share Core-1 and Core-2. I will try to change this in the LXC config file for each instance. But again, this is just to enhance CPU distribution performance. The temperature issue is still a problem.
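The per-container pinning described in the side note could be expressed with LXC's `lxc.cgroup.<controller>.<key>` config syntax, for example `lxc.cgroup.cpuset.cpus`. The following Python sketch only *generates* such lines, assigning each container one logical CPU round-robin. It makes several assumptions:

- the cgroup-v1 `cpuset` key used by LXC 1.x;
- the container names from this report;
- a hypothetical choice of CPUs 2 and 3;
- that you paste each line into that container's config file, commonly `/var/lib/lxc/<name>/config` (adjust for your setup).

As the note says, pinning would only move the load around. It is not a fix for the temperature behavior.

```python
def cpuset_lines(containers: list[str], cpus: list[int]) -> dict[str, str]:
    """Assign each container one logical CPU, round-robin, as an LXC config line.

    Assumes the cgroup-v1 key lxc.cgroup.cpuset.cpus (LXC 1.x syntax).
    """
    if not cpus:
        raise ValueError("need at least one CPU to assign")
    return {
        name: f"lxc.cgroup.cpuset.cpus = {cpus[i % len(cpus)]}"
        for i, name in enumerate(containers)
    }


# Hypothetical example: pin vps00 and vps01 to logical CPUs 2 and 3.
for name, line in cpuset_lines(["vps00", "vps01"], [2, 3]).items():
    print(f"{name}: {line}")
```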

Try LXC and you will see the issue. It's easily reproducible.

Can we continue this and find a resolution? I'm concerned about the impact these higher temperatures will have on the computer's lifespan (please and thank you). :)

Comment 7 nmvega 2014-08-12 02:01:40 UTC
Hello...

Any ideas on this?

Thanks!

Comment 8 nmvega 2014-08-23 06:29:32 UTC
Can someone please look at this/fix this? I need to use containers (instead of KVM) but can't because of this temperature & fan issue -- which shouldn't be happening just because I spin up a do-nothing/idle LXC.

If this can't be fixed, then LXCs may as well not exist, as their adverse effect on CPU temps and fan speeds (fully described above) is a show-stopper.

Please!

Comment 9 nmvega 2014-09-09 01:39:30 UTC
Can someone help with this? Anyone please. Thanks.

Comment 10 nmvega 2014-09-13 18:27:14 UTC
Closed by original submitter (nmvega). No one paid attention to this urgent issue -- which prohibits using Fedora as a host to LXC containers -- despite repeated requests on *this* bug ID. Opening a new ticket for this same issue.

