Bug 1141456
Summary: | Fedora 19 & 20 (64bit HOST): Idle Fedora LXC guests causes immediate HIGH CPU temps. / Fan Speeds. Why? | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | nmvega <nmvega> |
Component: | lxc | Assignee: | Thomas Moschny <thomas.moschny> |
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 20 | CC: | ayman.khamouma, karlthered, mhw, nmvega, sagarun, thomas.moschny |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-06-30 01:09:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
nmvega
2014-09-13 18:58:46 UTC
Just for completeness, can you please specify the LXC package versions you are using?

And I might be missing something, but is 110°F (43°C) really to be considered a high CPU temperature? Admittedly that's a substantial increase compared to 70°F (21°C), yes. However, cores in my workstation here (Core2 Quad Q9550, that's an old, but low TDP CPU) are never colder than 34°C (93°F).

Anyway, I can contact upstream about this, although I am not sure they can do anything about it, as LXC is the userspace part. You filed bug 1050106 against the kernel component, which is in principle the right thing to do...

Hello Thomas:

(1) Here are the LXC RPMs (the latest of them), although, as mentioned, the issue has persisted across many iterations of RPMs (lxc, kernels, etc.).

    user@linux$ rpm -qa | egrep 'lxc'
    lxc-doc-1.0.5-5.fc20.noarch
    lxc-templates-1.0.5-5.fc20.x86_64
    libvirt-daemon-driver-lxc-1.1.3.5-2.fc20.x86_64
    python3-lxc-1.0.5-5.fc20.x86_64
    clxclient-3.6.1-9.fc20.x86_64
    lxc-1.0.5-5.fc20.x86_64
    lxc-extra-1.0.5-5.fc20.x86_64
    lxc-devel-1.0.5-5.fc20.x86_64
    lua-lxc-1.0.5-5.fc20.x86_64
    lxc-libs-1.0.5-5.fc20.x86_64

(2) No one paid attention to the former bug, though I pleaded. Also, on this bug, the dropdown did not let me select 'kernel'. I think a collaborative effort (LXC and kernel) is optimal.

(3) Every computer (a wide variety of them) on which we tried to run even just one or two LXCs jumps drastically in temperature (as you see) -- for an LXC or two that are essentially idle. Correspondingly, all of those computers' FANS think there is a problem because, in each case, they speed up and get noticeably loud.

Take this well-equipped server:

- 64GB RAM @ 2600MHz
- i7 x 3GHz x 12 Cores
- 2TB SSD (RAID-0 H/W stripe of 1TB pair)
- No monitor

The host Fedora O/S is optimized to run only what is necessary. Everything is disabled (no 'sendmail', no 'cron'). It's very tight. Running one idle LXC causes a spike in temperature; run two, and the fan speeds start increasing. Yet nothing is really happening.
On the other hand, on that very same machine I can run 5 *fully virtualized* CentOS6 KVM guests (on a Fedora-20 host), each with 11GB RAM assigned to them, and on them run distributed Apache Hadoop/HDFS, Apache Spark and Apache Kafka to perform real-time distributed Machine Learning -- so those KVMs are truly doing a lot! Yet for the amount of real-time work that the KVM-based cluster is doing (again, full virtualization now), there is very little increase in temperature, and zero increase in fan speed.

Also note that there is an 'overall' temperature LED on the front of that computer. It reads ~320 when the Fedora host is booted up and idle. I can launch those 5 KVMs, and it goes up to about ~340; but launching 1 or 2 *idle* LXCs causes a jump to above ~410 immediately. Why?

So it's not just 'sensors -f' output. There are LED and FAN increase indications, too. So something is definitely going on with LXC & kernel, and because there is, we're assuming the possibility that the temperature jump can be even higher than shown. We have to... -- to protect the systems.

I think one of the underlying components used in creating the virtual container is causing a problem (kernel iptables, chroot, resource management, etc.), or maybe a kernel mutex is spinning, or something. But this behavior is definitely problematic.

Again, we really want to use LXC because we can get better utilization from every server that way. But we are stuck.

Thank you again!

This is a known existing problem with systemd-journald in containers. If you look at the CPU time in those container processes, you will notice systemd-journald is in a runaway condition and consuming 100% CPU. If you were to run "top" you would see that your load average has shot through the roof and that multiple systemd-journald processes are camped out on the CPUs, consuming the processors.
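The check described above (looking for systemd-journald processes pinned near 100% CPU) can be scripted on the host. What follows is only a minimal sketch and not part of the original report: the names `find_busy_journald` and `current_ps_output` and the 50% threshold are illustrative assumptions, and it assumes a procps-style `ps` that accepts `-eo pid,pcpu,comm --no-headers`.

```python
import subprocess


def find_busy_journald(ps_output, threshold=50.0):
    """Return [(pid, cpu_percent)] for systemd-journald processes at or above threshold.

    ps_output is the text printed by `ps -eo pid,pcpu,comm --no-headers`.
    Note: ps reports CPU usage averaged over the process lifetime, so a
    long-lived runaway process shows a high value, a new one may not yet.
    """
    busy = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, pcpu, comm = fields
        # The kernel truncates command names to 15 characters,
        # so systemd-journald appears as "systemd-journal".
        if not comm.strip().startswith("systemd-journal"):
            continue
        try:
            cpu = float(pcpu)
        except ValueError:
            continue
        if cpu >= threshold:
            busy.append((int(pid), cpu))
    return busy


def current_ps_output():
    # Hypothetical helper: run on the host, whose process table
    # also contains the processes of running LXC containers.
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm", "--no-headers"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `find_busy_journald(current_ps_output())` on an affected host should list one entry per runaway journald instance; an empty list suggests the loop is not occurring.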
The problem relates to having /dev/kmsg symlinked to /dev/console in the containers. This is common with sysvinit or upstart, but it causes problems with systemd-journald: journald reads from kmsg and writes to console, creating a messaging loop that it then fails to detect.

This problem is going to be addressed in some patches to be released shortly for templates supporting systemd-based distros, which will also attempt to intercept the affected containers at startup with default settings. Existing containers running systemd-journald will need to be updated with a couple of minor changes...

To address this problem in an affected container...

1) Shut down the container.

2) Edit the container config file and add the following line...

    lxc.kmsg = 0

3) Remove the existing symlink for the container /dev. Because, for systemd, this is a persistent subdirectory under /dev/.lxc in the host devtmpfs area, it should be removed like this:

    rm -f /var/lib/lxc/{container-name}/rootfs.dev/kmsg

4) Restart the container.

Hi Michael:

Thank you for taking the time to articulate the issue as you did (appreciated!). And there is good news, too.

I made the adjustments you prescribed above to each of the 5 LXC containers, started them, and everything looks as expected, including the front-display LED temperature reading (only ~330).

    root@linux# lxc-ls --active
    vps00 vps01 vps02 vps03 vps04

    root@linux# sensors -f
    coretemp-isa-0000
    Adapter: ISA adapter
    Physical id 0: +77.0°F (high = +176.0°F, crit = +194.0°F)
    Core 0: +73.4°F (high = +176.0°F, crit = +194.0°F)
    Core 1: +77.0°F (high = +176.0°F, crit = +194.0°F)
    Core 2: +77.0°F (high = +176.0°F, crit = +194.0°F)
    Core 3: +66.2°F (high = +176.0°F, crit = +194.0°F)
    Core 4: +71.6°F (high = +176.0°F, crit = +194.0°F)
    Core 5: +75.2°F (high = +176.0°F, crit = +194.0°F)

This is finally SOLVED. \o/ Thank you very much Michael & Thomas.

*** Bug 1195945 has been marked as a duplicate of this bug.
***

Fixed in commit e8a16654, will be in 1.0.8.

This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
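For anyone still running affected containers, steps 2 and 3 of the workaround described earlier in this thread (adding `lxc.kmsg = 0` to the container config and removing the `rootfs.dev/kmsg` symlink) can be scripted. This is a hedged sketch rather than an official tool: the function names are hypothetical, it assumes the config file lives at `config` inside the container directory (the default LXC layout under /var/lib/lxc), and it assumes the container has already been shut down.

```python
from pathlib import Path


def ensure_kmsg_disabled(config_text):
    """Return the config text with exactly one `lxc.kmsg = 0` line."""
    kept = []
    for line in config_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "lxc.kmsg":
            continue  # drop any existing setting; it is re-added below
        kept.append(line)
    kept.append("lxc.kmsg = 0")
    return "\n".join(kept) + "\n"


def apply_workaround(container_dir):
    """Apply steps 2 and 3 to a stopped container.

    container_dir is the per-container directory, for example
    /var/lib/lxc/{container-name}. The "config" file name is an
    assumption based on the default LXC layout.
    """
    container_dir = Path(container_dir)

    # Step 2: make sure the config contains `lxc.kmsg = 0`.
    config = container_dir / "config"
    config.write_text(ensure_kmsg_disabled(config.read_text()))

    # Step 3: remove the stale kmsg symlink, equivalent to
    # `rm -f /var/lib/lxc/{container-name}/rootfs.dev/kmsg`.
    kmsg = container_dir / "rootfs.dev" / "kmsg"
    if kmsg.is_symlink() or kmsg.exists():
        kmsg.unlink()
```

The config rewrite is idempotent, so running `apply_workaround` more than once on the same container leaves a single `lxc.kmsg = 0` line. Shutting down and restarting the container (steps 1 and 4) is left to the operator.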