Bug 870573
| Summary: | Abnormally high ksoftirqd CPU usage on CentOS 6.3 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | kbergk |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.3 | CC: | abhalla, ajb, ank, david, deatrich, dgregor, fredrik, iain.t.morris, jonathansteffan, mihai, orion, pasteur, prarit, roland.friedwagner, sam, shawn.siefkas, simon.d.matthews, suren, toracat |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-04-13 12:02:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | lspci (634087); Simple test of 4 recent kernels (634088); Simple test on 3.0.48-1.el6.elrepo.x86_64 (634099) | | |
Description kbergk 2012-10-26 22:41:52 UTC
Created attachment 634087 [details]
lspci

Created attachment 634088 [details]
Simple test of 4 recent kernels

I briefly tested the latest mainline kernel (3.6.3-1 as of this writing) and do not see this ksoftirqd issue with it. I ran quick tests on several kernels by booting each one and running 6 cpuburn threads for 10 minutes. Here are the CPU time of the ksoftirqd processes and the output of /proc/interrupts for the following kernels (a reproduction sketch follows the list):
2.6.32-220.4.1.el6.x86_64 (unaffected)
2.6.32-220.13.1.el6.x86_64 (affected)
2.6.32-279.9.1.el6.x86_64 (affected)
3.6.3-1.el6.elrepo.x86_64 (unaffected)
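
For anyone who wants to repeat this, a minimal sketch of the test procedure described above, under stated assumptions: plain `yes` busy loops stand in for the cpuburn threads actually used, and the 6-thread count and 10-minute duration simply mirror the test above.

```sh
# Minimal sketch of the burn test. Assumptions: 6 logical CPUs, and "yes"
# busy loops standing in for the cpuburn threads used in the real test.
for i in $(seq 1 6); do yes > /dev/null & done

cat /proc/interrupts > /tmp/interrupts.before
sleep 600                                  # let the load run for 10 minutes
cat /proc/interrupts > /tmp/interrupts.after

# Accumulated CPU time of the per-CPU ksoftirqd threads; on an affected
# kernel this climbs abnormally fast while the load is running.
ps -eo pid,comm,time | grep '[k]softirqd'

pkill yes                                  # stop the burn loops
diff /tmp/interrupts.before /tmp/interrupts.after | head
```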
ELRepo also offers a "long-term" kernel (now named kernel-lt). The current version is kernel-lt-3.0.48. Could you try this one so that the target can be narrowed down?

Created attachment 634099 [details]
Simple test on 3.0.48-1.el6.elrepo.x86_64

It appears that the latest long-term ELRepo kernel, 3.0.48-1.el6.elrepo.x86_64, is not affected.
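
In case it helps others run the same check, a hedged sketch of pulling in the ELRepo long-term kernel on CentOS 6 follows; the elrepo-release package version in the URL is an assumption, so check elrepo.org for the current one.

```sh
# Sketch: install ELRepo's kernel-lt on CentOS 6 for testing. The
# elrepo-release version below is an assumption; see elrepo.org for current.
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-lt
# Reboot into the new kernel (check the default entry in
# /boot/grub/grub.conf), then repeat the ksoftirqd load test.
```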
I see the same problem. The affected machines are all virtual machines running CentOS 6.3 on an AMD host. The kernel is 2.6.32-279.1.1.el6.centos.plus.x86_64, but other kernel versions were affected as well. The problem happens when the machines are under high load.

(In reply to comment #6)
Could you please confirm whether the current long-term support kernel from the ELRepo Project (kernel-lt-3.0.50.el6.elrepo, as of the date of this comment) resolves the issue for you?

(In reply to comment #6)
I also see it on another VM running 2.6.32-279.11.1.el6.x86_64. High network I/O load seems to trigger the problem with ksoftirqd.

(In reply to comment #7)
Difficult. It is a core production machine and I am travelling next week.

FWIW, I see this on physical hardware as well as VMs.

Virt guest?

While I do see this issue in my KVM guests, it's also very much apparent on my bare-metal physical hosts. I just want to be clear that this is not limited to VMs. Thanks.

Perhaps a clue... I have a lot of mixed hardware and only observe this problem on the Intel systems. The AMD systems hum along just fine.

A bare-metal machine with an AMD processor has this problem with Linux xxx.xx.cy 2.6.32-279.14.1.el6.x86_64 and CentOS 6.3. From /proc/cpuinfo:

    processor        : 0
    vendor_id        : AuthenticAMD
    cpu family       : 15
    model            : 37
    model name       : AMD Opteron(tm) Processor 250
    stepping         : 1
    cpu MHz          : 1800.000
    cache size       : 1024 KB
    fpu              : yes
    fpu_exception    : yes
    cpuid level      : 1
    wp               : yes
    flags            : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni
    bogomips         : 3607.82
    TLB size         : 1024 4K pages
    clflush size     : 64
    cache_alignment  : 64
    address sizes    : 40 bits physical, 48 bits virtual
    power management : ts fid vid ttp

The problem goes away with nohz=off on the kernel command line.

In my experience, nohz=off is not without problems either: load average calculations seem to be incorrect with that setting. I've been running the -lt kernels from ELRepo with good results.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
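
For anyone wanting to try the nohz=off workaround mentioned in the comments above, here is a sketch for the legacy GRUB used on RHEL/CentOS 6; the kernel line shown is an illustrative example, not taken from any machine in this report.

```sh
# Sketch of applying the nohz=off workaround on RHEL/CentOS 6 (legacy GRUB).
# Edit /boot/grub/grub.conf and append nohz=off to the relevant kernel line;
# the entry below is an illustrative example only:
#
#   kernel /vmlinuz-2.6.32-279.14.1.el6.x86_64 ro root=/dev/mapper/vg-root nohz=off
#
# For a one-off test, the same parameter can be added by editing the kernel
# line from the GRUB menu at boot time. Verify after reboot:
grep -o 'nohz=off' /proc/cmdline
```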
https://access.redhat.com/solutions/302623 indicates this was fixed in 6.4. Time to close this?

(In reply to Orion Poplawski from comment #18)
> https://access.redhat.com/solutions/302623 indicates this was fixed in 6.4.
> Time to close this?

I think so -- I'm closing as current release. P.