Bug 870573

Summary: Abnormally high ksoftirqd CPU usage on CentOS 6.3
Product: Red Hat Enterprise Linux 6
Reporter: kbergk
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: unspecified
Priority: unspecified
Version: 6.3
CC: abhalla, ajb, ank, david, deatrich, dgregor, fredrik, iain.t.morris, jonathansteffan, mihai, orion, pasteur, prarit, roland.friedwagner, sam, shawn.siefkas, simon.d.matthews, suren, toracat

Hardware: x86_64
OS: Linux

Doc Type: Bug Fix

Last Closed: 2015-04-13 12:02:01 UTC
Type: Bug

Attachments:

Description	Flags
cat /proc/cpuinfo	none
lspci	none
Simple test of 4 recent kernels	none
Simple test on 3.0.48-1.el6.elrepo.x86_64	none

Description kbergk 2012-10-26 22:41:52 UTC

Created attachment 634086
cat /proc/cpuinfo

Description of problem:
Seeing higher-than-usual CPU usage by ksoftirqd on CentOS 6.3, starting with kernel 2.6.32-220.13.1.el6.x86_64. This is on a server with dual Intel Xeon L5420 processors and an Intel 5400 chipset.


Version-Release number of selected component (if applicable):
CentOS 6.3, kernels 2.6.32-220.13.1.el6.x86_64 and 2.6.32-279.9.1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install kernel 2.6.32-220.13.1.el6.x86_64 on a system with Xeon L5420 processor(s).
2. Monitor the CPU usage of ksoftirqd.
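Step 2 does not say how the CPU usage was measured. Below is one possible way to do it, as a minimal sketch. It assumes Python 3.9+ on Linux and reads utime and stime for each ksoftirqd/N thread. These are fields 14 and 15 of /proc/<pid>/stat and are counted in clock ticks. The helper names `parse_stat` and `ksoftirqd_cpu_seconds` are hypothetical and are not part of any existing tool.

```python
import os


def parse_stat(stat_text: str) -> tuple[str, int]:
    """Return (comm, utime + stime in clock ticks) from /proc/<pid>/stat text."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it using the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    comm = stat_text[open_paren + 1:close_paren]
    # The remaining fields start at field 3 (state); utime and stime are
    # fields 14 and 15 of the line, i.e. indices 11 and 12 of this list.
    rest = stat_text[close_paren + 1:].split()
    return comm, int(rest[11]) + int(rest[12])


def ksoftirqd_cpu_seconds(proc_root: str = "/proc") -> dict[str, float]:
    """Cumulative CPU seconds of every ksoftirqd/N thread (Linux only)."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    result: dict[str, float] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as cpuinfo or interrupts
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, ticks = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
        if comm.startswith("ksoftirqd/"):
            result[comm] = ticks / ticks_per_second
    return result
```

To use it, call `ksoftirqd_cpu_seconds()` once before and once after the 10-minute load, then subtract the two results per thread. This gives the ksoftirqd time accumulated during the run on each CPU.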
  
Actual results:
After 10 minutes of heavy CPU load on kernel 2.6.32-220.13.1.el6.x86_64, the CPU time of ksoftirqd was:
UID PID PPID C STIME TTY TIME CMD
root 4 2 13 20:28 ? 00:00:53 [ksoftirqd/0]
root 9 2 8 20:28 ? 00:00:34 [ksoftirqd/1]
root 13 2 0 20:28 ? 00:00:00 [ksoftirqd/2]
root 17 2 0 20:28 ? 00:00:02 [ksoftirqd/3]
root 21 2 9 20:28 ? 00:00:39 [ksoftirqd/4]
root 25 2 11 20:28 ? 00:00:44 [ksoftirqd/5]
root 29 2 0 20:28 ? 00:00:02 [ksoftirqd/6]
root 33 2 0 20:28 ? 00:00:01 [ksoftirqd/7]


Expected results:
After 10 minutes of heavy CPU load on kernel 2.6.32-220.4.1.el6.x86_64, the CPU time of ksoftirqd was:
UID PID PPID C STIME TTY TIME CMD
root 4 2 0 19:22 ? 00:00:00 [ksoftirqd/0]
root 9 2 0 19:22 ? 00:00:00 [ksoftirqd/1]
root 13 2 0 19:22 ? 00:00:00 [ksoftirqd/2]
root 17 2 0 19:22 ? 00:00:00 [ksoftirqd/3]
root 21 2 0 19:22 ? 00:00:00 [ksoftirqd/4]
root 25 2 0 19:22 ? 00:00:00 [ksoftirqd/5]
root 29 2 0 19:22 ? 00:00:00 [ksoftirqd/6]
root 33 2 0 19:22 ? 00:00:00 [ksoftirqd/7]
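The two listings above are easier to compare as numbers. One way is to add up the TIME column (cumulative CPU time) across all ksoftirqd threads. Below is a minimal sketch, assuming the lines use the standard `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD) and a TIME value in `[DD-]HH:MM:SS` format. The function name is hypothetical.

```python
def ksoftirqd_total_seconds(ps_lines: list[str]) -> int:
    """Sum the TIME column of ps -ef lines whose command is a ksoftirqd thread."""
    total = 0
    for line in ps_lines:
        fields = line.split()
        # Expected columns: UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[7].startswith("[ksoftirqd/"):
            continue
        time_field = fields[6]
        # TIME is [DD-]HH:MM:SS; strip the optional day prefix first.
        days = 0
        if "-" in time_field:
            day_part, time_field = time_field.split("-", 1)
            days = int(day_part)
        hours, minutes, seconds = (int(x) for x in time_field.split(":"))
        total += days * 86400 + hours * 3600 + minutes * 60 + seconds
    return total
```

For the affected kernel, the eight threads add up to 175 seconds of ksoftirqd CPU time over the 10-minute run. On 2.6.32-220.4.1 the total is 0 seconds.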

Additional info:
Link to CentOS bug with possibly useful information: http://bugs.centos.org/view.php?id=5813

I also found that this issue is not present in a much newer mainline kernel, 3.6.3-1.el6.elrepo.x86_64.

Comment 1 kbergk 2012-10-26 22:42:17 UTC

Created attachment 634087
lspci

Comment 3 kbergk 2012-10-26 22:50:32 UTC

Created attachment 634088
Simple test of 4 recent kernels

I briefly tested the latest mainline kernel (3.6.3-1 as of this writing) and don't see this ksoftirqd issue with it. I also ran some quick tests on several kernels: I booted each one and ran 6 cpuburn threads for 10 minutes. The attachment contains the CPU time of the ksoftirqd processes and the output of /proc/interrupts for the following kernels:
2.6.32-220.4.1.el6.x86_64 (unaffected)
2.6.32-220.13.1.el6.x86_64 (affected)
2.6.32-279.9.1.el6.x86_64 (affected)
3.6.3-1.el6.elrepo.x86_64 (unaffected)
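Comparing /proc/interrupts across kernels is easier if each row is reduced to a single total. It also helps to diff snapshots taken before and after the cpuburn run. Below is a minimal sketch, assuming the usual /proc/interrupts layout: a header row of CPU names, then one row per IRQ with per-CPU counts followed by a description. The helper names are hypothetical.

```python
def interrupt_totals(text: str) -> dict[str, int]:
    """Total each /proc/interrupts row across all CPU columns."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals: dict[str, int] = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        counts = []
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break  # reached the chip/handler description
            counts.append(int(token))
        totals[label.strip()] = sum(counts)
    return totals


def interrupt_delta(before: str, after: str) -> dict[str, int]:
    """Per-IRQ increase between two /proc/interrupts snapshots."""
    b, a = interrupt_totals(before), interrupt_totals(after)
    return {irq: a[irq] - b.get(irq, 0) for irq in a}
```

The delta shows which interrupt sources were busy during the load. This is more useful than raw counters, which also include everything since boot.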

Comment 4 Akemi Yagi 2012-10-26 23:11:59 UTC

ELRepo also offers a "long-term" kernel (now named kernel-lt). The current version is kernel-lt-3.0.48. Could you try this one so that the problem can be narrowed down?

Comment 5 kbergk 2012-10-27 00:26:28 UTC

Created attachment 634099
Simple test on 3.0.48-1.el6.elrepo.x86_64

It appears that the latest long-term ELRepo kernel, 3.0.48-1.el6.elrepo.x86_64, is not affected.

Comment 6 simon.d.matthews 2012-11-02 02:40:45 UTC

I see the same problem. 

The affected machines are all virtual machines running CentOS 6.3 on an AMD host. The kernel is 2.6.32-279.1.1.el6.centos.plus.x86_64, but other kernel versions were also affected.

The problem happens when the machines are under high load.

Comment 7 Alan Bartlett 2012-11-02 02:54:03 UTC

(In reply to comment #6)

Could you please confirm whether the current long-term support kernel from the ELRepo Project (kernel-lt-3.0.50.el6.elrepo, as of the date of this comment) resolves the issue for you?

Comment 8 simon.d.matthews 2012-11-02 03:02:54 UTC

(In reply to comment #6)

I also see it on another VM running 2.6.32-279.11.1.el6.x86_64. High network I/O load seems to trigger the ksoftirqd problem.

Comment 9 simon.d.matthews 2012-11-02 03:03:50 UTC

(In reply to comment #7)
> Could you please confirm whether the current long-term support kernel from
> the ELRepo Project (kernel-lt-3.0.50.el6.elrepo, as of the date of this
> comment) resolves the issue for you?

Difficult. It is a core production machine and I am travelling next week.

Comment 10 Orion Poplawski 2012-11-02 13:01:52 UTC

FWIW - I see this on physical hardware as well as VMs.

Comment 11 kbergk 2012-11-02 16:20:19 UTC

Virt guest? While I do see this issue in my KVM guests, it's also very much apparent on my bare-metal physical hosts. I just want to be clear that this is not limited to VMs. Thanks.

Comment 12 Fredrik Jonsson 2012-11-15 16:29:54 UTC

Perhaps a clue... I have a lot of mixed hardware and only observe this problem on the Intel systems. The AMD systems hum along just fine.

Comment 13 Andreas Kasenides 2012-12-03 15:03:40 UTC

A bare-metal machine with an AMD processor has this problem with Linux xxx.xx.cy 2.6.32-279.14.1.el6.x86_64 and CentOS 6.3:

/proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 250
stepping        : 1
cpu MHz         : 1800.000
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni
bogomips        : 3607.82
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

The problem goes away when the kernel is booted with nohz=off.
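After a reboot, it is worth confirming that the workaround is actually in effect by checking the running kernel's command line. On CentOS 6 the parameter is normally appended to the `kernel` line in /boot/grub/grub.conf. Below is a minimal sketch, assuming /proc/cmdline is readable. The function names are hypothetical.

```python
def nohz_disabled(cmdline: str) -> bool:
    """True if the kernel command line contains the nohz=off parameter."""
    return "nohz=off" in cmdline.split()


def running_kernel_nohz_disabled(path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running kernel (Linux only)."""
    with open(path) as f:
        return nohz_disabled(f.read())
```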

Comment 14 Orion Poplawski 2012-12-03 15:33:30 UTC

In my experience, nohz=off has problems of its own: load average calculations seem to be incorrect with that setting. I've been running the -lt kernels from ELRepo with good results.

Comment 15 RHEL Program Management 2012-12-17 06:49:33 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 16 RHEL Program Management 2013-10-14 04:42:19 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 18 Orion Poplawski 2015-04-10 20:00:45 UTC

https://access.redhat.com/solutions/302623 indicates this was fixed in 6.4.  Time to close this?

Comment 19 Prarit Bhargava 2015-04-13 12:02:01 UTC

(In reply to Orion Poplawski from comment #18)
> https://access.redhat.com/solutions/302623 indicates this was fixed in 6.4. 
> Time to close this?

I think so -- I'm closing as current release.

P.