Bug 870573 - Abnormally high ksoftirqd CPU usage on CentOS 6.3
Summary: Abnormally high ksoftirqd CPU usage on CentOS 6.3
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-10-26 22:41 UTC by kbergk
Modified: 2018-11-30 20:56 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-13 12:02:01 UTC
Target Upstream Version:
Embargoed:


Attachments
cat /proc/cpuinfo (6.19 KB, text/plain) - 2012-10-26 22:41 UTC, kbergk
lspci (3.04 KB, text/plain) - 2012-10-26 22:42 UTC, kbergk
Simple test of 4 recent kernels (25.53 KB, text/plain) - 2012-10-26 22:50 UTC, kbergk
Simple test on 3.0.48-1.el6.elrepo.x86_64 (6.32 KB, text/plain) - 2012-10-27 00:26 UTC, kbergk


Links
CentOS bug 5813 (Last Updated: 2012-10-26 22:41:52 UTC)

Description kbergk 2012-10-26 22:41:52 UTC
Created attachment 634086 [details]
cat /proc/cpuinfo

Description of problem:
Seeing abnormally high CPU usage by ksoftirqd on CentOS 6.3 starting with kernel 2.6.32-220.13.1.el6.x86_64. This is on a server with dual Intel Xeon L5420 processors and the Intel 5400 chipset.


Version-Release number of selected component (if applicable):
CentOS 6.3, kernels 2.6.32-220.13.1.el6.x86_64 and 2.6.32-279.9.1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install kernel 2.6.32-220.13.1.el6.x86_64 on system with Xeon L5420 processor(s).
2. Monitor CPU usage of the ksoftirqd threads (a monitoring sketch follows below).
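A minimal sketch of driving step 1-2: a plain busy loop per core stands in for a dedicated burn tool such as cpuburn, and ps is sampled for the ksoftirqd threads; the 10-minute duration matches the results sections below.

# Pin one busy loop to each core for ~10 minutes (stand-in for cpuburn).
for cpu in $(seq 0 $(($(nproc) - 1))); do
    taskset -c "$cpu" sh -c 'end=$(($(date +%s) + 600)); while [ $(date +%s) -lt $end ]; do :; done' &
done

# Sample the accumulated CPU time of the ksoftirqd threads once a minute.
for i in $(seq 1 10); do
    sleep 60
    ps -ef | grep '[k]softirqd'
done
wait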
  
Actual results:
After 10 minutes of heavy CPU load on kernel 2.6.32-220.13.1.el6.x86_64, cpu time of ksoftirqd:
root 4 2 13 20:28 ? 00:00:53 [ksoftirqd/0]
root 9 2 8 20:28 ? 00:00:34 [ksoftirqd/1]
root 13 2 0 20:28 ? 00:00:00 [ksoftirqd/2]
root 17 2 0 20:28 ? 00:00:02 [ksoftirqd/3]
root 21 2 9 20:28 ? 00:00:39 [ksoftirqd/4]
root 25 2 11 20:28 ? 00:00:44 [ksoftirqd/5]
root 29 2 0 20:28 ? 00:00:02 [ksoftirqd/6]
root 33 2 0 20:28 ? 00:00:01 [ksoftirqd/7]


Expected results:
After 10 minutes of heavy CPU load on kernel 2.6.32-220.4.1.el6.x86_64, cpu time of ksoftirqd:
root 4 2 0 19:22 ? 00:00:00 [ksoftirqd/0]
root 9 2 0 19:22 ? 00:00:00 [ksoftirqd/1]
root 13 2 0 19:22 ? 00:00:00 [ksoftirqd/2]
root 17 2 0 19:22 ? 00:00:00 [ksoftirqd/3]
root 21 2 0 19:22 ? 00:00:00 [ksoftirqd/4]
root 25 2 0 19:22 ? 00:00:00 [ksoftirqd/5]
root 29 2 0 19:22 ? 00:00:00 [ksoftirqd/6]
root 33 2 0 19:22 ? 00:00:00 [ksoftirqd/7]

Additional info:
Link to CentOS bug with possibly useful information: http://bugs.centos.org/view.php?id=5813

I also found that this issue is not present in a much newer mainline kernel, 3.6.3-1.el6.elrepo.x86_64.

Comment 1 kbergk 2012-10-26 22:42:17 UTC
Created attachment 634087 [details]
lspci

Comment 3 kbergk 2012-10-26 22:50:32 UTC
Created attachment 634088 [details]
Simple test of 4 recent kernels

I briefly tested the latest mainline kernel (3.6.3-1 as of this writing). I don't see this ksoftirqd issue with 3.6.3-1. I ran some quick tests on various kernels by booting and running 6 cpuburn threads for 10 minutes. Here is the cpu time of the ksoftirqd processes and the output of /proc/interrupts for the following kernels:
2.6.32-220.4.1.el6.x86_64 (unaffected)
2.6.32-220.13.1.el6.x86_64 (affected)
2.6.32-279.9.1.el6.x86_64 (affected)
3.6.3-1.el6.elrepo.x86_64 (unaffected)
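For reference, a rough sketch of how the per-kernel numbers above can be captured after each 10-minute run; the output file name is purely illustrative.

# After the 10-minute load run on the currently booted kernel, record
# the ksoftirqd CPU times and the interrupt counters in one file.
out="ksoftirqd-$(uname -r).txt"
{
    uname -r
    echo '--- ksoftirqd CPU time ---'
    ps -ef | grep '[k]softirqd'
    echo '--- /proc/interrupts ---'
    cat /proc/interrupts
} > "$out"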

Comment 4 Akemi Yagi 2012-10-26 23:11:59 UTC
ELRepo also offers a "long-term" kernel (now named kernel-lt). The current version is kernel-lt-3.0.48. Could you try this one so that the target can be narrowed down?
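If it helps, a minimal sketch of pulling in kernel-lt on CentOS 6, assuming the elrepo-release package is already installed; check the ELRepo documentation for the current repository name and kernel version.

# Install the ELRepo long-term kernel (kernel-lt) on CentOS 6.
# Assumes elrepo-release is already installed; the elrepo-kernel
# repository is disabled by default.
yum --enablerepo=elrepo-kernel install kernel-lt
# Reboot into the new kernel from the GRUB menu and repeat the test.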

Comment 5 kbergk 2012-10-27 00:26:28 UTC
Created attachment 634099 [details]
Simple test on 3.0.48-1.el6.elrepo.x86_64

It appears that the latest long-term ELRepo kernel, 3.0.48-1.el6.elrepo.x86_64, is not affected.

Comment 6 simon.d.matthews 2012-11-02 02:40:45 UTC
I see the same problem. 

The affected machines are all virtual machines, running Centos 6.3 on an AMD host. The kernel is 2.6.32-279.1.1.el6.centos.plus.x86_64, but other kernel versions were affected. 


The problem happens when the machines are under high load.

Comment 7 Alan Bartlett 2012-11-02 02:54:03 UTC
(In reply to comment #6)
> I see the same problem. 
> 
> The affected machines are all virtual machines, running Centos 6.3 on an AMD
> host. The kernel is 2.6.32-279.1.1.el6.centos.plus.x86_64, but other kernel
> versions were affected. 
> 
> 
> The problem happens when the machines are under high load.

Could you please confirm whether the current long-term support kernel from the ELRepo Project (kernel-lt-3.0.50.el6.elrepo, as of the date of this comment) resolves the issue for you?

Comment 8 simon.d.matthews 2012-11-02 03:02:54 UTC
(In reply to comment #6)
> I see the same problem. 
> 
> The affected machines are all virtual machines, running Centos 6.3 on an AMD
> host. The kernel is 2.6.32-279.1.1.el6.centos.plus.x86_64, but other kernel
> versions were affected. 
> 
> 
> The problem happens when the machines are under high load.

I also see it on another VM running 2.6.32-279.11.1.el6.x86_64. High network I/O load seems to trigger the problem with ksoftirqd.
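One way to confirm the correlation, sketched here: watch the network softirq counters alongside the ksoftirqd CPU times while the load is running; iperf or netperf are only examples of a traffic generator and may need to be installed separately.

# Watch network-related softirq counters and ksoftirqd CPU time once a
# second while the machine is under network load (generate the load
# however is convenient, e.g. iperf or netperf from another host).
watch -n 1 'grep -E "NET_RX|NET_TX" /proc/softirqs; echo; ps -ef | grep "[k]softirqd"'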

Comment 9 simon.d.matthews 2012-11-02 03:03:50 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > I see the same problem. 
> > 
> > The affected machines are all virtual machines, running Centos 6.3 on an AMD
> > host. The kernel is 2.6.32-279.1.1.el6.centos.plus.x86_64, but other kernel
> > versions were affected. 
> > 
> > 
> > The problem happens when the machines are under high load.
> 
> Could you please confirm whether the current long-term support kernel from
> the ELRepo Project (kernel-lt-3.0.50.el6.elrepo, as of the date of this
> comment) resolves the issue for you?

Difficult. It is a core production machine and I am travelling next week.

Comment 10 Orion Poplawski 2012-11-02 13:01:52 UTC
FWIW - I see this on physical hardware as well as VMs.

Comment 11 kbergk 2012-11-02 16:20:19 UTC
Virt guest? While I do see this issue in my KVM guests, it's also very much apparent on my bare-metal physical hosts. I just want to be clear that this is not limited to VMs. Thanks.

Comment 12 Fredrik Jonsson 2012-11-15 16:29:54 UTC
Perhaps a clue... I have a lot of mixed hardware and only observe this problem on the Intel systems. AMD systems hum along just fine.

Comment 13 Andreas Kasenides 2012-12-03 15:03:40 UTC
Bare metal machine with AMD processor has this problem with 
Linux xxx.xx.cy 2.6.32-279.14.1.el6.x86_64 and CentOS 6.3

/proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 250
stepping        : 1
cpu MHz         : 1800.000
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni
bogomips        : 3607.82
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

Problem goes away with nohz=off on the kernel.
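On CentOS/RHEL 6 the kernel command line lives in /boot/grub/grub.conf (GRUB 0.97); a sketch of adding nohz=off to every kernel entry, with a backup first.

# Back up the GRUB config, then append nohz=off to each kernel line.
cp /boot/grub/grub.conf /boot/grub/grub.conf.bak
sed -i '/^[[:space:]]*kernel /s/$/ nohz=off/' /boot/grub/grub.conf
# After a reboot, confirm the option is active:
grep -o 'nohz=off' /proc/cmdline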

Comment 14 Orion Poplawski 2012-12-03 15:33:30 UTC
In my experience, nohz=off is not without problems either - it seems that load average calculations are incorrect with that setting. I've been running the -lt kernels from ELRepo with good results.

Comment 15 RHEL Program Management 2012-12-17 06:49:33 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 16 RHEL Program Management 2013-10-14 04:42:19 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 18 Orion Poplawski 2015-04-10 20:00:45 UTC
https://access.redhat.com/solutions/302623 indicates this was fixed in 6.4.  Time to close this?

Comment 19 Prarit Bhargava 2015-04-13 12:02:01 UTC
(In reply to Orion Poplawski from comment #18)
> https://access.redhat.com/solutions/302623 indicates this was fixed in 6.4. 
> Time to close this?

I think so -- I'm closing as current release.

P.

