80279 – ksoftirqd_CPU0 hits 100% when running iostat


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Don Howard
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-12-23 22:08 UTC by Anthony Marusic
Modified:	2007-11-30 22:06 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-07-26 18:12:19 UTC
Target Upstream Version:
Embargoed:


Description Anthony Marusic 2002-12-23 22:08:21 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020918

Description of problem:


While running Oracle 9i RAC tests, we monitored overall performance with the
top utility.  All 4 CPUs were sharing the workload evenly, as shown by user CPU
time on every CPU ranging between 85% and 100%.  When we ran "iostat -x 3" to
check disk I/O performance, the ksoftirqd_CPU0 process climbed to 100% system
CPU time and stayed at 100% for the rest of the test.  At the same time, CPU1,
CPU2, and CPU3 dropped to 1-3% user and 1-3% system CPU time.  The
ksoftirqd_CPU0 process behaved the same way when we started a second test.  The
condition cleared only when we restarted the database.  In addition, many of
the "iostat -x 3" counters (%util, avgqu-sz, avgrq-sz, svctm, etc.) appeared to
display cumulative results instead of resetting to give proper 3-second
statistics.
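
For reference, the interval behaviour we expected from "iostat -x 3" can be
sketched as follows. This is only an illustration of differencing two
cumulative snapshots, not sysstat's actual code; the counter names and sample
values are made up for the example.

```python
from typing import Dict


def interval_stats(prev: Dict[str, int], curr: Dict[str, int],
                   interval_s: float) -> Dict[str, float]:
    """Return per-second rates for one interval from two cumulative snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Subtracting the previous snapshot is what makes each report cover
    # only the last interval instead of everything since boot.
    return {name: (curr[name] - prev[name]) / interval_s for name in curr}


# Hypothetical cumulative counters sampled 3 seconds apart.
t0 = {"reads": 1000, "writes": 4000}
t3 = {"reads": 1300, "writes": 4600}
t6 = {"reads": 1330, "writes": 4630}

first = interval_stats(t0, t3, 3.0)   # busy interval
second = interval_stats(t3, t6, 3.0)  # quieter interval
print(first)
print(second)
```

If the previous snapshot is never updated, every report is effectively measured
against the first sample, which would look like the cumulative results
described above.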

Background Information:
2-way, 2.8GHz server with Hyper-Threading turned on
      * kernel -- 2.4.9-e.8 enterprise.AS2.1 i686
      * sysstat 4.0.1 Release 2
      * Oracle testing:
                Mixed read/write IO on 4 tablespaces, across 4 CPUs
                Running 60 oracle processes
                One Oracle Instance

The 2 major problems are:

1) CPU0 ends up running at 100% system time, severely impacting performance

2) iostat does not seem to reset its counters for the specified interval, or
ever, for that matter.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Start 60 Oracle processes issuing continuous reads/writes to 4 tablespaces.
   The Oracle tests run multiple select and insert/update statements.
2. Run iostat during the tests.
3. top will show CPU0 at 100% system util while CPU1, CPU2, and CPU3 are at .03%
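
The imbalance in step 3 can also be checked without top by differencing two
reads of /proc/stat. The sketch below is a minimal example that assumes the
2.4-era layout of the per-CPU lines ("cpuN user nice system idle", in jiffies);
the function names and the sample numbers are hypothetical.

```python
from typing import Dict, List


def parse_cpu_lines(stat_text: str) -> Dict[str, List[int]]:
    """Extract [user, nice, system, idle] jiffies for each per-CPU line."""
    cpus: Dict[str, List[int]] = {}
    for line in stat_text.splitlines():
        parts = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line.
        if len(parts) >= 5 and parts[0].startswith("cpu") and parts[0][3:].isdigit():
            cpus[parts[0]] = [int(v) for v in parts[1:5]]
    return cpus


def system_percent(before: str, after: str) -> Dict[str, float]:
    """Percentage of system time per CPU between two /proc/stat snapshots."""
    old, new = parse_cpu_lines(before), parse_cpu_lines(after)
    result: Dict[str, float] = {}
    for cpu, values in new.items():
        deltas = [n - o for n, o in zip(values, old[cpu])]
        total = sum(deltas)
        # Index 2 is the system column under the assumed layout.
        result[cpu] = 100.0 * deltas[2] / total if total else 0.0
    return result


# Made-up snapshots: cpu0 spends the whole interval in system time.
BEFORE = "cpu 600 0 120 1280\ncpu0 100 0 100 800\ncpu1 500 0 20 480\n"
AFTER = "cpu 603 0 423 1574\ncpu0 100 0 400 800\ncpu1 503 0 23 774\n"
print(system_percent(BEFORE, AFTER))
```

On a live system the two snapshots would come from reading /proc/stat twice, a
few seconds apart, while the test and iostat are running.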
    

Additional info:

kernel -- 2.4.9-e.8 enterprise.AS2.1 i686
sysstat 4.0.1 Release 2

Comment 1 jeffrey.buchsbaum 2003-02-13 17:32:31 UTC

This is really the same bug as 83789.  I have the same problem with just one
telnet session running.

Dell 530, dual 2.4GHz
4GB ECC RAM

Comment 2 Alan Cox 2003-06-05 14:51:14 UTC

I'm unconvinced they are the same thing.

Comment 3 edward dertouzas 2003-07-27 05:32:21 UTC

    6 root      34  19     0    0     0 SWN   0.0  0.0 319:40 ksoftirqd_CPU0
   10 root      15   0     0    0     0 SW    0.0  0.0  95:25 kswapd
   13 root      15   0     0    0     0 SW    0.0  0.0 168:12 bdflush

Linux 2.4.9-e.12enterprise #1 SMP Tue Feb 11 01:29:18 EST 2003 i686 unn

This happens in a production environment running Oracle. Every few days the
system becomes completely unresponsive except for rudimentary functioning of
the TCP/IP stack. connect() returns success, but the remote host just idles
from that point forward. The server is unresponsive on the console until it
either recovers (anywhere between 5 and 45 minutes later, usually after the
Oracle listener and db have died) or the host is manually power-cycled.

Note: this is not connected with any Orinoco problems.

Comment 4 Nils Philippsen 2004-01-13 15:38:13 UTC

Does the problem still show with recent kernels?

Anyway, this sounds like a kernel/scheduling problem to me, all the more
so because the problem does not show up only when iostat runs.

Comment 5 Charlie Bennett 2004-09-23 19:29:43 UTC

handing off to the kernel group

Comment 6 Jason Baron 2004-09-23 19:46:23 UTC

This is an old one. I think we should start with reproducing it on the
latest RHEL 2.1 kernel, e.49. Thanks.

Comment 7 Don Howard 2006-03-16 21:57:13 UTC

This is a truly ancient report.  If there is no update here in the next two
weeks demonstrating this bug on a current 2.1 kernel, this ticket will be closed.
