Red Hat Bugzilla – Bug 80279
ksoftirqd_CPU0 hits 100% when running iostat
Last modified: 2007-11-30 17:06:52 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020918
Description of problem:
While running 9i RAC tests, we were monitoring overall performance, using the
top utility. All 4 CPUs were evenly distributing the workload. This was evident
by the percentage of user CPU time of all CPUs ranging between 85% and 100%.
When we ran "iostat -x 3" to check disk I/O performance, the ksoftirqd_CPU0
process climbed to 100% system CPU time and stayed at 100% for the rest of the
test. At the same time, CPU1, 2, and 3 dropped to 1-3% user and 1-3% system
CPU time. The ksoftirqd_CPU0 process showed the same behavior when a second
test was started. The condition cleared only when the database was restarted.
In addition, many of the counters reported by "iostat -x 3"
(%util, avgqu-sz, avgrq-sz, svctm, etc.) seemed to display cumulative results
rather than resetting and giving proper 3-second statistics.
2-way, 2.8 GHz server with Hyper-Threading turned on
* kernel -- 2.4.9-e.8 enterprise.AS2.1 i686
* sysstat 4.0.1 Release 2
* Oracle testing:
* Mixed read/write IO on 4 tablespaces, across 4 CPUs
Running 60 oracle processes
One Oracle Instance
The 2 major problems are:
1) CPU0 ends up running at 100% system time, severely impacting performance
2) iostat does not seem to clear its counters for the specified interval, or
ever for that matter.
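To make the second problem concrete: per-interval figures such as %util, avgrq-sz and svctm are normally derived by subtracting the previous sample of a cumulative counter from the current one. The sketch below is not sysstat's code, and the counter names (ios, sectors, busy_ms) and all numbers are hypothetical. It only shows how a baseline that is never advanced would make each report look cumulative, which is one possible reading of the symptom, not a confirmed cause.

```python
def interval_stats(prev, curr, interval_s):
    """Derive per-interval figures from two cumulative counter samples.

    prev/curr are dicts with hypothetical keys:
      ios      - completed I/O requests since boot
      sectors  - sectors transferred since boot
      busy_ms  - milliseconds the device was busy since boot
    """
    d_ios = curr["ios"] - prev["ios"]
    d_sectors = curr["sectors"] - prev["sectors"]
    d_busy_ms = curr["busy_ms"] - prev["busy_ms"]
    interval_ms = interval_s * 1000.0
    return {
        # share of the interval during which the device was busy
        "util_pct": 100.0 * d_busy_ms / interval_ms,
        # average request size in sectors
        "avgrq_sz": d_sectors / d_ios if d_ios else 0.0,
        # average service time per request in milliseconds
        "svctm_ms": d_busy_ms / d_ios if d_ios else 0.0,
    }


# Three hypothetical samples taken 3 seconds apart under a steady load.
s0 = {"ios": 1000, "sectors": 16000, "busy_ms": 5000}
s1 = {"ios": 1300, "sectors": 20800, "busy_ms": 6500}
s2 = {"ios": 1600, "sectors": 25600, "busy_ms": 8000}

correct = interval_stats(s1, s2, 3)   # baseline advanced to the last sample
stale = interval_stats(s0, s2, 3)     # baseline stuck at the first sample

print(correct["util_pct"], stale["util_pct"])
```

With the baseline advanced after every report, the steady load yields the same figures each interval. With the baseline stuck at the first sample, the utilization figure keeps growing from one report to the next, which matches the "cumulative" appearance described above.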
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start 60 Oracle processes issuing continuous reads/writes to 4 tablespaces.
Oracle tests run multiple select & insert/update statements.
2. Run iostat during the tests.
3. top will show CPU0 at 100% system util while CPU1, CPU2 & CPU3 are at .03%
kernel -- 2.4.9-e.8 enterprise.AS2.1 i686
sysstat 4.0.1 Release 2
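The per-CPU imbalance seen in top during step 3 can also be checked without top, by sampling the per-CPU lines of /proc/stat twice and comparing the deltas. The following is a minimal sketch under stated assumptions: it assumes the four-column layout (user, nice, system, idle) and reads only the first four values of each line; the helper names and the sample numbers are made up for illustration and are not measurements from this system.

```python
def parse_cpu_lines(text):
    """Return {cpu_name: (user, nice, system, idle)} for per-CPU lines.

    The aggregate "cpu" line is skipped; only cpu0, cpu1, ... are kept.
    Only the first four counters are read (assumed layout).
    """
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = tuple(int(v) for v in fields[1:5])
    return cpus


def system_pct(before, after):
    """Percentage of elapsed ticks each CPU spent in system mode."""
    result = {}
    for name, new in after.items():
        deltas = [n - o for n, o in zip(new, before[name])]
        total = sum(deltas)
        result[name] = 100.0 * deltas[2] / total if total else 0.0
    return result


# Hypothetical samples; on a live system the text would come from
# open("/proc/stat").read(), taken a few seconds apart.
SAMPLE_A = """cpu  2000 0 1000 5000
cpu0 1000 0 500 2500
cpu1 1000 0 500 2500
"""
SAMPLE_B = """cpu  2270 0 1306 5024
cpu0 1000 0 800 2500
cpu1 1270 0 506 2524
"""

usage = system_pct(parse_cpu_lines(SAMPLE_A), parse_cpu_lines(SAMPLE_B))
for name in sorted(usage):
    print(name, round(usage[name], 1))
```

In these made-up samples cpu0 spends the whole interval in system time while cpu1 spends almost none, which is the shape of the imbalance described in the report.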
This is the same bug really as 83789. I have the same problem with just one
telnet session running.
Dell 530 dual 2.4ghz
4gb ram ecc
I'm unconvinced they are the same thing
6 root 34 19 0 0 0 SWN 0.0 0.0 319:40 ksoftirqd_CPU0
10 root 15 0 0 0 0 SW 0.0 0.0 95:25 kswapd
13 root 15 0 0 0 0 SW 0.0 0.0 168:12 bdflush
Linux 2.4.9-e.12enterprise #1 SMP Tue Feb 11 01:29:18 EST 2003 i686 unn
This happens in a production environment running Oracle. Every few days the
system becomes completely unresponsive except for rudimentary functioning
of the TCP/IP stack. Connect() returns success, but the remote host just idles
from that point forward. The server is unresponsive on the console until it either
recovers (anywhere between 5 and 45 minutes, usually after the Oracle listener and
db have died) or the host is manually power-cycled.
Note: this is not connected with any orinoco problems.
Does the problem still show with recent kernels?
Anyway, this sounds like a kernel/scheduling problem to me, even more
so because the problem shows not only when iostat runs.
handing off to the kernel group
this is an old one, i think we should start w/reproducing it on the
latest rhel2.1 kernel, e.49. thanks.
This is a truly ancient report. If there is no update here in the next two
weeks demonstrating this bug on a current 2.1 kernel, this ticket will be closed.