From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020918

Description of problem:
While running 9i RAC tests, we were monitoring overall performance with the top utility. The workload was distributed evenly across all 4 CPUs, as shown by user CPU time on every CPU ranging between 85% and 100%. When we ran "iostat -x 3" to check disk I/O performance, the ksoftirqd_CPU0 process climbed to 100% system CPU time and stayed at 100% for the rest of the test. At the same time, CPU1, CPU2, and CPU3 dropped to 1-3% user and 1-3% system CPU time. ksoftirqd_CPU0 behaved the same way when a second test was started. The condition cleared only when the database was restarted.

In addition, many of the "iostat -x 3" counters (%util, avgqu-sz, avgrq-sz, svctm, etc.) appeared to show cumulative values: they never reset, so iostat did not report proper 3-second interval statistics.

Background information:
* 2-way, 2.8 GHz server with Hyper-Threading turned on
* kernel: 2.4.9-e.8 enterprise.AS2.1 i686
* sysstat 4.0.1 Release 2
* Oracle testing:
  * Mixed read/write I/O on 4 tablespaces, across 4 CPUs
  * Running 60 Oracle processes
  * One Oracle instance

The 2 major problems are:
1) CPU0 ends up running at 100% system time, severely impacting performance.
2) iostat does not seem to clear its counters for the specified interval, or ever, for that matter.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Start 60 Oracle processes issuing continuous reads/writes to 4 tablespaces. The Oracle tests run multiple SELECT and INSERT/UPDATE statements.
2. Run iostat during the tests.
3. top will show CPU0 at 100% system utilization while CPU1, CPU2 & CPU3 are at .03%.

Additional info:
kernel: 2.4.9-e.8 enterprise.AS2.1 i686
sysstat 4.0.1 Release 2
This is really the same bug as 83789. I have the same problem with just one telnet session running. Dell 530, dual 2.4 GHz, 4 GB ECC RAM.
I'm unconvinced they are the same thing.
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    6 root      34  19     0    0     0 SWN   0.0  0.0 319:40 ksoftirqd_CPU0
   10 root      15   0     0    0     0 SW    0.0  0.0  95:25 kswapd
   13 root      15   0     0    0     0 SW    0.0  0.0 168:12 bdflush

Linux 2.4.9-e.12enterprise #1 SMP Tue Feb 11 01:29:18 EST 2003 i686 unn

This happens in a production environment running Oracle. Every few days the system becomes completely unresponsive, except for rudimentary functioning of the TCP/IP stack: connect() succeeds, but the remote host then just sits idle from that point on. The server is unresponsive on the console until it either recovers (anywhere from 5 to 45 minutes later, usually after the Oracle listener and database have died) or is manually power-cycled.

Note: this is not connected with any orinoco problems.
Does the problem still occur with recent kernels? In any case, this sounds like a kernel/scheduling problem to me, all the more because it shows up even when iostat is not running.
Handing off to the kernel group.
This is an old one. I think we should start by reproducing it on the latest RHEL 2.1 kernel, e.49. Thanks.
This is a truly ancient report. If there is no update here within the next two weeks demonstrating this bug on a current 2.1 kernel, this ticket will be closed.