From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2

Description of problem:
During massive I/O the server becomes very unresponsive (probably due to context switching). The load average is usually 3-4 higher than the number of processes started by me. The following kernel threads - bdflush, kupdated, kjournald, kswapd - are taking too many CPU resources and not letting the user processes complete their job. The server pushes under 10 MB/sec of I/O when using files, compared to over 100 MB/sec per controller to the raw devices. I am using the ext3 file system with default options. All sysctl settings are default. I am testing with 2 shelves of MSA30-DB with 2 Smart Array 6402 controllers.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Perform massive I/O to a file on an ext3 file system with dd or a similar utility
2.
3.

Actual Results:  bdflush, kupdated, kjournald and kswapd are taking lots of CPU resources and the server is in constant iowait with very little I/O throughput.

Expected Results:  The system should not be spending as much time in iowait and should have higher I/O throughput.

Additional info:

[root@drtsut10 mm]# uname -a
Linux drtsut10.corp.acxiom.net 2.4.21-20.ELsmp #1 SMP Wed Aug 18 20:46:40 EDT 2004 i686 i686 i386 GNU/Linux

[root@drtsut10 mm]# cat /proc/meminfo
        total:     used:     free:   shared:  buffers:   cached:
Mem:  8165167104 8146563072  18604032        0 462163968 7419478016
Swap: 4186226688      4096 4186222592
MemTotal:        7973796 kB
MemFree:           18168 kB
MemShared:             0 kB
Buffers:          451332 kB
Cached:          7245580 kB
SwapCached:            4 kB
Active:           646176 kB
ActiveAnon:        20916 kB
ActiveCache:      625260 kB
Inact_dirty:     5443448 kB
Inact_laundry:   1439732 kB
Inact_clean:      189828 kB
Inact_target:    1543836 kB
HighTotal:       7208924 kB
HighFree:           1260 kB
LowTotal:         764872 kB
LowFree:           16908 kB
SwapTotal:       4088112 kB
SwapFree:        4088108 kB
HugePages_Total:       0
HugePages_Free:        0
Hugepagesize:       2048 kB

[root@drtsut10 mm]# cat /proc/partitions
major minor  #blocks  name         rio rmerge rsect ruse wio wmerge wsect wuse running use aveq

 106     0  716850296 cciss/c2d0   42192 164987 436882 576160 4285145 115555939 970866272 11619957 73 19279370 16203917
 106     1  716844366 cciss/c2d0p1 14695 164432 380778 277720 4169326 104408253 948339260 7593087 73 18930060 11880887
 106    16  716850296 cciss/c2d1   41127 164986 428354 537930 3104533 84896091 695214512 26839935 0 9516410 31161245
 106    17  716844366 cciss/c2d1p1 13630 164431 372250 238690 2988679 73748440 672687500 22809985 0 9166520 26830405
 105     0  716850296 cciss/c1d0   304189 4097161 33990154 10953020 4383990 115403933 930697072 17574543 511 39310972 8964434
 105     1  716844366 cciss/c1d0p1 276692 4096590 33934018 10467700 4268203 104256215 908170060 1050513 511 38769522 34907287
 105    16  716850296 cciss/c1d1   41126 165003 428386 2299620 3178648 87600508 695247248 18250323 0 26039331 5454997
 105    17  716844366 cciss/c1d1p1 13629 164432 372250 1965700 3062870 76452781 672720236 11864833 0 25655051 41693410
 104     0   71126640 cciss/c0d0   71103 61337 1051288 737050 3306967 3151526 53164280 17301408 0 15808720 18190568
 104     1     101984 cciss/c0d0p1 70 662 1464 150 45 31 152 15140 0 12930 15290
 104     2    8388480 cciss/c0d0p2 37 98 426 190 30 22 400 4720 0 2550 4910
 104     3    8388480 cciss/c0d0p3 70792 60113 1047234 713010 3269413 2136660 44726632 42513984 0 15764890 423641
 104     4          1 cciss/c0d0p4 0 0 0 0 0 0 0 0 0 0 0
 104     5     526304 cciss/c0d0p5 38 96 418 330 436 3134 28720 42690 0 16970 43020
 104     6   49633184 cciss/c0d0p6 72 98 706 23100 36907 1011651 8407056 17660697 0 137380 17684657
 104     7    2044064 cciss/c0d0p7 45 41 424 90 136 28 1320 13890 0 10020 13980
 104     8    2044064 cciss/c0d0p8 13 41 168 20 0 0 0 0 0 20 20
[root@drtsut10 mm]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d0p3      8256824   1685996   6151404  22% /
/dev/cciss/c0d0p1        98747     14960     78688  16% /boot
/dev/cciss/c0d0p2      8256824     32844   7804556   1% /home
none                   3986896         0   3986896   0% /dev/shm
/dev/cciss/c0d0p5       518000     16444    475244   4% /tmp
/dev/cciss/c0d0p6     48853752     32828  46339268   1% /u01
/dev/cciss/c1d0p1    705594876 453898524 215854136  68% /u02
/dev/cciss/c1d1p1    705594876 335905260 333847400  51% /u03
/dev/cciss/c2d0p1    705594876 473578096 196174564  71% /u04
/dev/cciss/c2d1p1    705594876 335905260 333847400  51% /u05
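For reference, the kind of dd run described in the reproduction steps would look roughly like the commands below; the target paths, sizes and block size are illustrative only, not taken from the reporter's script:

# write a large file onto one of the ext3 mounts and watch the throughput
dd if=/dev/zero of=/u02/ddtest.dat bs=1024k count=20480
# for comparison, read from the raw device (read-only, so non-destructive)
dd if=/dev/cciss/c1d1 of=/dev/null bs=1024k count=20480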
Created attachment 112971 [details] SysRq Show-CPUs.txt
Created attachment 112972 [details] SysRq Show-Memory.txt
Created attachment 112973 [details] SysRq Show-State.txt
Created attachment 112974 [details] Process listing [ps xwo pid,command,wchan, ps -ef, ps aufxmww]
Created attachment 112975 [details] Output from iostat Please note the inf values, nan values, and the large await values. %util is very high whenever a value is reported.
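For context, await and %util come from iostat's extended per-device statistics (sysstat package), so the capture was presumably an invocation along these lines; the interval is a guess:

# extended device statistics every 5 seconds
iostat -x 5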
Created attachment 112980 [details] Script that creates the I/O This is a simple bash script that uses dd to create the I/O.
Created attachment 112981 [details] Output from the dd-fs-throughput.sh This is the output of the dd-fs-throughput.sh script. The script has not completed yet, but the results should give you a pretty good idea of the throughput.
Created attachment 112982 [details] System report produced by /usr/sbin/sysreport
Looks to be either the VM or ext3 - nice backtraces.
N/M. The system is just waiting on I/O and not actually hanging in anything; all the tasks are in D state. There is a known performance issue with RHEL3 and 3ware cards in certain RAID modes; that problem is being tracked down. Larry, do you think this case could benefit from the highmem bounce buffering improvement patch you created a while ago?

*** This bug has been marked as a duplicate of 121434 ***
I just read bug 121434 and I don't think that this bug is a duplicate. They do have similarities; however, there are too many differences. I use 2x HP Smart Array 6402 controllers and 2x HP MSA30-DB disk enclosures. All disks are high-end Ultra320 SCSI disks. I don't think there is a problem with the hardware, and I don't think it is a problem with the driver.
Created attachment 113109 [details] Output from dmesg
Hi,

It seems we face the same kind of excessive kswapd CPU usage on our HP DL360 G4 machine. We run RHEL 3 U3, kernel 2.4.21-20.ELsmp. The machine has 4 GB RAM. It gtar-ed a massive set of files from an NFS server into /dev/null. That went well in the early minutes of the transfer, but ran into trouble quite quickly. I've killed the tar, but kswapd is still running at 100%. Here is the top output from my machine:

 19:11:31  up 2 days, 23:38,  2 users,  load average: 1.90, 1.40, 1.19
71 processes: 67 sleeping, 4 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.0%    0.0%   95.6%   0.0%     0.0%    0.0%  304.0%
           cpu00    0.0%    0.0%   32.1%   0.0%     0.0%    0.0%   67.8%
           cpu01    0.0%    0.0%   16.7%   0.0%     0.0%    0.0%   83.2%
           cpu02    0.0%    0.0%   31.3%   0.0%     0.0%    0.0%   68.6%
           cpu03    0.0%    0.0%   15.3%   0.0%     0.0%    0.0%   84.6%
Mem:  4031032k av, 4006332k used,   24700k free,       0k shrd,   89500k buff
                   2657584k actv,  520684k in_d,   78228k in_c
Swap: 2044072k av,    6776k used, 2037296k free                 3240448k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
   11 root      25   0     0    0     0 SW   95.4  0.0  3433m   2 kswapd
    6 root      15   0     0    0     0 SW    0.1  0.0  16:00   2 keventd
    1 root      15   0   512  476   452 S     0.0  0.0   0:08   3 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0

We also faced the same kind of issue on HP XW4100 boxes when I/O activity was high (diff -r on local disks).

Bye.
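If it helps to quantify this for the bug, a simple batch capture like the following (standard vmstat/top invocations, intervals chosen arbitrarily) records kswapd's CPU usage and overall system time over a minute while the problem is occurring:

# one-minute batch capture during the heavy I/O
vmstat 5 12 > vmstat.out
top -b -d 5 -n 12 | grep -E 'kswapd|bdflush|kupdated|kjournald' > kthreads-top.out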
I have another instance of this: kswapd excessive CPU usage.

HP DL560 G1 with 4 GB RAM, (1) 2 GB swap partition and (4) Xeon 2.8 GHz CPUs
HP 6402 controller to an HP MSA500 G2 SmartArray
RHES3 with 2.4.21-20.ELsmp and the latest Compaq patches

I cp a tree from one LV on the array to another LV on the array, and after about 10 minutes kswapd hits the top of "top" and a reboot is the only way out. I can still get in (just barely) to do shutdown -r now, and non-shell access (Oracle Apps/SQL*Net) performs just fine. I dropped pagecache to "2 30 50" but kswapd still rises to the top.

Mike
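For anyone else trying the same workaround: the pagecache limits Mike refers to are, as I understand the RHEL 3 tunable, the min/borrow/max percentages in /proc/sys/vm/pagecache, so the change would have been made along these lines (the values shown are his, not a recommendation):

# lower the pagecache ceiling so more memory stays available to the kernel
echo "2 30 50" > /proc/sys/vm/pagecache
# or equivalently
sysctl -w vm.pagecache="2 30 50"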
up2date fixed this problem
Mike, which package fixed the issue? You can't just say that up2date fixed it. Which of the packages that up2date installed fixed the problem? Please specify the old and new versions. Also, did you have the same issue as me, or just one that looks alike?
Please try a later RHEL3 kernel. The RHEL3-U3 kernel had a bug that prevented kswapd from reclaiming inodes; as a result, lowmem became totally consumed and many of the system daemons chewed up lots of CPU time trying to recover from this. The fix for this bug was included in RHEL3-U4 (kernel-2.4.21-27.EL). Please try the latest RHEL3-U5 kernel and let me know if it fixes the problem.

Larry Woodman
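A quick way to see whether you are hitting the lowmem/inode-reclaim problem Larry describes is to watch LowFree and the inode/dentry slab caches while the I/O is running; these are standard /proc files on the 2.4 RHEL3 kernels, and the grep patterns below are only illustrative:

# lowmem remaining (LowFree collapsing toward zero is the symptom)
grep -i '^low' /proc/meminfo
# size of the inode and dentry caches that kswapd should be reclaiming
grep -E 'inode_cache|dentry_cache' /proc/slabinfo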
I am having the same problem on a server running RHEL AS 3 that has a 1.5 TB ext3 filesystem and 4 1.5 TB reiserfs filesystems, 2 GB of RAM and 2 hyperthreaded Xeon CPUs. When doing any intensive disk I/O on the ext3 filesystem (cp, rsync, etc.), kswapd begins to use about 25% of the CPU. Further, about 50% of the CPU power is consumed under the system portion of top. During this state no swap is used and the system has plenty of free RAM. Eventually (overnight) 100% of the system resources are used. At that point text is logged to the console saying that an out-of-memory state exists and that pids are being killed. The only way to recover is to kill the power.

This does not happen when disk I/O is done on the reiserfs filesystems. Further, this condition did not exist in the past. Because the data on the ext3 filesystem is not used as much, I'm not sure when the problem cropped up; it seems to be within the last few weeks. The server is up2date on everything as of today (2005-05-04). I am currently running kernel 2.4.21-27.0.4.ELsmp. However, I have also tried the non-smp version of the same kernel and the original AS 3 kernel (2.4.21-4.ELsmp). The same problem results with both kernels.

Please let me know if there is any more information needed from me. This is a major problem for us, as our data cannot be removed from the ext3 filesystem and it cannot really be accessed either.

Keith Brautigam

p.s. I am willing to try new kernels etc. if needed. Thanks!
I have compiled and installed the stock 2.4.3 kernel, which has allowed me to read and write to and from my ext3 partition without kswapd activating and consuming all my CPU resources in the system category. I used the .config file for kernel-2.4.21-i686-smp that is included with the kernel-source package when compiling the new kernel, and I said 'no' to all of the new modules.

In the past, when the machine became unusable, a message was printed to the console warning that PIDs were being killed because lowmem = 0. With the new kernel I have not been able to reproduce this problem.

When I have a chance I will reboot with the most current Red Hat kernel and see if the problem occurs with the same partition (now formatted as reiserfs), just to make sure it was not the module for my RAID card that made the difference versus something related to having an ext3 filesystem.

Keith
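For anyone who wants to repeat the comparison, the build Keith describes would roughly follow the usual 2.4 sequence below using the config shipped with kernel-source; the source directory and config filename are assumptions and may differ on your box:

cd /usr/src/linux                               # the stock kernel tree you unpacked
cp /usr/src/linux-2.4/configs/kernel-2.4.21-i686-smp.config .config
make oldconfig                                  # answer 'no' to any options the tree asks about
make dep bzImage modules
make modules_install install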
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet that criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.