Bug 154441 - bdflush, kupdated, kjournald, kswapd taking all CPU resources; iowait is 100%; lots of processes are in schedule_timeout function
Summary: bdflush, kupdated, kjournald, kswapd taking all CPU resources; iowait is 100%; lots of processes are in schedule_timeout function
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-04-11 18:15 UTC by Georgi Hristov
Modified: 2007-11-30 22:07 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-19 19:04:38 UTC
Target Upstream Version:
Embargoed:


Attachments
SysRq Show-CPUs.txt (68.03 KB, text/plain)
2005-04-11 18:20 UTC, Georgi Hristov
SysRq Show-Memory.txt (43.20 KB, text/plain)
2005-04-11 18:20 UTC, Georgi Hristov
SysRq Show-State.txt (80.82 KB, text/plain)
2005-04-11 18:21 UTC, Georgi Hristov
Process listing [ps xwo pid,command,wchan, ps -ef, ps aufxmww] (17.41 KB, text/plain)
2005-04-11 18:22 UTC, Georgi Hristov
Output from iostat (34.96 KB, text/plain)
2005-04-11 18:27 UTC, Georgi Hristov
Script that creates the I/O (3.04 KB, text/plain)
2005-04-11 19:00 UTC, Georgi Hristov
Output from the dd-fs-throughput.sh (28.63 KB, text/plain)
2005-04-11 19:04 UTC, Georgi Hristov
System report produced by /usr/sbin/sysreport (396.78 KB, application/octet-stream)
2005-04-11 19:17 UTC, Georgi Hristov
Output from dmesg (22.86 KB, text/plain)
2005-04-13 18:11 UTC, Georgi Hristov

Description Georgi Hristov 2005-04-11 18:15:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2

Description of problem:
During massive I/O the server becomes very unresponsive (probably due to context switching). The load average is usually 3-4 higher than the number of processes I have started. The kernel threads bdflush, kupdated, kjournald, and kswapd take so much CPU time that user processes cannot complete their work. The server pushes under 10 MB/sec of I/O when going through files, compared to over 100 MB/sec per controller to the raw devices. I am using the ext3 file system with default options. All sysctls are at their defaults.

I am testing with 2 MSA30-DB shelves attached to 2 Smart Array 6402 controllers.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Perform massive I/O to files on an ext3 file system with dd or a similar utility (see the attached dd-fs-throughput.sh script)

Actual Results:  bdflush, kupdated, kjournald, and kswapd take lots of CPU resources, and the server is in constant iowait with very little I/O throughput.

Expected Results:  The system should not spend so much time in iowait and should deliver higher I/O throughput.

Additional info:

[root@drtsut10 mm]# uname -a
Linux drtsut10.corp.acxiom.net 2.4.21-20.ELsmp #1 SMP Wed Aug 18 20:46:40 EDT 2004 i686 i686 i386 GNU/Linux
[root@drtsut10 mm]# cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  8165167104 8146563072 18604032        0 462163968 7419478016
Swap: 4186226688     4096 4186222592
MemTotal:      7973796 kB
MemFree:         18168 kB
MemShared:           0 kB
Buffers:        451332 kB
Cached:        7245580 kB
SwapCached:          4 kB
Active:         646176 kB
ActiveAnon:      20916 kB
ActiveCache:    625260 kB
Inact_dirty:   5443448 kB
Inact_laundry: 1439732 kB
Inact_clean:    189828 kB
Inact_target:  1543836 kB
HighTotal:     7208924 kB
HighFree:         1260 kB
LowTotal:       764872 kB
LowFree:         16908 kB
SwapTotal:     4088112 kB
SwapFree:      4088108 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB
[root@drtsut10 mm]# cat /proc/partitions
major minor  #blocks  name     rio rmerge rsect ruse wio wmerge wsect wuse running use aveq

 106     0  716850296 cciss/c2d0 42192 164987 436882 576160 4285145 115555939 970866272 11619957 73 19279370 16203917
 106     1  716844366 cciss/c2d0p1 14695 164432 380778 277720 4169326 104408253 948339260 7593087 73 18930060 11880887
 106    16  716850296 cciss/c2d1 41127 164986 428354 537930 3104533 84896091 695214512 26839935 0 9516410 31161245
 106    17  716844366 cciss/c2d1p1 13630 164431 372250 238690 2988679 73748440 672687500 22809985 0 9166520 26830405
 105     0  716850296 cciss/c1d0 304189 4097161 33990154 10953020 4383990 115403933 930697072 17574543 511 39310972 8964434
 105     1  716844366 cciss/c1d0p1 276692 4096590 33934018 10467700 4268203 104256215 908170060 1050513 511 38769522 34907287
 105    16  716850296 cciss/c1d1 41126 165003 428386 2299620 3178648 87600508 695247248 18250323 0 26039331 5454997
 105    17  716844366 cciss/c1d1p1 13629 164432 372250 1965700 3062870 76452781 672720236 11864833 0 25655051 41693410
 104     0   71126640 cciss/c0d0 71103 61337 1051288 737050 3306967 3151526 53164280 17301408 0 15808720 18190568
 104     1     101984 cciss/c0d0p1 70 662 1464 150 45 31 152 15140 0 12930 15290
 104     2    8388480 cciss/c0d0p2 37 98 426 190 30 22 400 4720 0 2550 4910
 104     3    8388480 cciss/c0d0p3 70792 60113 1047234 713010 3269413 2136660 44726632 42513984 0 15764890 423641
 104     4          1 cciss/c0d0p4 0 0 0 0 0 0 0 0 0 0 0
 104     5     526304 cciss/c0d0p5 38 96 418 330 436 3134 28720 42690 0 16970 43020
 104     6   49633184 cciss/c0d0p6 72 98 706 23100 36907 1011651 8407056 17660697 0 137380 17684657
 104     7    2044064 cciss/c0d0p7 45 41 424 90 136 28 1320 13890 0 10020 13980
 104     8    2044064 cciss/c0d0p8 13 41 168 20 0 0 0 0 0 20 20

Comment 1 Georgi Hristov 2005-04-11 18:17:42 UTC
[root@drtsut10 mm]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d0p3      8256824   1685996   6151404  22% /
/dev/cciss/c0d0p1        98747     14960     78688  16% /boot
/dev/cciss/c0d0p2      8256824     32844   7804556   1% /home
none                   3986896         0   3986896   0% /dev/shm
/dev/cciss/c0d0p5       518000     16444    475244   4% /tmp
/dev/cciss/c0d0p6     48853752     32828  46339268   1% /u01
/dev/cciss/c1d0p1    705594876 453898524 215854136  68% /u02
/dev/cciss/c1d1p1    705594876 335905260 333847400  51% /u03
/dev/cciss/c2d0p1    705594876 473578096 196174564  71% /u04
/dev/cciss/c2d1p1    705594876 335905260 333847400  51% /u05


Comment 2 Georgi Hristov 2005-04-11 18:20:06 UTC
Created attachment 112971 [details]
SysRq Show-CPUs.txt

Comment 3 Georgi Hristov 2005-04-11 18:20:43 UTC
Created attachment 112972 [details]
SysRq Show-Memory.txt

Comment 4 Georgi Hristov 2005-04-11 18:21:15 UTC
Created attachment 112973 [details]
SysRq Show-State.txt
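
For reference, dumps like these are typically captured through the magic SysRq
interface; the report does not say exactly which keys produced each attachment.
A minimal sketch, assuming SysRq is enabled:

  echo 1 > /proc/sys/kernel/sysrq     # enable magic SysRq
  echo m > /proc/sysrq-trigger        # Show-Memory dump to the kernel log
  echo t > /proc/sysrq-trigger        # Show-State dump of every task
  dmesg > sysrq-output.txt            # collect the output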

Comment 5 Georgi Hristov 2005-04-11 18:22:39 UTC
Created attachment 112974 [details]
Process listing [ps xwo pid,command,wchan, ps -ef, ps aufxmww]

Comment 6 Georgi Hristov 2005-04-11 18:27:22 UTC
Created attachment 112975 [details]
Output from iostat 

Please note the inf values, nan values, and the large await values. The %util
is very high whenever a value is reported at all.
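
Extended per-device statistics like those in the attachment typically come from
sysstat's iostat; the exact invocation used here is not stated. A sketch:

  iostat -x 5          # extended utilization stats, every 5 seconds
  iostat -x -d 5 10    # device report only, 10 samples then exit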

Comment 7 Georgi Hristov 2005-04-11 19:00:51 UTC
Created attachment 112980 [details]
Script that creates the I/O

This is a simple bash script that uses dd to create the I/O.
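
The attachment is the authoritative version; as a rough sketch of what such a
dd load generator looks like (the mount points match the df output above, but
the block size, file size, and loop structure are assumptions, not taken from
the script):

  #!/bin/bash
  # Write a large file through the page cache on each ext3 mount,
  # then read it back, timing both directions.
  for mnt in /u02 /u03 /u04 /u05; do
      time dd if=/dev/zero of=$mnt/ddtest.dat bs=1024k count=8192
      time dd if=$mnt/ddtest.dat of=/dev/null bs=1024k
      rm -f $mnt/ddtest.dat
  done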

Comment 8 Georgi Hristov 2005-04-11 19:04:15 UTC
Created attachment 112981 [details]
Output from the dd-fs-throughput.sh 

This is the output of the dd-fs-throughput.sh script. The script has not
completed yet, but the results should give you a pretty good idea of the
throughput.

Comment 9 Georgi Hristov 2005-04-11 19:17:57 UTC
Created attachment 112982 [details]
System report produced by /usr/sbin/sysreport

Comment 12 Rik van Riel 2005-04-12 20:27:55 UTC
Looks to be either the VM or ext3 - nice backtraces.

Comment 13 Rik van Riel 2005-04-13 14:55:08 UTC
N/M.  The system is just waiting on I/O and not actually hanging anywhere; all
the tasks are in D state.

There is a known performance issue with RHEL3 and 3ware cards in certain RAID
modes; that problem is being tracked down.

Larry, do you think this case could benefit from the highmem bounce buffering
improvement patch you created a while ago?

*** This bug has been marked as a duplicate of 121434 ***

Comment 14 Georgi Hristov 2005-04-13 18:09:29 UTC
I just read bug 121434 and I don't think this bug is a duplicate. They have
some similarities; however, there are too many differences. I use 2x HP Smart
Array 6402 controllers and 2x HP MSA30-DB disk enclosures. All disks are
high-end Ultra320 SCSI disks. I don't think there is a problem with the
hardware, and I don't think it is a problem with the driver.

Comment 15 Georgi Hristov 2005-04-13 18:11:30 UTC
Created attachment 113109 [details]
Output from dmesg

Comment 17 Laurent Aymeric 2005-04-15 17:19:04 UTC
Hi,

It seems we face the same kind of excessive kswapd CPU usage on our HP DL360 G4
machine. We run RHEL 3 U3; the kernel is 2.4.21-20.ELsmp. The machine has 4 GB RAM.

It gtar-ed a massive set of files from an NFS server into /dev/null. That went
well in the early minutes of the transfer, but ran into trouble quite quickly.
I've killed the tar, but kswapd is still running at 100%. Here is the top output
from my machine:

 19:11:31  up 2 days, 23:38,  2 users,  load average: 1.90, 1.40, 1.19
71 processes: 67 sleeping, 4 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.0%    0.0%   95.6%   0.0%     0.0%    0.0%  304.0%
           cpu00    0.0%    0.0%   32.1%   0.0%     0.0%    0.0%   67.8%
           cpu01    0.0%    0.0%   16.7%   0.0%     0.0%    0.0%   83.2%
           cpu02    0.0%    0.0%   31.3%   0.0%     0.0%    0.0%   68.6%
           cpu03    0.0%    0.0%   15.3%   0.0%     0.0%    0.0%   84.6%
Mem:  4031032k av, 4006332k used,   24700k free,       0k shrd,   89500k buff
                   2657584k actv,  520684k in_d,   78228k in_c
Swap: 2044072k av,    6776k used, 2037296k free                 3240448k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
   11 root      25   0     0    0     0 SW   95.4  0.0  3433m   2 kswapd
    6 root      15   0     0    0     0 SW    0.1  0.0  16:00   2 keventd
    1 root      15   0   512  476   452 S     0.0  0.0   0:08   3 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0


We also hit the same kind of issue on HP XW4100 boxes when I/O activity was
high (diff -r on local disks).
Bye.

Comment 18 Mike King 2005-04-19 00:40:14 UTC
I have another instance of this:
excessive kswapd CPU usage
HP DL560 G1 with 4 GB RAM, (1) 2 GB swap, and (4) 2.8 GHz Xeons
HP 6402 controller to an HP MSA500 G2 SmartArray
RHES3 with 2.4.21-20.ELsmp and the latest Compaq patches
I cp a tree from one LV on the array to another, and after about 10 minutes
kswapd hits the top of "top" and a reboot is the only way out.  I can still get
in (just barely) to do shutdown -r now, and non-shell access (Oracle
Apps/SQL*Net) performs just fine.  I dropped pagecache to "2 30 50" (see the
sketch below) but kswapd still rises to the top.
Mike
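
For reference, on RHEL3's 2.4.21 kernels the pagecache limits Mike refers to
are exposed in /proc as three percentages of RAM (commonly described as min,
borrow, and max). A sketch of how they are typically inspected and changed,
using the values he quotes:

  cat /proc/sys/vm/pagecache              # current min/borrow/max percentages
  echo "2 30 50" > /proc/sys/vm/pagecache
  sysctl -w vm.pagecache="2 30 50"        # equivalent, persistable via sysctl.conf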

Comment 19 Mike King 2005-04-20 19:37:33 UTC
up2date fixed this problem

Comment 21 Georgi Hristov 2005-04-20 23:00:03 UTC
Mike,

Which package fixed the issue? You can't just say that up2date fixed it. Which
packages that up2date installed fixed the problem? Specify the old and new
versions. Did you have the same issue as me, or just a similar-looking one?

Comment 22 Larry Woodman 2005-05-03 14:19:49 UTC
Please try a later RHEL3 kernel.  The RHEL3-U3 kernel had a bug that prevented
kswapd from reclaiming inodes; lowmem therefore became totally consumed, and
many of the system daemons chewed up lots of CPU time trying to recover from
this.  The fix for this bug was included in RHEL3-U4 (kernel-2.4.21-27.EL).
Please try the latest RHEL3-U5 kernel and let me know if it fixes the problem.

Larry Woodman
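
A quick way to check for the lowmem/inode-reclaim condition Larry describes,
using standard 2.4 /proc interfaces (a sketch; slab names may vary slightly
between kernel builds):

  grep -E '^Low(Total|Free)' /proc/meminfo   # LowFree near zero = lowmem exhausted
  grep inode_cache  /proc/slabinfo           # unreclaimed inodes pile up here
  grep dentry_cache /proc/slabinfo           # dentries pin inodes, so check these too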

Comment 23 Keith Brautigam 2005-05-04 15:06:31 UTC
I am having the same problem on a server running RHEL AS 3 that has a 1.5 TB
ext3 filesystem and 4 1.5 TB reiserfs filesystems, 2 GB of RAM, and 2 Xeon CPUs
that are hyperthreaded.  When doing any intensive disk I/O on the ext3
filesystem (cp, rsync, etc.), kswapd begins to use about 25% of the CPU.
Further, about 50% of the CPU power is consumed under the system portion of
top.  During this state no swap is used and the system has plenty of free RAM.
Eventually (overnight), 100% of the system resources are used.  At that point
text is logged to the console saying that an out-of-memory state exists and
that PIDs are being killed.  The only way to recover is to kill the power.

This does not happen when disk I/O is done on the reiserfs filesystems.
Further, this condition did not exist in the past.  Because the data on the
ext3 filesystem is not used as much, I'm not sure when the problem cropped up;
it seems to be within the last few weeks.  The server is up2date on everything
as of today (2005-05-04).  I am currently running kernel 2.4.21-27.0.4.ELsmp;
however, I have also tried the non-SMP version of the same kernel and the
original AS 3 kernel (2.4.21-4.ELsmp).  The same problem results with both
kernels.

Please let me know if there is any more information needed from me.  This is a
major problem for us as our data cannot be removed from the ext3 filesystem and
it cannot really be accessed either.

Keith Brautigam

p.s. I am willing to try new kernels etc. if needed.  Thanks!

Comment 24 Keith Brautigam 2005-05-11 15:47:07 UTC
I have compiled and installed the stock 2.4.3 kernel, which has allowed me to
read from and write to my ext3 partition without kswapd activating and
consuming all my CPU resources in the system category.

I used the .config file from the kernel-2.4.21-i686-smp that is included with
the kernel-source package when compiling the new kernel.  I said 'no' to all of
the new modules.
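
For reference, the 2.4-era build sequence Keith describes typically looks like
this (a sketch; the source and config paths are assumptions based on where the
RHEL3 kernel-source package installs them):

  cd /usr/src/linux-2.4
  cp configs/kernel-2.4.21-i686-smp.config .config
  make oldconfig            # answer 'no' to options newer than the copied config
  make dep bzImage modules
  make modules_install install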

In the past when the machine became unusable a message was printed to the
console warning that PIDs were being killed because lowmem = 0.  With the new
kernel I have not been able to reproduce this problem.

When I have a chance I will reboot with the most current Red Hat kernel and see
whether the problem occurs on the same partition (now formatted as reiserfs),
just to make sure that it was the ext3 filesystem, and not the module for my
RAID card, that made the difference.

Keith

Comment 26 RHEL Program Management 2007-10-19 19:04:38 UTC
This bug is filed against RHEL 3, which is in its maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

