From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.4.3) Gecko/20040803

Description of problem:
After the latest kernel upgrade in U3 we have seen the kswapd kernel thread run amok several times on particular machines. Kswapd runs and eats CPU time, but does not actually swap much. Right now, after ca. 6 days uptime, it has accumulated almost 8 hours of CPU time. This happens occasionally, often on an unloaded machine. We haven't been able to figure out any particular pattern (yet). Putting more load on the machine could make kswapd calm down again. We never saw this happen with the kernel from U2.

filadelfia> uname -a
Linux filadelfia 2.4.21-20.ELsmp #1 SMP Wed Aug 18 20:46:40 EDT 2004 i686 i686 i386 GNU/Linux

filadelfia> uptime
 13:55:11  up 6 days,  3:05,  0 users,  load average: 0.00, 0.00, 0.00

filadelfia> ps -C kswapd -fl
F S UID        PID  PPID  C PRI  NI ADDR   SZ WCHAN  STIME TTY          TIME CMD
1 S root        11     1  5  75   0    -    0 -      Sep09 ?        07:55:51 [kswapd]

filadelfia> vmstat
   procs                      memory    swap          io     system         cpu
 r  b   swpd   free   buff  cache  si  so    bi    bo   in    cs us sy wa id
 0  0   4644 4612624 387184 848828   0   0    15     7   12     9  0  2  1 16

filadelfia> grep swap /proc/stat
swap 76596 212437

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Spontaneous.

Additional info:

filadelfia> lsmod
Module                  Size  Used by    Not tainted
nls_iso8859-1           3516   0 (autoclean)
nfsd                   85168   8 (autoclean)
lp                      9124   0 (autoclean)
parport                38816   0 (autoclean) [lp]
autofs4                16952   3 (autoclean)
nfs                    99376  11 (autoclean)
lockd                  58992   1 (autoclean) [nfsd nfs]
sunrpc                 89212   1 (autoclean) [nfsd nfs lockd]
e1000                  76956   1
floppy                 57488   0 (autoclean)
sg                     37228   0 (autoclean)
sr_mod                 17784   0 (autoclean)
microcode               6848   0 (autoclean)
ide-scsi               12464   0
ide-cd                 34016   0
cdrom                  32544   0 [sr_mod ide-cd]
loop                   12696   0 (autoclean)
ext3                   89960   6
jbd                    55060   6 [ext3]
lvm-mod                64992   5
megaraid2              37928  11
sd_mod                 13712  22
scsi_mod              114344   5 [sg sr_mod ide-scsi megaraid2 sd_mod]
Göran, this is the first I've heard of this problem. If you do see this happen, please try to collect some data so I can figure out what's happening. If possible please collect:
1.) "vmstat 1" output.
2.) "top 1" output.
3.) "AltSysrq M" and "AltSysrq T", and attach the /var/log/messages file.

Thanks, Larry Woodman
We'll try to catch this info. (We don't have a console on this machine currently, so the SysRq info could be slightly harder. I'll see what we can do.)
Just "echo m > /proc/sysrq-trigger" and "echo t > /proc/sysrq-trigger" Larry
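The data-collection step above can be wrapped in a small helper, assuming a kernel with /proc/sysrq-trigger and syslog going to /var/log/messages as in this thread. The function name `capture_sysrq` is mine, and the trigger/log paths are parameters purely so the sketch can be exercised against stand-in files (writing the real trigger needs root and kernel.sysrq enabled):

```shell
#!/bin/sh
# Sketch: fire a SysRq command via the trigger file and snapshot the
# tail of the log. Paths default to the real ones from the thread but
# are parameterized for illustration/testing.
capture_sysrq() {
    key=$1                                  # e.g. m, t, p, w
    trigger=${2:-/proc/sysrq-trigger}
    log=${3:-/var/log/messages}
    echo "$key" > "$trigger"                # same as "echo m > /proc/sysrq-trigger"
    sleep 1                                 # give klogd a moment to flush
    tail -n 200 "$log"                      # snapshot the most recent messages
}
```

Running it several times while kswapd is spinning, and attaching the outputs, matches what Larry asked for.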
Ah, I didn't know about /proc/sysrq-trigger. You learn something each day! :-) Thanks!
The problem was triggered again this morning (Central European time). We believe it started when we ran a "tar zxf". I've made two measurements, one while the tar was still running and one after we had killed it. I'll attach both. (They are just raw outputs from the script I made, except that I've trimmed the extra lines from messages that my sed command wasn't smart enough to skip. I wasn't sure how to format things, but I'm sure you'll manage! :-)
Created attachment 103900 [details] Probe while tar was still running
Created attachment 103901 [details] Probe after tar was killed
Created attachment 103902 [details] Complete messages file There are several abort messages from the megaraid module in there. Could this be the problem maybe?
Göran, I still can't see exactly what kswapd is doing; the memory stats really don't show the problem. Can you grab me a few "AltSysrq P" and "AltSysrq W" outputs when kswapd is running amok? This will give us the backtrace of the running (kswapd) process. Thanks, Larry Woodman
Created attachment 103912 [details] Output from a number of SysRq p and w The "p" doesn't seem to write anything to messages, except for the header "SysRq : Show Regs". Is there something additional, apart from sysctl kernel.sysrq I need to enable?
When I do NOT want to change state from NEEDINFO to ASSIGNED, bugzilla seems to do it automatically for me. But this time, when it would have been appropriate, I had to do it myself! Oh well. :-)
We just discovered there has been a thread on taroon-list around this problem. http://www.redhat.com/archives/taroon-list/2004-September/msg00073.html Bug 121434 is mentioned in there. It has a very long thread, but it doesn't seem to be exactly the same symptoms. Also, that was filed back in April, and we started seeing this in update 3. Could of course still be related in some way.
I've seen this as well, on a couple of AS3 U3 boxes running Cyrus-IMAPd. When I'm migrating users to them by copying their mail via IMAP, kswapd seems to be spinning on one CPU, and although there are more of them, interactivity is dog slow. You'd think the link you were SSH'ing over was a mobile GSM data link or something. For instance, running "df" takes over seven seconds. When I'm not migrating users to them, kswapd calms down. On the other hand, there's no real load on the servers then either. This seems to be happening even though virtually no swap is used. Indeed, it persists even if I disable swap entirely with "swapoff -a"! I'll attach some system graphs you might find interesting. If there's any more information I can provide, please let me know. I'd be interested to hear of any workarounds too, even if these mean that I cannot use swap. As it stands, I have enough physical memory anyway. The machines in question are HP ProLiant DL360 G3s. HyperThreading is enabled. The kernel I'm running is 2.4.21-20.ELsmp, sadly tainted with HP's SecurePath binary modules (...please add support for dm-multipath soon, so I can get rid of them!). I cannot confirm the submitter's assertion that the problem was introduced in U3, as this system was just recently put into production. Tore Anderson
Created attachment 103920 [details] Memory usage for one of the affected machines, by day
Created attachment 103921 [details] CPU usage for one of the affected machines, by day
This problem seemed to reach critical mass some hours after I wrote my initial comment, pulling one machine down and leaving the other in a seriously dysfunctional state - although user space was alive, lots of processes were impossible to kill, every command (even the simplest ones, such as "cat file") took ages to finish, and the load average had skyrocketed. Cluster Manager didn't cope with the situation either, although that's perhaps not to be expected, as both members started getting real problems almost simultaneously. I rebooted both boxes, reverting to kernel 2.4.21-15.0.4.ELsmp, and up until now I've not seen any sign of this bug being present in that kernel. So it seems Mr. Uddeborg is correct: this bug is a new, er, feature of U3. The fact that I had to revert to an older kernel also means, I'm afraid, that I cannot easily provide more debug information upon request. I hope that won't be detrimental to pinpointing and correcting the bug. Tore Anderson
We see this too on our IMAP server. The problem only occurs during heavy IO, in our case when running tar for backup and when squatter (Cyrus indexer) runs. The only thing in common is that a lot of files are opened and read by these programs. The Cyrus spool partition is ext3 and mounted with the noatime option. The hardware is a Dell 2650, 2x3GHz CPU, 4GB RAM and a PERC 4/DC RAID controller. We use the megaraid2 driver. Currently we run on the previous kernel that has no such problem.
kswapd is spinning in prune_icache, I'll look into this. Larry
Göran, basically kswapd is spending lots of time in prune_icache(). First of all, how can I reproduce this problem? I have tried and can never catch the system in a state where kswapd is eating up any significant CPU time. Second, when you see the system in this state, can you "cat /proc/sys/fs/inode-nr" as well as getting me AltSysrq M, T, P and W outputs? Larry Woodman
Just wanted to chime in that we are having a very similar problem. I can replicate it on a number of RHEL3 U3 boxes by piping a tar through ssh to another box. Not sure if it matters, but this works particularly well when really taxing the filesystem (i.e. my tar is of 100,000s of small 0-10B files).
As you may have guessed, we do not quite know what is necessary to reproduce the problem. We have many machines running RHEL3U3, but see this problem on only one. (At least we see it very often on just one.) The hardware and software configuration of this machine is a bit different from most of our machines. It is a Dell PE2600 with three external PV220s, and the filesystem we use most on it is an LVM RAID partition. It is hard for me to guess which difference is the relevant one. I'll attach a number of outputs from various informational commands, in addition to the SysRq output you asked for. Our standard way to trigger the behaviour is to start extraction of a large (compressed) tar archive. It seems to always go into this mode then. If we kill the tar process, kswapd still keeps running at 100%. If we remove the extracted files, kswapd goes back to normal. As a last note for now: one more way we have found to affect the behaviour is changing the value of /proc/sys/vm/inactive_clean_percent. We normally have it at the default 30. If we set it to 100 we don't see the effect any more (but the machine runs slower).
Created attachment 104173 [details] Result in /var/log/messages from SysRq m, t, p, and w SysRq p still doesn't write anything, for some reason. The contents of /proc/sys/fs/inode-nr was "124527 123958".
Created attachment 104174 [details] Output of "head /proc/megaraid/hba[1-3]/raiddrives-0-9"
Created attachment 104175 [details] Output of "pvdisplay /dev/sd[b-d]1"
Created attachment 104176 [details] Output of "vgdisplay /dev/vg00"
Created attachment 104177 [details] Output of "lvdisplay /dev/vg00/lvol2"
Created attachment 104178 [details] Output of "lspci -vv"
Created attachment 104181 [details] Contents of /proc/cpuinfo
Created attachment 104182 [details] Contents of /proc/meminfo
For what it's worth, my hardware is very different from Göran's. I have a few HP ProLiant DL360 G3s, which use the cpqarray kernel module for their internal storage. The mail spool is stored on an HP HSV110 EVA, driven by a QLogic QLA2342 HBA (qla2300 module). HP SecurePath is used. I use LVM on the internal RAID, but plain ext3 directly on the SCSI hard drive devices for the mail spool. What is common seems to be that the system is doing lots and lots of file operations on a large number of tiny files. I was copying mail via IMAP from one server when I got problems - an operation that creates a massive amount of small files, one per email. Tore Anderson
Additional info: our systems that are showing this are Dell PE8450s, HP DL380s and BL20ps. They are all attached to either a PV640 or HSV100s via the qla2300 drivers... which I guess is another similarity. If needed I can see if the problem also happens accessing local disk. Is there any specific information that I can provide that would help? I have played with inactive_clean_percent, which temporarily solves the problem, but it comes back in a couple of hours. I have also changed pagecache to no avail. Are all of the systems affected by this using the qla2300 module, by any chance?
Based on the AltSysrq-M output attached, I think the system might be in a state where the free and inactive clean page lists are too low. Can you guys try setting /proc/sys/vm/inactive_clean_percent all the way to 100, just to see if the problem disappears? I suspect that a combination of the number of inodes that are currently unused (closed files), the average number of pages on each inode being about the same as the number of unused inodes (several small files were in use), and the paging thresholds is causing kswapd to make little or no progress when the system gets into this state. Larry Woodman
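For reference, the experiment Larry is asking for amounts to something like the sketch below. The real knob is /proc/sys/vm/inactive_clean_percent, which needs root to write; the path is a parameter here and the helper name `set_icp` is my own, purely so the sketch can be tried against a scratch file:

```shell
#!/bin/sh
# Set vm.inactive_clean_percent and report the previous value.
# The knob path is parameterized for illustration; the real file is
# /proc/sys/vm/inactive_clean_percent (default 30 on these kernels).
set_icp() {
    value=$1
    knob=${2:-/proc/sys/vm/inactive_clean_percent}
    old=$(cat "$knob")                 # remember the previous setting
    echo "$value" > "$knob"            # e.g. set_icp 100
    echo "inactive_clean_percent: $old -> $value"
}
```

Usage on an affected box would be `set_icp 100`, then rerunning the workload to see whether kswapd still spins.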
Created attachment 104195 [details] Output of "lsmod" No, we do not have qla2300 loaded.
Created attachment 104198 [details] Result in /var/log/messages from SysRq m, t, p, and w
Setting inactive_clean_percent to 100 seems to make kswapd go away for a while, but it comes back in a couple of hours if the IO load doesn't stop in the meantime.
OK, can you set it to 100, restart the test, and after it "comes back" get me "vmstat 1", "top" and AltSysrq M, T and W outputs, so I can see what state the system is in at that point? Sorry, I can't reproduce this problem here for some reason, and getting access to your machines wouldn't make solving this any quicker. Thanks for your help on this, guys. Larry Woodman
Created attachment 104257 [details] Test program to trigger kswapd It was quite hard to trigger kswapd with inactive_clean_percent set to 100, but finally we could by using this attached program. It creates a lot of small C++ files. Trying it elsewhere, we found we could trigger it on at least one more machine. So maybe this is a test case you could use to trigger the problem yourself? The procedure that has proven most reliable has been to first use up all caches with a command like

dd if=/dev/zero of=/some/(local)/disk/file bs=1G count=6

and then run the test program. Kswapd comes up after some time; sooner with inactive_clean_percent at 30, later at 100. I'll attach the probes from this test too.
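Since the attached reproducer isn't inlined in this thread, a rough shell equivalent of what it does (create a large number of tiny C++ source files) might look like the sketch below. The function name, directory layout and file contents are my own choices; the attached program remains the authoritative test case:

```shell
#!/bin/sh
# Create many tiny .cc files to stress the inode/dentry caches,
# loosely mimicking the attached test program (a sketch, not the
# original: names and counts are illustrative).
make_tiny_files() {
    dir=$1
    nfiles=$2
    mkdir -p "$dir"
    i=0
    while [ "$i" -lt "$nfiles" ]; do
        printf '// tiny source file %d\n' "$i" > "$dir/file$i.cc"
        i=$((i + 1))
    done
}
# Typical use after the dd cache-fill step above:
#   make_tiny_files /some/local/disk/testdir 2000000
```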
Created attachment 104258 [details] Output of "vmstat 1 20", inactive_clean_percent=100
Created attachment 104259 [details] Output of "top b n 3", inactive_clean_percent=100
Created attachment 104260 [details] Result in /var/log/messages from SysRq m, t, and w, inactive_clean_percent=100
I created a test kernel that prevents kswapd from spending so much time pruning the inode cache. Can you guys give this a test ASAP and let me know how it works for you? The kernel is here:

>>>http://people.redhat.com/~lwoodman/.RHEL3/

The patch I applied is this:

*****************************************************************************
--- linux-2.4.21/mm/vmscan.c.orig	2004-09-24 11:19:15.000000000 -0400
+++ linux-2.4.21/mm/vmscan.c	2004-09-24 11:27:52.000000000 -0400
@@ -1028,7 +1028,7 @@ static int do_try_to_free_pages(unsigned
 	}
 
 	ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
-	ret += shrink_icache_memory(1, gfp_mask);
+	ret += shrink_icache_memory(10, gfp_mask);
 #ifdef CONFIG_QUOTA
 	ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
 #endif
@@ -1075,6 +1075,7 @@ static int do_try_to_free_pages_kswapd(u
 	int ret = 0;
 	struct zone_struct * zone;
 	pg_data_t *pgdat;
+	static unsigned long last_inodecache_shrink;
 
 	/*
 	 * Eat memory from filesystem page cache, buffer cache,
@@ -1096,7 +1097,11 @@ static int do_try_to_free_pages_kswapd(u
 		if (free_low(zone + ZONE_DMA) > 0 ||
 		    free_low(zone + ZONE_NORMAL) > 0) {
 			ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
-			ret += shrink_icache_memory(DEF_PRIORITY, gfp_mask);
+			/* shrink the icache by no more than 1% each second from kswapd */
+			if (time_after(jiffies, last_inodecache_shrink + HZ)) {
+				ret += shrink_icache_memory(100, gfp_mask);
+				last_inodecache_shrink = jiffies;
+			}
 			try_to_reclaim_buffers(DEF_PRIORITY, gfp_mask);
 #ifdef CONFIG_QUOTA
 			ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
This patch seems to have worked for me at least - the test kernel behaves as expected during backup and a run of squatter - no problems with kswapd.
Created attachment 104388 [details] /var/log/messages during restart and running test program It did not work very well for us. After starting our test program, it runs for a little while, then kswapd starts going up, and we are thrown out. Looking at the console or in messages reveals a lot of processes killed with an "Out of Memory" justification. I enclose an excerpt from messages, starting with the boot sequence, and then when the test program is started shortly after. This is repeatable. Our original test case, file extraction with tar, does seem to work with this (20.11.inodeshrink) kernel, though.
Göran, you are having a different type of problem now. Your attachment indicates that lowmem is *totally* consumed in the slabcache!!!

*****************************************************
kernel: aa:0 ac:25 id:112 il:102 ic:0 fr:638
191132 pages of slabcache
*****************************************************

Please get me a "cat /proc/slabinfo" so we can determine who is allocating all of those kernel data structures and never freeing them. None of the lsmod entries are indicative of this type of problem. Larry
> Göran, you are having a different type of problem now. If you say so. :-) It is triggered by the same program, this program runs fine with 2.4.21-15.0.3 but not later, and kswapd goes up high in top in both cases. So it looked related to me. But I'm definitely not going to argue with you if it is the same bug or not. :-) I'll attach slabinfo samples.
Created attachment 104571 [details] Last retrieved copy of slabinfo I can't copy slabinfo after my login sessions get killed, so I made a loop copying /proc/slabinfo each second. This is the last sample I got. Presumably, the looping shell was killed after that.
Created attachment 104572 [details] One of the first samples, for reference. In case it could be of any help, I have one sample each second between these. That is, 198 of them.
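The one-sample-per-second loop described above can be sketched as follows. Source, destination and count are parameters and the name `sample_file` is mine, so the sketch isn't tied to /proc/slabinfo:

```shell
#!/bin/sh
# Snapshot a file repeatedly into numbered copies so the last sample
# survives if the looping shell is killed, as happened here.
sample_file() {
    src=$1
    dest=$2
    count=$3
    interval=${4:-1}                   # seconds between samples
    mkdir -p "$dest"
    i=0
    while [ "$i" -lt "$count" ]; do
        cp "$src" "$dest/sample.$i"    # numbered snapshots, oldest first
        i=$((i + 1))
        sleep "$interval"
    done
}
# e.g. sample_file /proc/slabinfo /var/tmp/slabsamples 200
```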
Most of your memory is consumed by inodes. Larry, it would appear that inode reclaiming might need to be a bit more aggressive; there don't seem to be anywhere near as many dentries or open files as there are inodes being cached.
Göran, the slabcache output you attached indicates that lowmem is totally consumed in inodes and buffer_headers. When this happens, kswapd should call prune_icache (which it does), and that should wake up bdflush to push out the dirty pages so the inodes and buffer_headers can be reclaimed. Obviously something isn't working as it should; can you reproduce this problem and get a few AltSysrq-T outputs so I can see what bdflush is doing? Thanks, Larry
Created attachment 104620 [details] Result in /var/log/messages from SysRq t repeatedly I reran the triggering program, and in parallel did a SysRq-T every 15 seconds. The behaviour was slightly different at first. The machine still started to kill off processes, but it didn't go down completely. The triggering program was apparently killed after a while, and the machine recovered after that. I tried once more, and then it crashed like before. The attachment is from the second run. I don't know whether it actually is the SysRqs which affect the behaviour, or if it is a coincidence.
By the way, there is a jump in the comments from comment 49 to comment 51. Is that intentional? What happened to comment 50?
Seeing this with the RHEL3 U3 SMP kernel. The trigger is when a user tries to make a copy of his data. Only 5 gig, but it's spread across 600,000 files. ;) My /proc/slabinfo confirms it's an inode_cache issue:

inode_cache 207606 354284 512 50612 50612 1 : 496 124

kswapd is taking anywhere from 70% to 399.9% of our CPU (it's a dual Xeon with hyperthreading), which pushes the load up to about 12 (normally we sit around 0.3). Meanwhile vmstat shows *no* swapping actually taking place. This is on a system with 4 gig RAM and 4 gig swap. Let me know if there's additional info I can provide (but keep in mind it's a production system, so I can only do "safe" operations).
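Reading that slabinfo line: in the 2.4 format the first four fields are, as far as I recall, cache name, active objects, total objects and object size in bytes, so 354284 inodes at 512 bytes each is roughly 173 MB of lowmem. A small awk helper (my own; the field meanings are an assumption to verify against your kernel) does the arithmetic:

```shell
#!/bin/sh
# Estimate memory held by one slab cache from a 2.4-style slabinfo
# file. Assumed field layout: name, active objs, total objs, obj size.
slab_usage() {
    awk -v name="$2" '$1 == name {
        printf "%s: %d/%d objects, ~%d KB\n", $1, $2, $3, $3 * $4 / 1024
    }' "$1"
}
# e.g. slab_usage /proc/slabinfo inode_cache
```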
We see exactly the same phenomenon since updating to EL3: kswapd eats up CPU without doing actual swapping and really slows down the system. RH EL AS version 3, kernel 2.4.21-20.ELsmp, IBM x345 dual Xeon (HT on) server with 4GB RAM and 4GB swap space. Unfortunately this is our central file server, which has 3.4 TB of FibreChannel storage attached to it (IBM FAStT600 + FAStT700 Expansion). Therefore users really suffer from an unresponsive file server, up to complete unresponsiveness.
Is everyone that is experiencing this problem dealing with a huge number of files (I see hundreds of thousands), or does it also show up when a smaller number of files is involved? Also, the original description of this bug stated it was an x86_64 system. Is this happening on all architectures? Larry
I'm on a dual xeon with hyperthreading (so linux sees 4 processors). So x86, but NOT 64-bit. Yes to the huge number of files. I think it might be an issue when writing files only, not reading them, since our nightly backups don't seem to trigger it. But I could be wrong about that. Anyone seeing this on a NON-smp kernel? Anyone have less than 3 gig of ram?? BTW, increasing inactive_clean_percent from the default of 30 to a value of 50 seems to help. If nobody has the problem on an AMD system, I can attempt to reproduce it on our dual Opteron system (3.5 gig ram).
I have this problem on a dual Xeon server, 4GB RAM. kswapd sucks up cpu when using tar (millions of files) and when squatter (cyrus indexer) runs.
For the original U3 kernel (kernel-smp-2.4.21-20.EL): We have only seen this with a significant number of files, if not hundreds of thousands. It first happened when we tried to rsync a large source code CVS repository with some 70,000 files. Our standard tar extraction test case includes around 160,000 files. Tar has only started extracting those files when kswapd stops other activity, though. With your test kernel from comment 42 (kernel-smp-2.4.21-20.11.inodeshrink.EL): This one handles the tar file. The test case we use here would create around 2 million files, if it succeeded. I don't believe my original description mentioned x86_64. The machine where this happens most often has dual Pentium 4 Xeon (Northwood) Original OEM (hyperthreaded to four virtual). We do have some AMDs, but we haven't seen this triggered there. We haven't specifically tried.
We've seen this on our backup server, primarily during the large backup run at night, so I do think it can happen primarily during reads. This system is a 4-CPU Dell 6450 (PIII Xeon/700MHz). We have been running a kernel that we pulled from comment 42 as well, although it's a version with 2.4.21-20.12. I don't know if it's just luck or not, but so far it has made 7 days of uptime (the original 2.4.21-20 crashed anywhere from 24-72 hours every time). I didn't see anything about this applying only to x86_64; although it was originally posted from a machine that had x86_64 in its User-Agent string, the actual description indicates an i686, and it was filed against i386. Later, Tom
We also observe it on a dual Xeon with hyperthreading, 4GB of RAM and 4GB of swap space (2 x 2GB partitions). As previously mentioned (comment 53), this machine acts as an NFS file server. We saw this behaviour when we had local processes like backup or tar/gunzip (which we sometimes cannot avoid) and additional load caused by NFS activity from the NFS clients.
So it appears that this problem only occurs when a huge number of files is involved and there are writes to those files. I can see what is going on in the system when this happens: kswapd calls prune_icache, which calls invalidate_inode_pages for each file, and it can't do anything because the pages caching the file data are dirty. But I can't see what changed between RHEL3-U2 and RHEL3-U3 that caused this change in behavior. RHEL3-U2 was OK, correct? Larry
Yes, RHEL3-U2 was OK.
We see this problem when running backups and dumping the result across the network to another machine - tar only reads, no local writes. The imap partition is mounted with the noatime option. The last U2 kernel is still ok.
We are seeing this too. Red Hat service request #368982, if anyone at Red Hat is interested. I can dump the info here if anyone is interested, too (as it seems Red Hat support is stuck at the "Dear Sir. Which kernel are you running. Thank You" level and not making any progress). Summary: kswapd is spinning at the same place, and a few updatedb and rm processes are unkillable. The test script is to continuously extract and rm kernel source trees while running updatedb in parallel.
Created attachment 105430 [details] Details of the problem we are seeing Here are the details I sent to Red Hat support, in case anyone is interested.
Basically, it seems this happens every time I try to access very many files (updatedb, cp, rm, whatever); the dcache fills up and kswapd goes into a busy loop. This seems pretty bad to me; a mere "cp -ax / /mnt/otherdisk" was enough to reproduce it (granted, there were a lot of files on / since I forgot to exclude /tmp, where I had run the kernel extract torture test, but "a lot of files" hardly qualifies as a "don't do that" for an "enterprise" OS, does it?)

This is the script I used for testing:

while true; do
    for i in $(seq 1 500); do
        mkdir $i
        cd $i && tar xjf /tmp/linux-2.4.27.tar.bz2
        cd ..
    done
    date >> filefill.log
    df -m . >> filefill.log
    rm -rf ? ?? ???
done

and on another terminal, I ran "while true; do updatedb; done". Here is the kswapd stack trace for completeness:

kswapd        R 00000000  4332    11      1    12    13 (L-TLB)
Call Trace:   [<c011d500>] smp_apic_timer_interrupt [kernel] 0x0 (0xc496df10)
[<c011d570>] smp_apic_timer_interrupt [kernel] 0x70 (0xc496df14)
[<c029f967>] __br_write_lock [kernel] 0x27 (0xc496df5c)
[<c029f9aa>] .text.lock.brlock [kernel] 0x5 (0xc496df64)
[<c014724c>] invalidate_inode_pages [kernel] 0x6c (0xc496df68)
[<c017f3ea>] prune_icache [kernel] 0x1ba (0xc496df7c)
[<c017f484>] shrink_icache_memory [kernel] 0x24 (0xc496dfa0)
[<c015636e>] do_try_to_free_pages_kswapd [kernel] 0x14e (0xc496dfac)
[<c0156508>] kswapd [kernel] 0x68 (0xc496dfd0)
[<c01564a0>] kswapd [kernel] 0x0 (0xc496dfe4)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xc496dff0)

I also get some unkillable (kill -KILL) processes occasionally. Here are a few:

[3]root@elbrus:/tmp/burn>kill 12690
[3]root@elbrus:/tmp/burn>kill 12690
[3]root@elbrus:/tmp/burn>kill 12690
[3]root@elbrus:/tmp/burn>kill 12690
[3]root@elbrus:/tmp/burn>kill 12690
[3]root@elbrus:/tmp/burn>kill -KILL 12690
[3]root@elbrus:/tmp/burn>kill -KILL 12690
[3]root@elbrus:/tmp/burn>kill -KILL 12690
[3]root@elbrus:/tmp/burn>kill -KILL 12690
[3]root@elbrus:/tmp/burn>kill -KILL 12690
[3]root@elbrus:/tmp/burn>lsof -p 12690
COMMAND   PID USER  FD  TYPE DEVICE    SIZE     NODE NAME
rm      12690 root  cwd  DIR    8,2    4096 14683708 /tmp/burn/yummi/1/linux-2.4.27/include/asm-sh
rm      12690 root  rtd  DIR    8,2    4096        2 /
rm      12690 root  txt  REG    8,2   26652   835635 /bin/rm
rm      12690 root  mem  REG    8,2 1568924 14172162 /lib/tls/libc-2.3.2.so
rm      12690 root  mem  REG    8,2  106912   524292 /lib/ld-2.3.2.so
rm      12690 root   0u  CHR  136,1                3 /dev/pts/1
rm      12690 root   1u  CHR  136,1                3 /dev/pts/1
rm      12690 root   2u  CHR  136,1                3 /dev/pts/1
rm      12690 root   3r  DIR    8,2    8192       21 /tmp/burn/yummi
rm      12690 root   4r  DIR    8,2    4096 14683708 /tmp/burn/yummi/1/linux-2.4.27/include/asm-sh

[3]root@elbrus:/tmp/burn>top -b -n1 | egrep "(kswapd|updatedb|rm)"
   11 root      25   0     0    0     0 SW 100.5  0.0 485:35   0 kswapd
 8638 root      15   0   712  712   476 D   2.4  0.0   0:52   2 updatedb
12690 root      15   0   404  404   336 D   0.0  0.0   0:00   0 rm

And on another boot:

updatedb      D 00000000  3776 10304   8875                     (NOTLB)
Call Trace:   [<c0123e24>] schedule [kernel] 0x2f4 (0xca567e14)
[<c017e50c>] __wait_on_inode [kernel] 0x7c (0xca567e58)
[<c017f8b8>] iget4_locked [kernel] 0x98 (0xca567eac)
[<f887076d>] ext3_lookup [ext3] 0x7d (0xca567ed4)
[<c0171a3c>] real_lookup [kernel] 0xec (0xca567ef8)
[<c017206c>] link_path_walk [kernel] 0x45c (0xca567f18)
[<c01725b9>] path_lookup [kernel] 0x39 (0xca567f58)
[<c0172909>] __user_walk [kernel] 0x49 (0xca567f68)
[<c016d9ae>] sys_lstat64 [kernel] 0x2e (0xca567f84)
[<c0161d6b>] sys_fchdir [kernel] 0x4b (0xca567fa0)
I think I found the problem here; this patch prevents prune_icache from making progress:

********************************************************************
--- linux-2.4.21/fs/inode.c.orig
+++ linux-2.4.21/fs/inode.c
@@ -285,7 +285,8 @@ void refile_inode(struct inode *inode)
 	   BUG(); */
 	if (!inode) return;
 	spin_lock(&inode_lock);
-	__refile_inode(inode);
+	if (!(inode->i_state & I_LOCK))
+		__refile_inode(inode);
 	spin_unlock(&inode_lock);
 }
*****************************************************************

I removed it and built a new kernel. Can someone try it out ASAP? It's located here:

>>>http://people.redhat.com/~lwoodman/.RHEL3/

Larry Woodman
Nice find Larry. I suspect that should do the trick ...
Larry, does this patch work for i386 as well? I saw the file name was i686. We have the same issue with Update 3.
Larry, when I download http://people.redhat.com/~lwoodman/.RHEL3/kernel-smp-2.4.21-22.kswapdfix.EL.i686.rpm I just get 9240086 null bytes. Something is wrong, obviously. I've tried from two different sites with the same result, so I believe it is on your side.
Yeah, it seems full of NULL for me too.
OK guys, my disk quota was exceeded. Grab the kernel again from: >>>http://people.redhat.com/~lwoodman/.RHEL3/ Larry
Ok, I'm running it now. Will take some time to tell for sure if kswapd hangs or not...
We have tried our test cases now, and the machine is still running fine. We managed to get kswapd up to around 20% of the CPU as a maximum. But it didn't go out of control, and it came back down again. From our point of view, this is solved now. (I would have resolved the bugzilla if it wasn't for this bug's large fan club, eh, cc list. Maybe more people want to try the latest version first. So I leave the resolving to you Larry. :-)
OK, thanks. Can I get everyone to try out a more complete version? It's located at: >>>>http://people.redhat.com/~lwoodman/.RHEL3/kernel-smp-2.4.21-22.prune_icachefix.EL.i686.rpm Thanks, Larry
My test case (extract 500 kernel source trees and run updatedb in parallel, see comment #67) still fails. (2.4.21-22.kswapdfix.ELsmp) kswapd seems to behave, but I get a hung, unkillable updatedb process:

>pidof updatedb
31389
>kill -KILL 31389
>kill -KILL 31389
>kill -KILL 31389
>kill -KILL 31389
>kill -KILL 31389
>pidof updatedb
31389
>lsof -p 31389
COMMAND    PID USER  FD  TYPE DEVICE    SIZE   NODE NAME
updatedb 31389 root  cwd  DIR   58,1    4096 7979042 /data/burn/3/linux-2.4.27/arch/ppc64/kernel
updatedb 31389 root  rtd  DIR   58,0    4096       2 /
updatedb 31389 root  txt  REG   58,0   27520  392649 /usr/bin/slocate
updatedb 31389 root  mem  REG   58,0 1568924  211818 /lib/tls/libc-2.3.2.so
updatedb 31389 root  mem  REG   58,0   51920  570174 /lib/libnss_files-2.3.2.so
updatedb 31389 root  mem  REG   58,0  106912  570141 /lib/ld-2.3.2.so
updatedb 31389 root   0u  CHR  136,3               5 /dev/pts/3
updatedb 31389 root   1u  CHR  136,3               5 /dev/pts/3
updatedb 31389 root   2u  CHR  136,3               5 /dev/pts/3
updatedb 31389 root   3w  REG   58,0  499712  749779 /var/lib/slocate/slocate.db.tmp
updatedb 31389 root   4r  DIR   58,1    8192   32769 /data/burn
updatedb 31389 root   5r  DIR   58,1    4096 7979042 /data/burn/3/linux-2.4.27/arch/ppc64/kernel

sysrq-t:
updatedb      D 00000000  3324 31389   2883                     (NOTLB)
Call Trace:   [<c0123f14>] schedule [kernel] 0x2f4 (0xe00f5e14)
[<c017f28c>] __wait_on_inode [kernel] 0x7c (0xe00f5e58)
[<c0180628>] iget4_locked [kernel] 0x98 (0xe00f5eac)
[<f889876d>] ext3_lookup [ext3] 0x7d (0xe00f5ed4)
[<c01727bc>] real_lookup [kernel] 0xec (0xe00f5ef8)
[<c0172dec>] link_path_walk [kernel] 0x45c (0xe00f5f18)
[<c0173339>] path_lookup [kernel] 0x39 (0xe00f5f58)
[<c0173689>] __user_walk [kernel] 0x49 (0xe00f5f68)
[<c016e51e>] sys_lstat64 [kernel] 0x2e (0xe00f5f84)
[<c016284b>] sys_fchdir [kernel] 0x4b (0xe00f5fa0)
OK now please try the prune_icachefix kernel. Larry
I've booted it now. It'll take at least a few hours to tell if it helps. Out of interest, what's the difference between 2.4.21-22.prune_icachefix.ELsmp and /vmlinuz-2.4.21-22.kswapdfix.ELsmp? Does it affect mainline?
I've tried 2.4.21-22.prune_icachefix.EL and it behaves well in our tests.
I've been running the test case on one machine for eight hours and for a couple of hours on another machine, both running 2.4.21-22.prune_icachefix.ELsmp. No problems yet. Before, the problems have always happened within a few hours. I'll let the scripts run for some time still, but it looks like you nailed it, Larry! Good job!
We've tried this patch in our pre-operational environment too: 850,000 files in less than 10 minutes. Before the patched kernel this took more than 30 minutes and resulted in an unusable system! We'll do some more testing, but I'm confident this has solved the problem.
Ummh, I'm seeing a problem after all. Test script:
__________________________________________________________________
while true; do
    for i in $(seq 1 500); do
        mkdir $i
        cd $i
        tar xf /tmp/linux-2.4.27.tar
        cd ..
    done
    date >> filefill.log
    df -m . >> filefill.log
    rm -rf [1234567890] [1234567890][1234567890] [1234567890][1234567890][1234567890]
done &
while true; do updatedb; done
__________________________________________________________________

The test script (see above) has been running for some time on two machines, and on one of them I'm seeing some weird stuff:

mkdir: cannot create directory `286': File exists
tar: linux-2.4.27/include/linux/sunrpc/xprt.h: Cannot open: File exists
tar: linux-2.4.27/include/linux/sunrpc/msg_prot.h: Cannot open: File exists
tar: linux-2.4.27/Documentation/sound/VIBRA16: Cannot open: File exists
tar: linux-2.4.27/Documentation/sound/rme96xx: Cannot open: File exists
tar: linux-2.4.27/Documentation/sound/OPL3: Cannot open: File exists
tar: linux-2.4.27/Documentation/sound/AudioExcelDSP16: Cannot open: File exists
tar: linux-2.4.27/Documentation/sound/INSTALL.awe: Cannot open: File exists
tar: linux-2.4.27/Documentation/sound/PSS-updates: Cannot open: File exists
tar: Error exit delayed from previous errors
mkdir: cannot create directory `287': File exists
tar: linux-2.4.27/arch/i386/math-emu/fpu_system.h: Cannot open: File exists
tar: linux-2.4.27/arch/i386/math-emu/poly_atan.c: Cannot open: File exists
tar: linux-2.4.27/arch/i386/math-emu/poly_l2.c: Cannot open: File exists
tar: linux-2.4.27/arch/i386/math-emu/reg_u_add.S: Cannot open: File exists
tar: linux-2.4.27/arch/i386/math-emu/reg_u_div.S: Cannot open: File exists
tar: linux-2.4.27/arch/i386/math-emu/div_Xsig.S: Cannot open: File exists
tar: Error exit delayed from previous errors
mkdir: cannot create directory `288': File exists
tar: linux-2.4.27/include/asm-sparc64/unistd.h: Cannot open: File exists
tar: linux-2.4.27/include/asm-sparc64/perfctr.h: Cannot open: File exists
[3]root@elbrus:/mnt/3ware1>ls -ld 288/linux-2.4.27/include/asm-sparc64/unistd.h
drwxrwxr-x 5 573 573 4096 Oct 22 04:40 288/linux-2.4.27/include/asm-sparc64/unistd.h
[3]root@elbrus:/mnt/3ware1>ls 288/linux-2.4.27/include/asm-sparc64/unistd.h | wc
    117     117    1552

Some files have turned into directories that contain files. The script went through a couple of times without a hitch, but now it has begun giving those errors. On another box (where the script has been running longer), there are no such problems. The test partition is ext2, since I mistakenly ran mkfs, not mkfs.ext3 (the other machine uses ext3):

[3]root@elbrus:/mnt/3ware1>df -hT .
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda      ext2    230G   96G  122G  45% /mnt/3ware1

dmesg shows:

EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 25480758
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 25480759
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 25480760
EXT2-fs error (device sd(8,0)): ext2_check_page: bad entry in directory #674668: directory entry across blocks - offset=0, inode=807414319, rec_len=13176, name_len=67
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944685
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944686
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944687
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944688
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944689
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944690
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944691
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944692
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944693
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944694
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944695
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944696
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944697
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944698
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944699
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944700
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944701
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944702
EXT2-fs error (device sd(8,0)): ext2_free_inode: bit already cleared for inode 16944703
EXT2-fs error (device sd(8,0)): ext2_free_blocks: bit already cleared for block 33884878
EXT2-fs error (device sd(8,0)): ext2_free_blocks: bit already cleared for block 33884879
EXT2-fs error (device sd(8,0)): ext2_free_blocks: bit already cleared for block 33884880
EXT2-fs error (device sd(8,0)): ext2_free_blocks: bit already cleared for block 33884881
EXT2-fs error (device sd(8,0)): ext2_free_blocks: bit already cleared for block 33884882
EXT2-fs error (device sd(8,0)): ext2_check_page: bad entry in directory #10831087: unaligned directory entry - offset=0, inode=168459530, rec_len=30218, name_len=111
EXT2-fs error (device sd(8,0)): ext2_free_blocks: bit already cleared for block 21664379
EXT2-fs error (device sd(8,0)): ext2_free_blocks: bit already cleared for block 21664382

I think this is the fifth bug (four of which are kernel bugs) I've found in RHEL3U3 in my very brief burn-in tests. I think I'll scratch the ext2 partition and go with ext3, as it _seems_ stablish. Anyone is free to try to reproduce the problem with a >100GB ext2 partition and the above test script. Unfortunately I don't think I have time right now.
To be fair, there is a chance that the ext2 partition got corrupted because of an unclean shutdown, but I'm not sure. I really have no time to reproduce this, so I'll recreate the scratch partition as ext3, rerun the test, and if that's OK, I'll forget about this.
I had the same problem using RHEL 3.0 ES U3, on a qmail server with a 1GB raided RAMdisk. The latest kernel rpm, kernel-smp-2.4.21-22.prune_icachefix.EL.i686.rpm, seems to have fixed it. I used a modified version of the test script above (changing to a different tar.gz file) and could not get it to freak out like it did before. Before, when qmail started to spam bomb its emails as it normally does, kswapd would suddenly go nuts and hit 100%+ CPU, near max real memory, but not touch swap. Now it doesn't misbehave, though I still don't see it using any swap at all.

Stats:
HP Proliant DL380 G3
2GB HP RAM
Dual Xeon 3.0GHz
72GB+ total HD
Hmm, my bug returned:

procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy wa id
 6  0      0 340588   1284 1500984    0    0    23    83   60    42  1  9  1 89
 7  0      0 341028   1320 1500904    0    0   396   860 1462 11664  7 85  0  8
 5  0      0 341312   1784 1500624    0    0   892   412  714 18509  1 99  0  0
 1  0      0 341900   1416 1500876    0    0   400   720 1444 11695  7 80  0 13
 2  0      0 341712   1424 1500940    0    0   212   584 1240  9039  8 82  0 10
15  0      0 342324   1384 1501036    0    0   256   740 1505  8509  8 70  0 22
 2  0      0 342524   1496 1500956    0    0   316  3312 1705 12072  9 76  0 15

Going to roll back to the 20-15 kernel.
Misao, could you explain what is the problem with the vmstat you show in comment 87? (Maybe it's obvious to everybody else.) Did kswapd take all that system time, and continue to do that? Or is it something else you want to say?
This is definitely kswapd going bonkers. The only thing this server does is qmail smtp sends, and they don't require this sort of overhead.
Is it possible to copy the kernel-smp-unsupported rpm into http://people.redhat.com/~lwoodman/.RHEL3/ ? I'm seeing this same problem, but with reiserfs.
Rollback to 21-15 did not help. kswapd went nuts again, and caused the box to freeze up. Same symptoms as last time.
Misao, you must be experiencing a different problem than the other folks here. Can you get your system into the "kswapd is going nuts" state and get me several "AltSysrq-W" and "AltSysrq-M" outputs so I can see what's going on? Larry
Created attachment 105864 [details] top view as system froze on 21-15 kernel top view as system froze on 21-15 kernel.
But does this fail on this kernel: http://people.redhat.com/~lwoodman/.RHEL3/ ?
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-23.EL).
*** Bug 137513 has been marked as a duplicate of this bug. ***
Ernie/Larry, We've tested the patched kernel extensively and all seems to be working fine. Is the version mentioned by you officially supported and if so, where can we get it (2.4.21-23.EL) right now? We need to get our machine into production soon and would rather do it with a fully supported kernel.
I'm seeing a disturbing trend in our environment (approx. 40 RHEL servers, production CVS, ATG etc.). Over the last 3 days, 4 servers have keeled over. They've been running the 2.4.21-20.ELsmp kernel since 2004-10-09 (our last maintenance window). On one of the servers I verified that kswapd is indeed consuming all CPU (top output):

   11 root      25   0     0    0     0 SW   98.0  0.0 2133m   3 kswapd

Is there an estimate of when the patched kernel becomes available through RHN? Greetings, Ed.
The U4 kernel is scheduled to be released during the first week of December. The 2.4.21-22.EL kernel is the latest one in the RHN beta channel for RHEL3. The 2.4.21-23.EL kernel is undergoing internal Q/A at the moment, and I expect it to appear in the RHN beta channel within a few days. However, these kernels are not yet released, and therefore they are unsupported and unsuitable for production use. There will be at least one more respin before U4 is released.
"..they are unsupported and unsuitable for production use." I can't help myself, that is just too funny. I would say that 2.4.21-20.EL, whether supported or not, is "unsuitable for production use." Ed has had 10% of his 40 servers crash in the last few days, I've seen 8 out of 12 of my servers crash at least once due to this bug, and a friend who works at a company with about 50 RHEL servers has had at least 15 servers hang in what would appear to be the same manner. That's not very "suitable for production use" if you ask me, although it may be "supported." Seriously though, if you need this fix in a "supported" way now, you can probably just open a real support case with Red Hat. Support will likely simply tell you to run the kernel available in this Bugzilla (perhaps even provide you a copy), and report any problems you have. They will then continue to support you even when running the unsupported kernel, because you are running it at the direction of support. Later, Tom
The test kernels mentioned in this bug appear to be no longer available. We've been seeing this problem as well, and older kernels have even more severe problems, so we're interested in getting a copy of the test kernel. Can it be made available again? And/or, when is it going to be released? As far as I can tell there is currently *no* stable kernel that's generally available for RHEL3.
John, please try the kernel in the RHN beta channels. Right now, 2.4.21-23.EL is there. Tonight we're building -24.EL, but it won't be available in the RHN beta channel until about a week from now.
Just as a point of curiosity, is anyone running the -23.EL kernels with success? I tested this kernel on two of our lab systems so far and both have experienced hard locks, appearing to be caused by the system going OOM. These machines were previously stable for weeks running -22_prune_icachefix.EL but the exact same load with -23.EL succeeded in hanging both of these systems in less than 24 hours. Perhaps it was a mistake in our testing and if others are seeing success I may try again and capture any information I can after a hang. Later, Tom
Tom: After being pointed at the beta channel yesterday, I installed 2.4.21-23.EL (hugemem) on a backup database server at about 9pm, with the intention of seeing if it was suitable for use on the primary. And this morning at 8am it was already hung, despite the fact that the machine was doing next to nothing at all. I don't know what specifically was causing the hang, since there was nothing in the logs and I wasn't even able to see the console messages, but I have to conclude that 2.4.21-23.EL is in fact worse than 2.4.21-20.EL.
Thanks for the input. Shortly after posting that note this morning I rebooted one of the two systems back to -23.EL, but then had lots of meetings so didn't get to do anything with it. I came back just a few minutes ago to find a hung system. Before the hang the system was just sitting there idle at a login prompt. I think the -23 kernels have some serious issues. Has internal testing at Red Hat turned up any such problems? Do we need to file different bugs against this issue? I certainly don't want to see yet another RHEL3 update kernel that is even more unusable than the current one. Later, Tom
Can everyone who is having a problem with .23 get me top, vmstat and AltSysrq M, W and T outputs so I can see where it is hanging? Larry Woodman
I'll be out of town until Wednesday but I will have physical access to a couple of servers this weekend that are very similar to the systems that are hanging with -23.EL in our lab. These systems have been running a slightly patched -20.EL (a patch for an unrelated problem - Bugzilla 118839) for 8+ weeks without issues so we'll see how they react with -23.EL and if I can reproduce the hang on those systems I'll send you the output you requested. Later, Tom
Now we have tested 2.4.21-23.ELsmp, and don't see any problem with it. I conclude that the bug I originally reported is solved. I obviously can't answer for all the people who have been listening and commenting here. And, obviously again, there is some kind of problem with this kernel too; there isn't really anything which says they are related, though. So I close this report now, since the issue I reported is solved. I encourage you, Tom, to start a separate case for the problems you are seeing. After all, if you started seeing hangs in the current kernel and did a bugzilla search, would you read all the way down to comment 110 to find that it was discussed there? :-) But please put a note here with the bug number you get, for those of us who want to follow that case too. Even if we haven't seen any lockups here (yet), I would like to follow that discussion.
Göran, thanks for the verification. I'm reverting the state of this bug to MODIFIED, since U4 is not quite yet released. The Errata System will change it to CLOSED/ERRATA automatically when U4 becomes available in the main RHN channel.
:-) I used the resolution "NEXTRELEASE" exactly because U4 isn't available yet. It will be the "next release"! :-) Of course, it doesn't really matter to me. This bug is fixed in this kernel, that is what I really care about. And maybe "next release" is meant to refer to RHEL4.
Your last statement is correct: "next release" refers to RHEL4.
At this time I would agree that the lockup I'm seeing with -23.EL may indeed be yet another, different issue. I will continue to test and gather information and open an new Bugzilla if I can make the problem reproducible (I'm only 2 for 3 so far). Later, Tom
Tom, if you can get me some AltSysrq M, T and W outputs when the lockup occurs I can help you. Larry Woodman
If you create a new bug for the -23.EL lockup, please add the bug number to this case, so we can follow it :).
Larry, I have no doubt you can help me once I gather the information you requested, however, I've been at another office this week and could not reproduce the issue on either of the two servers I was testing on this weekend, even though they are practically identical to the systems in our lab that both experienced a problem. I am back to my normal office this week and I will try my best to find time to attempt reproducing this issue on the lab servers that experienced the problems last week. Thanks for your help. Later, Tom
Tom, the latest kernel in the RHN beta channel is now 2.4.21-25.EL. Please use that version for any future testing. Thanks.
When will an official release be available for this bug?
John, 1st (full) week in December.
RedHat Team -- I'm having an issue that seems like it might be related to this, but I want to confirm that it is before I try the beta kernel. Here's a simple explanation: If I run tar to extract a large tar archive (about 30 gigabytes) full of small (100k or so) files most of the system memory (6 GB) gets used up as buffers/cache (as it should), however, kswapd goes nuts using almost all of the CPU and slowing down everything, including the tar process. It's worth noting that the used swap memory does not go up (it's at about 7 megs and just staying there), so kswapd is obviously being ineffective. From what I can tell, kswapd is just sitting there churning up CPU for no good reason. Does this sound like a match?
Interestingly enough, suspending the tar process does NOT help the problem. The free physical memory went up to 121 megs. Not counting buffers and cache, there are 3738 megs free. However, kswapd is still taking 99% of the CPU. There doesn't seem to be anything I can do to stop it, other than reboot.
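For anyone wanting to put numbers on "kswapd is churning CPU", a rough way is to sample its accumulated utime+stime out of /proc/&lt;pid&gt;/stat. This helper is only a sketch (field layout per proc(5)); the sampling loop in the comment is hypothetical usage, not something from this report:

```shell
# Sum utime + stime (in clock ticks) from a /proc/<pid>/stat line.
# These are fields 14 and 15 per proc(5); the comm field (2) is
# stripped first because it may contain spaces.
cpu_ticks() {
    rest=${1#*) }          # drop "pid (comm) ", leaving state onward
    set -- $rest           # after the strip, utime/stime are fields 12/13
    echo $(( ${12} + ${13} ))
}

# Hypothetical usage: sample kswapd every 5 seconds and watch the
# number climb while the tar runs:
# while sleep 5; do cpu_ticks "$(cat /proc/$(pidof kswapd)/stat)"; done
```

A rapidly growing tick count with no change in "si"/"so" in vmstat matches the behavior described above.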
Tolga, this sounds exactly like the problem that we fixed in this bug report. Please try out the latest RHEL3-U4 kernel. Larry Woodman
Larry, our stress test:

AMOUNT=200
DIR=/tmp/foo
mkdir $DIR
while true; do
    cd $DIR
    for i in $(seq 1 $AMOUNT); do
        mkdir $i
        cd $i
        tar xf /tmp/linux-2.4.27.tar
        cd $DIR
    done
    for i in $(seq 1 $AMOUNT); do
        echo -n "$i: " >> md5sum.log
        find $i -type f -print0 | xargs -0 cat | md5sum >> md5sum.log
    done
    for i in $(seq 1 $AMOUNT); do
        echo -n "$i: " >> md5sum2.log
        find $i -type f -print0 | xargs -0 cat | md5sum >> md5sum2.log
    done
    date >> filefill.log
    df -m . >> filefill.log
    for i in $(seq 1 $AMOUNT); do
        rm -rf ./"$i"
    done
done &
while true; do updatedb; done &

It runs pretty fine with 2.4.21-25.ELsmp (2x Xeon 3GHz HT) - it used to reliably trigger the kswapd problem - but with the UP kernel (2.4.21-25.EL) it slows down to making almost no progress. It doesn't cease completely nor lock up, though. On a related note, even the SMP (2.4.21-25.ELsmp) kernel is orders of magnitude slower than 2.6.10rc2. Probably due to the vastly better anticipatory IO scheduler 2.6 has...
Larry, Thanks, installing 2.4.21-25.ELsmp from the beta channel _completely_ eliminated the problem for us. Just out of curiosity, was this bug inherited from the upstream kernel code, or is it specific to the RH kernels?
Tolga, I guess I can take most of the blame for these bugs, they were in part inherited from the 2.4-rmap VM and in part RHEL3 specific. The reason for that is that the 2.4 upstream VM simply does not run at all on the larger configurations supported by RHEL3, which meant we had to go with a different VM. Luckily the VM in the upstream 2.6 kernel does seem to scale well enough, so we won't need to make as many changes for RHEL4 ;)
No, this bug was actually introduced by Red Hat in the RHEL3-U3 kernel. The problem started when this patch was added to RHEL3-U3 to prevent refiling inodes that are I_LOCK'd:

*****************************************************************
--- linux-2.4.21/fs/inode.c.orig
+++ linux-2.4.21/fs/inode.c
@@ -285,7 +285,8 @@ void refile_inode(struct inode *inode)
 	   BUG(); */
 	if (!inode)
 		return;
 	spin_lock(&inode_lock);
-	__refile_inode(inode);
+	if (!(inode->i_state & I_LOCK))
+		__refile_inode(inode);
 	spin_unlock(&inode_lock);
 }
*****************************************************************

Since prune_icache sets I_LOCK before calling invalidate_inode_pages, the inode never gets refiled and therefore is never moved to the inode_unused list; kswapd never makes progress freeing up unused inodes and the list just keeps growing! Larry
Hello, I am an escalation engineer for the Proliant line at HP. I have a customer with this issue and have duplicated a similar one. The issue shows up with NFS under heavy I/O. Can I get a link to the beta kernel fix you are discussing?
Ron, do you have an RHN account? That's where the RHEL beta packages are made available.
Can someone from RedHat please post this 2.4.21-25.ELsmp U4 beta kernel somewhere. I do not have a RHN account since we are using a satellite server here which currently does not support synching the beta channels.
What archs are you looking to test?
I only need an rpm for i686 SMP. - Thanks.
You will find the kernel you need here: http://people.redhat.com/peterm/.132639/ We are rebuilding the U4 GA kernel again tonight, but the kernel you will download has the fix(es) you need.
Kernel 2.4.21-26.EL fixes the problem I had with RHEL3 U3. I'm setting up a new fileserver which replaces our old kernel 2.2 based server. The new highend box connected to a SAN (HP EVA3000) just didn't manage to rsync the files from the old server to the SAN.
Created attachment 108212 [details] My Complaint about RH Enterprise Service.
So does 2.4.21-0.1. that was just released a week ago fix this issue, or do I still need to install the "beta" kernel? Please let me know. Thanks, Matt
2.4.21-0.1 doesn't fix this issue! Only 2.4.21-26.EL beta does.
What was released a week+ ago was 2.4.21-20.0.1.EL, which was a security errata release based on the U3 kernel. The fix for this bug is in the U4 kernel (2.4.21-27.EL), which is scheduled for release a week from today.
One of the machines running 2.4.21-23 got a kernel panic today. It looks related to kswapd, and possibly related to this bugzilla, but I don't know if this is already fixed in later kernels. I will attach the last console output.
Created attachment 108467 [details] console output from kernel panic
I have tested both 2.4.21-23 and 2.4.21-26 and can confirm once again that this issue is not resolved. I simply run two copies of tar (creating a backup of the whole OS) to a second disk, /backup (excluding /proc, /backup, etc.). I let the first copy run for a few mins until most of the memory is used by disk cache, then I start the second copy and BANG!

2.0G sysback-test1.tar.gz <-- 1st copy
208M sysback-test2.tar.gz <-- 2nd copy

I tested this on two different machines now, one with 4GB and one with 8GB - with the same result, i.e. a panic. All of my tests work flawlessly using 2.4.21-15.0.4.ELsmp on both of the above machines. In fact, I can run as many copies of tar as I like without crashing that kernel. A screenshot of this panic is attached.
Created attachment 108667 [details] Kernel 2.4.21-23.ELsmp Panic under load
Can someone please point me to the latest beta kernel > 2.4.21-26 so I can test it under load? Thanks
I've just realized that I also had a top view running on one of my consoles at the time of the hang, if it helps:

362 processes: 360 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   95.6%    0.0%    9.6%   0.4%     4.0%  191.2%   97.6%
           cpu00   26.6%    0.0%    3.1%   0.7%     1.9%   62.4%    4.9%
           cpu01   22.3%    0.0%    2.1%   0.0%     0.1%   69.7%    5.5%
           cpu02   19.3%    0.0%    2.3%   0.0%     0.3%   32.0%   45.8%
           cpu03   27.6%    0.0%    1.9%   0.0%     1.7%   27.0%   41.5%
Mem:  4100856k av, 4082228k used,   18628k free,       0k shrd,  209688k buff
      1307848k actv, 2043932k in_d,   63532k in_c
Swap: 1959920k av,       0k used, 1959920k free                 2732176k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 8596 root      25   0   688  672   268 S    60.4  0.0   9:12   2 gzip
 4369 tomcat    25   0  932M 932M 18204 S    33.4 23.2  10:54   2 java
12339 root      15   0  1128 1112   760 D     3.9  0.0   0:07   1 tar
 8594 root      15   0  1128 1112   740 D     2.1  0.0   1:29   2 tar
   11 root      15   0     0    0     0 SW    1.1  0.0   0:07   2 kswapd
12341 root      25   0   680  664   272 S     0.5  0.0   0:55   1 gzip
12874 root      16   0  1660 1648   912 R     0.5  0.0   0:00   0 top
 3468 root      16   0     0    0     0 SW    0.3  0.0   0:04   2 kjournald
 4285 named     25   0  4644 4644  2104 S     0.3  0.1   0:16   2 named
    1 root      15   0   888  880   468 S     0.0  0.0   0:05   2 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
    4 root      RT   0     0    0     0 SW    0.0  0.0   0:00   2 migration/2
    5 root      RT   0     0    0     0 SW    0.0  0.0   0:00   3 migration/3
    6 root      15   0     0    0     0 SW    0.0  0.0   0:00   3 keventd
While I'm at it, here is some SAR output; interestingly, it shows >100% in some columns:

12:00:00 AM       CPU     %user     %nice   %system     %idle
12:10:00 AM       all      6.70      0.00      0.87     92.43
12:20:00 AM       all      4.16      0.00      0.63     95.21
12:40:00 AM       all    100.01      0.00      0.00      0.00
12:50:00 AM       all      5.42      0.00      0.75     93.83
01:00:01 AM       all     17.54      0.00      2.25     80.21
01:10:00 AM       all     22.94      0.00      4.47     72.58
01:50:00 AM       all    100.01      0.00    100.01      0.00
02:00:01 AM       all      1.77      0.00      0.34     97.89
02:10:00 AM       all      0.07      0.00      0.13     99.80
02:20:01 AM       all      0.08      0.00      0.12     99.81
02:30:00 AM       all      0.06      0.00      0.09     99.85
02:40:00 AM       all      0.07      0.00      0.12     99.81
Average:          all 363426.05      0.00      0.33    100.44

I thought M$ scandisk was the only system that could calculate >100% :)
Albert, the U4 beta kernel (which we expect to be released on Monday) can be found here: ftp://partners.redhat.com/a61d109e2483b0bf579b0b5f90a5ea8c/2.4.21-27.EL/ If you have a problem with that kernel, please open a new bug. This one will be automatically closed when U4 is released. Thanks in advance.
Received a report of behavior possibly relevant to this bug; pasting it here and also attaching files...

One of our servers was again in "kswapd overload mode". It was running the 2.4.21-20.0.1.ELsmp kernel, but we have seen the same/similar behavior with the beta (2.4.21-25.ELsmp) as well. I was able to run a couple of commands while the system was having problems and will attach them.

Characteristics of the servers:
- Dell PowerEdge / RHEL3 servers
- many file systems (25+ disk partitions, approx. 35-50 GB each, ext3)
- lots of files/directories have been opened/read (updatedb and large Maildir mailboxes)
- lots of files/directories monitored by FAM (large IMAP Maildir mailboxes)
- 4 GB of RAM in the servers
- NFS services enabled

The problem can be reproduced by running a stress test. It simply tars all file systems to /dev/null in parallel and executes the /etc/cron.daily/slocate.cron (updatedb) script as well. One server (running the 2.4.21-25.ELsmp beta kernel) wedged after about an hour. In this state, the system refuses any remote connections, e.g. ssh (although it takes your login and password and then hangs), and the console goes completely dead. I have seen similar behavior with the 2.4.21-15.ELsmp kernel as well.

I ran this same test on another of our servers with the 2.4.21-4.ELsmp kernel. It did not seem to be affected, though the hardware is slightly different.
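The parallel-tar stress test described above could be scripted along these lines. This is only a sketch, not the reporter's actual script; the slocate.cron path is quoted from the report, and the helper name is an assumption:

```shell
# Select mount points of a given filesystem type from
# /proc/mounts-style input (device mountpoint fstype opts dump pass).
ext_mounts() {
    awk -v t="$1" '$3 == t { print $2 }'
}

# Hypothetical driver: tar every ext3 filesystem to /dev/null in
# parallel while updatedb runs.  Do NOT run this on a production box.
# ext_mounts ext3 < /proc/mounts | while read -r mnt; do
#     tar cf /dev/null "$mnt" &
# done
# /etc/cron.daily/slocate.cron
# wait
```

Reading and discarding every file this way fills the page, inode, and dentry caches without dirtying anything, which is the load pattern that seems to provoke the kswapd spin.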
Created attachment 108693 [details] From Comment 152
Created attachment 108694 [details] re: Comment 152
Chris, can you get me a few AltSysrq-W outputs when the system is in the state where kswapd is eating up the cputime? I did fix a problem in U4 where inodes were not getting refiled and therefore were not getting reclaimed by kswapd, but perhaps you are having a different problem. Larry Woodman
OK. The system has been in a basically idle state overnight and kswapd has crept up to ~35% of CPU:

(top cmd output)--------------------------------------------
 07:45:50  up 19:09,  3 users,  load average: 1.49, 1.41, 1.33
140 processes: 138 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    1.2%    0.0%   38.0%   0.0%     0.0%   23.6%  336.0%
           cpu00    1.5%    0.0%    0.7%   0.0%     0.0%    0.1%   97.4%
           cpu01    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
           cpu02    0.0%    0.0%    1.3%   0.0%     0.0%   14.5%   84.0%
           cpu03    0.0%    0.0%   35.9%   0.0%     0.0%    9.1%   54.8%
Mem:  3598952k av, 1376124k used, 2222828k free,       0k shrd,  752392k buff
       748496k actv,  309236k in_d,   76616k in_c
Swap: 4096532k av,       0k used, 4096532k free                  487216k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
   11 root      16   0     0    0     0 RW   37.3  0.0 135:35   2 kswapd
 2946 gdm       15   0 19328  18M  7512 S     0.9  0.5   0:54   0 gdmgreeter
 2593 root      15   0 21676  12M  2364 S     0.5  0.3   5:16   1 X
11250 root      15   0  1284 1276   892 R     0.3  0.0   0:00   0 top
 1581 root      25   0  8420 8392  3500 S     0.1  0.2   5:30   3 dcstor32d
 2022 root      19   0   572  572   428 S     0.1  0.0   0:00   1 racsrvc
    1 root      15   0   920  912   464 S     0.0  0.0   0:06   0 init

(alt-sysrq-w and alt-sysrq-m output)--------------------------------
Dec 17 07:42:28 platoon kernel: SysRq : Show CPUs
Dec 17 07:42:28 platoon kernel: CPU3:
Dec 17 07:42:28 platoon kernel: c6b7de8c 00000003 c01ce8ef 00000000 c03b2d34 00000077 00000006 c01ceb7a
Dec 17 07:42:28 platoon kernel: 00000077 c6b7df7c c04f74c4 f476d000 f476d000 c04f74c4 c6b7df7c c01ceadd
Dec 17 07:42:28 platoon kernel: 00000077 c6b7df7c c04f74c4 f476d000 00000011 00000002 00000003 c6b7df7c
Dec 17 07:42:28 platoon kernel: Call Trace: [<c01ce8ef>] sysrq_handle_showcpus [kernel] 0xf (0xc6b7de94)
Dec 17 07:42:28 platoon kernel: [<c01ceb7a>] __handle_sysrq_nolock [kernel] 0x7a (0xc6b7dea8)
Dec 17 07:42:28 platoon kernel: [<c01ceadd>] handle_sysrq [kernel] 0x5d (0xc6b7dec8)
Dec 17 07:42:28 platoon kernel: [<c01cc61b>] handle_scancode [kernel] 0x2ab (0xc6b7deec)
Dec 17 07:42:28 platoon kernel: [<c01cd7fd>] handle_kbd_event [kernel] 0xbd (0xc6b7df14)
Dec 17 07:42:28 platoon kernel: [<c01cd81c>] keyboard_interrupt [kernel] 0x1c (0xc6b7df2c)
Dec 17 07:42:28 platoon kernel: [<c010dd39>] handle_IRQ_event [kernel] 0x69 (0xc6b7df30)
Dec 17 07:42:28 platoon kernel: [<c010df79>] do_IRQ [kernel] 0xb9 (0xc6b7df50)
Dec 17 07:42:28 platoon kernel: [<c010dec0>] do_IRQ [kernel] 0x0 (0xc6b7df74)
Dec 17 07:42:28 platoon kernel: [<c0109100>] default_idle [kernel] 0x0 (0xc6b7df7c)
Dec 17 07:42:28 platoon kernel: [<c0109100>] default_idle [kernel] 0x0 (0xc6b7df90)
Dec 17 07:42:28 platoon kernel: [<c0109129>] default_idle [kernel] 0x29 (0xc6b7dfa4)
Dec 17 07:42:28 platoon kernel: [<c01091c2>] cpu_idle [kernel] 0x42 (0xc6b7dfb0)
Dec 17 07:42:28 platoon kernel: [<c01295e3>] printk [kernel] 0x153 (0xc6b7dfcc)
Dec 17 07:42:28 platoon kernel:
Dec 17 07:42:28 platoon kernel: CPU2:
Dec 17 07:42:28 platoon kernel: c6b7ff64 00000002 c011c91f 00000000 00001f7c c03f2caa c0109100 00000000
Dec 17 07:42:28 platoon kernel: c6b7e000 c6b7e000 c6b7e000 c0109100 00000000 00000068 00000068 fffffffb
Dec 17 07:42:28 platoon kernel: c0109129 00000060 00000246 c01091c2 0602080b 00000000 00000000 00000000
Dec 17 07:42:28 platoon kernel: Call Trace: [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc6b7ff6c)
Dec 17 07:42:28 platoon kernel: [<c0109100>] default_idle [kernel] 0x0 (0xc6b7ff7c)
Dec 17 07:42:28 platoon kernel: [<c0109100>] default_idle [kernel] 0x0 (0xc6b7ff90)
Dec 17 07:42:28 platoon kernel: [<c0109129>] default_idle [kernel] 0x29 (0xc6b7ffa4)
Dec 17 07:42:28 platoon kernel: [<c01091c2>] cpu_idle [kernel] 0x42 (0xc6b7ffb0)
Dec 17 07:42:28 platoon kernel: [<c01295e3>] printk [kernel] 0x153 (0xc6b7ffcc)
Dec 17 07:42:28 platoon kernel:
Dec 17 07:42:28 platoon kernel: CPU0:
Dec 17 07:42:28 platoon kernel: c03f1f88 00000000 c011c91f 00000000 00001fa0 c03f2caa c0109100 c5266b00
Dec 17 07:42:28 platoon kernel: c03f0000 c03f0000 c03f0000 c0109100 00000000 00000068 00000068 fffffffb
Dec 17 07:42:28 platoon kernel: c0109129 00000060 00200246 c01091c2 0002080b 00099800 c0107000 0008e000
Dec 17 07:42:28 platoon kernel: Call Trace: [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc03f1f90)
Dec 17 07:42:28 platoon kernel: [<c0109100>] default_idle [kernel] 0x0 (0xc03f1fa0)
Dec 17 07:42:28 platoon kernel: [<c0109100>] default_idle [kernel] 0x0 (0xc03f1fb4)
Dec 17 07:42:28 platoon kernel: [<c0109129>] default_idle [kernel] 0x29 (0xc03f1fc8)
Dec 17 07:42:28 platoon kernel: [<c01091c2>] cpu_idle [kernel] 0x42 (0xc03f1fd4)
Dec 17 07:42:28 platoon kernel: [<c0107000>] stext [kernel] 0x0 (0xc03f1fe0)
Dec 17 07:42:28 platoon kernel:
Dec 17 07:42:28 platoon kernel: CPU1:
Dec 17 07:42:28 platoon kernel: c6979f64 00000001 c011c91f 00000000 00001f7c c03f2caa c0109100 c043b280
Dec 17 07:42:28 platoon kernel: c6978000 c6978000 c6978000 c0109100 00000000 00000068 00000068 fffffffb
Dec 17 07:42:28 platoon kernel: c0109129 00000060 00200246 c01091c2 0102080b 00000000 00000000 00000000
Dec 17 07:42:28 platoon kernel: Call Trace: [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc6979f6c)
Dec 17 07:42:28 platoon kernel: [<c0109100>] default_idle [kernel] 0x0 (0xc6979f7c)
Dec 17 07:42:28 platoon kernel: [<c0109100>] default_idle [kernel] 0x0 (0xc6979f90)
Dec 17 07:42:28 platoon kernel: [<c0109129>] default_idle [kernel] 0x29 (0xc6979fa4)
Dec 17 07:42:28 platoon kernel: [<c01091c2>] cpu_idle [kernel] 0x42 (0xc6979fb0)
Dec 17 07:42:28 platoon kernel: [<c01292b3>] call_console_drivers [kernel] 0x63 (0xc6979fc4)
Dec 17 07:42:28 platoon kernel: [<c01295e3>] printk [kernel] 0x153 (0xc6979ffc)
Dec 17 07:42:28 platoon kernel:
Dec 17 07:42:35 platoon kernel: SysRq : Show Memory
Dec 17 07:42:35 platoon kernel: Mem-info:
Dec 17 07:42:35 platoon kernel: Zone:DMA freepages:  2495 min:     0 low:     0 high:     0
Dec 17 07:42:35 platoon kernel: Zone:Normal freepages:  1838 min:  1279 low:  4544 high:  6304
Dec 17 07:42:35 platoon kernel: Zone:HighMem freepages: 552040 min:   255 low: 10750 high: 16125
Dec 17 07:42:35 platoon kernel: Free pages:      556373 (552040 HighMem)
Dec 17 07:42:35 platoon kernel: ( Active: 186416/76696, inactive_laundry: 41103, inactive_clean: 19194, free: 556373 )
Dec 17 07:42:35 platoon kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2495
Dec 17 07:42:35 platoon kernel: aa:0 ac:152947 id:0 il:34793 ic:2482 fr:1838
Dec 17 07:42:35 platoon kernel: aa:14247 ac:19222 id:76696 il:6310 ic:16712 fr:552040
Dec 17 07:42:35 platoon kernel: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 2*4096kB = 9980kB)
Dec 17 07:42:35 platoon kernel: 536*4kB 23*8kB 6*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 7352kB)
Dec 17 07:42:35 platoon kernel: 684*4kB 1220*8kB 801*16kB 516*32kB 359*64kB 241*128kB 126*256kB 61*512kB 17*1024kB 4*2048kB 494*4096kB = 2208160kB)
Dec 17 07:42:35 platoon kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Dec 17 07:42:35 platoon kernel: 14032 pages of slabcache
Dec 17 07:42:35 platoon kernel: 308 pages of kernel stacks
Dec 17 07:42:35 platoon kernel: 0 lowmem pagetables, 708 highmem pagetables
Dec 17 07:42:35 platoon kernel: Free swap:       4096532kB
Dec 17 07:42:35 platoon kernel: 917472 pages of RAM
Dec 17 07:42:35 platoon kernel: 688096 pages of HIGHMEM
Dec 17 07:42:35 platoon kernel: 17734 reserved pages
Dec 17 07:42:35 platoon kernel: 261654 pages shared
Dec 17 07:42:35 platoon kernel: 0 pages swap cached
P.S. Forgot to mention, the system is running a beta kernel:

[root@platoon log]# uname -a
Linux platoon 2.4.21-25.ELsmp #1 SMP Fri Nov 12 21:34:51 EST 2004 i686 i686 i386 GNU/Linux
Why don't you test with 2.4.21-26.EL or newer, which has fixed it for me?
Wow, these kernels are coming out like hotcakes. Well, I have been working with the Red Hat support folks, and the 2.4.21-25.ELsmp beta kernel was the one they pointed me to. I will check with them to get the appropriate one. Thanks, John.
I think what's going on here is that updatedb runs overnight and basically touches every file on your system's mounted disks. This results in the inode cache growing and consuming most of the system memory. You should be able to simulate this by running "du /": you should see the inode and dentry caches growing (in /proc/slabinfo and /proc/sys/fs/inode-nr). When these things run, kswapd chomps away at the system memory in order to fill the inode and dentry cache requests. I am working on providing upper limits on the size of the inode and dentry caches, but it won't be complete in time for RHEL3-U4. While this is inconvenient, kswapd won't kill your system, since RHEL3-U4 contains the prune_icache fix that allows the inodes to be rapidly reclaimed. Larry Woodman
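The cache growth Larry describes can be checked with a small script. A minimal sketch, parsing a hypothetical sample in the 2.4-era /proc/slabinfo column layout (cache name, then active objects) rather than the live file; the sample counts are made up, and on a real box you would substitute `sample=$(cat /proc/slabinfo)`, which may require root on newer kernels:

```shell
# Extract active-object counts for the inode and dentry slab caches.
# Sample data below is illustrative, in the 2.4-style slabinfo layout:
# name active_objs num_objs objsize allocated_slabs total_slabs pages_per_slab
sample='inode_cache 812345 812400 512 101550 101550 1
dentry_cache 934210 934300 128 31140 31140 1'

# On a live system: sample=$(cat /proc/slabinfo)
inode_objs=$(printf '%s\n' "$sample" | awk '$1 == "inode_cache" { print $2 }')
dentry_objs=$(printf '%s\n' "$sample" | awk '$1 == "dentry_cache" { print $2 }')
echo "inode_cache: $inode_objs objects, dentry_cache: $dentry_objs objects"
```

Running this in a loop (e.g. every few seconds) while "du /" walks the disks would show both counts climbing as Larry predicts.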
In my stress test (original report: https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=105430&action=view - the symptoms were different for later RHEL3 beta kernels that fixed the problem at least partly), I had updatedb running as well. It was definitely the kernel dentry/inode cache filling up. When I watched slabinfo during the stress test, it was the dentry cache, not the inode cache, that filled up. (Does updatedb actually need to open inodes?) This, btw, has been a problem for the 2.4 kernel on large-memory machines for a long time: I recall a locking-up Itanium / 2 GB box from a couple of years ago, where the problem was indeed the kernel's inability to reap the cache.
Ville, usually dentries and inodes are one to one. The dentry code keeps a reference to the inode, so it's not possible to free an inode until the dentry has been freed. Try running "du /" and watching the dentry and inode caches (and what happens to the pagecache and anonymous memory). I am currently working on a patch that will limit the size of both the dentry and inode caches, so that if you try to allocate above some number you will be forced to free some up. Larry
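A before/after snapshot of the counters Larry mentions can be taken without root, since /proc/sys/fs/inode-nr and /proc/sys/fs/dentry-state are world-readable. A minimal sketch, using "du -s /usr" as a stand-in for the full "du /" (field meanings per the Linux proc documentation; exact dentry-state fields vary by kernel version):

```shell
# /proc/sys/fs/inode-nr is "nr_inodes nr_unused";
# /proc/sys/fs/dentry-state begins "nr_dentry nr_unused ...".
read inodes_before unused_before < /proc/sys/fs/inode-nr
du -s /usr > /dev/null 2>&1          # metadata-heavy walk, stand-in for "du /"
read inodes_after unused_after < /proc/sys/fs/inode-nr
read dentries dentries_unused rest < /proc/sys/fs/dentry-state
echo "inodes: $inodes_before -> $inodes_after ($unused_after unused)"
echo "dentries now: $dentries ($dentries_unused unused)"
```

On an affected RHEL3 box the after-count should jump by roughly one inode and dentry per file visited, and stay high until something reclaims them.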
Just to follow up: we first experienced the kswapd problem while copying large file systems from one disk to another. The kswapd load got so high that the server became unusable due to extreme slowness. We also discovered that this bug appears when updatedb and fam are running, along with high amounts of file-related activity, e.g. file system tars. We have been using stress tests that involve starting updatedb and running multiple parallel copies (tar) to /dev/null. Not only do we get the kswapd problem, the system eventually gets to a point where it completely locks up, does not respond to remote or console logins, and has to be rebooted. I am able to issue alt-sysrq commands, but they are not logged to the messages files. However, we do see many errors of this type in the log files:

--------------------------------------------------------------
Dec 17 11:59:56 platoon kernel: scsi : aborting command due to timeout : pid 34682449, scsi0, channel 0, id 0, lun
Dec 17 13:07:19 platoon kernel: blk: queue e6c4d018, I/O limit 4095Mb (mask 0xffffffff)
----------------------------------------------------------------

The general configuration of our servers is:
- many file systems (25+ disk partitions, approx 35 - 50 GB each, ext3)
- lots of files / directories have been opened / read (updatedb and large Maildir mailboxes)
- lots of files / directories monitored by FAM (large IMAP Maildir mailboxes)
- 4 GB of RAM in servers
- NFS service enabled
- smb (samba) service enabled
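The stress test described above can be sketched in miniature. This is a scaled-down, safe illustration only: it builds a small throwaway tree and runs several parallel tar streams to /dev/null over it, whereas the real test ran updatedb plus tars over full 35 - 50 GB filesystems, so the tree and stream counts here are placeholders:

```shell
# Build a tiny throwaway tree to stand in for the real filesystems.
tree=$(mktemp -d)
for i in 1 2 3; do
    mkdir -p "$tree/dir$i"
    echo "payload" > "$tree/dir$i/file"
done

# Several concurrent tar streams to /dev/null; each stream touches
# every dentry and inode in the tree, which is what drives the
# dentry/inode caches up on the real servers.
streams=3
for i in 1 2 3; do
    tar cf /dev/null -C "$tree" . &
done
wait
rm -rf "$tree"
echo "$streams tar streams completed"
```

During the real run, kswapd CPU usage ("top", or "ps -C kswapd -fl" as in the original report) and the slab caches would be watched for the runaway behavior.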
Is it just me, or is it pretty lame that this problem has existed since mid-September on an "enterprise class" OS and still isn't fixed? Obviously it affects a LOT of people. I know they are slowly working on it, but it isn't good enough for me. If I pay $1499 a year for AS, I had better have a functional product. I don't want to run an old kernel that has tons of security problems simply so that my machines will stay up. Just my 2 cents' worth, but this is why all of Red Hat's kernel "enhancements" are iffy at best: since the community at large doesn't test them, the paying customer is the guinea pig. I know the engineers are doing their best, and I REALLY do appreciate it. I just wish I had a solution after several months. Matt
Like I said earlier in this bug, there was a change made in RHEL3-U3 that caused kswapd to fail to reclaim inodes because they did not get refiled due to the I_LOCK being set. It was this patch:

@@ -285,7 +285,8 @@ void refile_inode(struct inode *inode)
 		BUG();
 	*/
 	if (!inode) return;
 	spin_lock(&inode_lock);
-	__refile_inode(inode);
+	if (!(inode->i_state & I_LOCK))
+		__refile_inode(inode);
 	spin_unlock(&inode_lock);
 }

So, in RHEL3-U4 I changed prune_icache to refile the inode after clearing the I_LOCK state. Several people tested that patch and said that it fixed the problem of kswapd consuming the system in prune_icache(), and that patch is in RHEL3-U4. Evidently there is some other problem that is preventing inodes from getting reclaimed in this case. First of all, is everyone running the latest RHEL3-U4 kernel-2.4.21-27.EL? And can someone please get me several AltSysrq-M and AltSysrq-W outputs when this happens? Larry Woodman
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html
Larry, we are experiencing the same problem with 2.4.21-20.ELsmp: kswapd occupies 130% of the CPU and hangs the system. We applied 2.4.21-27.ELsmp and it did not fix the problem, so we reverted back to 2.4.21-20.ELsmp. Where can we get the 2.4.21-26.ELsmp version for this problem? Thanks, kal