From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20040803

Description of problem:
When reading a tape with 1,261,806 files (through cpio), some large but most small, kswapd jumps to 100 percent CPU at a certain point (about halfway, I'd say). The system freezes completely and kswapd NEVER recovers. The system is even too slow to take keyboard input, so a hard reboot is the only option. While reading the first quarter of the tape, _all_ the memory (4 GB) is used (as cache). I've tried changing vm.pagecache without effect. Red Hat 9.2 (from which I upgraded) and a 2.6.8 kernel on the same RHEL machine do not have this problem. Neither does Fedora (also with a 2.6.8 kernel). The problem always occurs (tried it 5 times).

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Read a tape with 1 million files on a machine with 4 GB RAM.

Additional info:
Roberto, for starters can you get me several top, AltSysrq-M and AltSysrq-W outputs when your system is in this state. Thanks, Larry Woodman
Larry, I don't think I can manage that. The server is back at the colo, running Fedora 2 now (it has already had one crash, but that is another story). I will see if I have time to run this on my development server (which has only 3 GB of memory AND slightly slower hard disks, which might make a difference), but that will be next week at the earliest. Bug #124058 may be related.
Roberto, when you get a chance, can you test the fix for this problem? It's located at: http://people.redhat.com/~lwoodman/.RHEL3/kernel-smp-2.4.21-22.prune_icachefix.EL.i686.rpm Thanks, Larry
Created attachment 105537: captured sysrq data during various times. This is the sysrq information as time-stamped on the system.
Created attachment 105540: Screen captures of top. This is the screen capture of top and some other comments.
I have exactly the same kswapd issue and have attached documentation. Please look at this as soon as possible, as the system is not usable. All I have to do is restore many files from tape using cpio, or copy files across the network with scp or rcp.
Don, is this with the latest kernel I posted? Also, please get me a few AltSysrq-T outputs so I can see where kswapd is hanging out. Larry
Larry: I don't know about any kernels you have posted. I have been around Unix a long time, but Red Hat is somewhat new to me. I only have what up2date provided: kernel 2.4.21-20.ELsmp. Can you give me more detailed instructions for AltSysrq-T? I will have to start the test again, as the system has been rebooted.
Don, can you grab this kernel and give it a try? http://people.redhat.com/~lwoodman/.RHEL3/kernel-smp-2.4.21-22.prune_icachefix.EL.i686.rpm Larry
Larry: We have installed your prune_icachefix kernel and run our cpio test. The system never became bogged down by the kswapd daemon. The system load seemed a bit high for the actual work being done, around 1.4. When can we expect a production fix and a release of this kernel? Thanks for your help on this matter. Don Lewis
I have also tried the prune_icachefix kernel. A simple Perl script that fetches images from a webcam and analyzes them with GD runs very steadily on the 2.4.21-20.ELsmp kernel, but on this kernel it eats memory like crazy, growing to over 5 GB within half an hour. So I am afraid your solution has some serious side effects.
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-23.EL).
Hi there. Is the patch for this issue generally available? I have the same problem with kswapd taking down my systems (I built several new servers for a critical project with RHEL AS 3.0 U3). The link to the patch earlier in this thread is no longer valid, and I need to get my systems operational. THANKS!!! Joe
The fix is in the latest U4 kernel, which is in beta test right now (and is available in the RHN beta channel). However, there will be another respin next week, so the -23.EL kernel is not exactly what will be released in Update 4. I would advise waiting until the final U4 is released (beginning of December).
Unfortunately, I can't wait a month for this. When I look at the RHN AS3 Update 4 beta channel, these are the only kernel packages I see:
glibc-kernheaders-2.4-9.1.87.i386.rpm
kernel-2.6.9-1.648_EL.i586.rpm
kernel-2.6.9-1.648_EL.src.rpm
kernel-smp-2.6.9-1.648_EL.i586.rpm
kernel-utils-2.4-13.1.37.i386.rpm
Are those the correct kernel packages? Thanks again, Joe
Joe, you're looking in the wrong channel. The kernel version is 2.4.21-23.EL, and 2.4.21-24.EL will be built tonight (but won't be available in RHN for about a week).
Ah! I found the right location this time. Thanks for your help with this. Joe
Ernie, our internal policy requires us to build custom kernels. We have the same problem described here. Could you please give us a direct link to the patch that fixes it?
Alternatively, could you tell me how to find the src.rpm of the kernel with the kswapd fix in RHN? I looked but did not see any way to find the src.rpm :(
The relevant RPM is kernel-source-2.4.21-27.EL.i386.rpm, which can be found in the i386 subdirectory of the following URL: ftp://partners.redhat.com/a61d109e2483b0bf579b0b5f90a5ea8c/2.4.21-27.EL/ The kernel (along with the rest of U4) is scheduled for release on 20-Dec-2004, at which time you will be able to find it in the main RHN channel(s).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html