|Summary:||cpio with many files flips kswapd, system hangs|
|Product:||Red Hat Enterprise Linux 3||Reporter:||Roberto Bourgonjen <otrebor>|
|Component:||kernel||Assignee:||Larry Woodman <lwoodman>|
|Status:||CLOSED ERRATA||QA Contact:|
|Version:||3.0||CC:||dlewis, joe, petrides, redhat-bugzilla, riel, say, tao|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2004-12-20 20:56:40 UTC||Type:||---|
Description Roberto Bourgonjen 2004-09-22 09:08:01 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20040803

Description of problem:
When reading a tape containing 1,261,806 files through cpio (some large, mostly small), kswapd jumps to 100 percent CPU at a certain point, about halfway I'd say. The system freezes completely and kswapd NEVER recovers. The system is even too slow to take keyboard input, so a hard reboot is the only option. While reading the first quarter of the tape, _all_ of the memory (4 GB) is used (as cache). I've tried changing vm.pagecache without effect. RedHat 9.2 (from which I upgraded) and a 2.6.8 kernel on the same RHEL machine do not have this problem. Neither does Fedora (also with a 2.6.8 kernel). The problem always occurs (tried it 5 times).

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Read a tape with 1 million files on a machine with 4 GB RAM.

Additional info:
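For anyone without the original tape, the workload can be approximated by building a large tree of small files and then archiving and restoring it with cpio. The following is only a minimal sketch: the helper name `make_small_files`, the directory layout, and the default file size are illustrative assumptions and are not taken from the reporter's setup.

```python
import os


def make_small_files(root, total, per_dir=1000, size=512):
    """Create `total` files of `size` bytes under `root`.

    Files are spread over subdirectories holding at most `per_dir`
    files each, so no single directory grows too large.
    Hypothetical helper: names and defaults are assumptions.
    """
    payload = b"x" * size
    created = 0
    while created < total:
        subdir = os.path.join(root, "d%06d" % (created // per_dir))
        os.makedirs(subdir, exist_ok=True)
        batch = min(per_dir, total - created)
        for i in range(batch):
            path = os.path.join(subdir, "f%06d" % i)
            with open(path, "wb") as fh:
                fh.write(payload)
        created += batch
    return created
```

Calling `make_small_files("/some/scratch/dir", 1000000)` would produce a file count of the same order as the one in the report. Whether this triggers the same kswapd behaviour as the tape restore is untested.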
Comment 1 Larry Woodman 2004-09-23 21:06:53 UTC
Roberto, for starters, can you get me several top, AltSysrq-M and AltSysrq-W outputs while your system is in this state? Thanks, Larry Woodman
Comment 2 Roberto Bourgonjen 2004-09-24 14:08:30 UTC
Larry, I don't think I can manage that. The server is back at the colo, running Fedora 2 now (it has already crashed once, but that is another story). I will see if I have time to run this on my development server (which has only 3 GB of memory AND slightly slower hard disks, which might make a difference), but that will be next week at the earliest. It seems bug #124058 might be related.
Comment 3 Larry Woodman 2004-10-20 18:56:20 UTC
Roberto, when you get a chance can you test out the fix for this problem? It's located at: http://people.redhat.com/~lwoodman/.RHEL3/kernel-smp-2.4.21-22.prune_icachefix.EL.i686.rpm Thanks, Larry
Comment 4 Don Lewis 2004-10-20 19:08:58 UTC
Created attachment 105537: captured sysrq data during various times. This is the sysrq information as time stamped on the system.
Comment 5 Don Lewis 2004-10-20 19:10:57 UTC
Created attachment 105540: Screen Captures of TOP. This is the screen capture of TOP and some other comments.
Comment 6 Don Lewis 2004-10-20 19:15:54 UTC
I have exactly the same kswapd issue and have attached documentation. Please look at this as soon as possible, as the system is not usable. All I have to do to trigger it is restore many files from tape using cpio, or copy files with scp or rcp across the network.
Comment 7 Larry Woodman 2004-10-20 19:53:02 UTC
Don, this is with the latest kernel I posted? Also, please get me a few AltSysrq-T outputs so I can see where kswapd is hanging out. Larry
Comment 8 Don Lewis 2004-10-20 20:10:35 UTC
Larry: I don't know about the kernels that you have posted. I have been around Unix a long time, but Red Hat is somewhat new to me. I only have what up2date provided: kernel 2.4.21-20.ELsmp. Can you give me more instructions on the AltSysrq-T? I will have to start the test up again, as the system has been rebooted.
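The AltSysrq-M, AltSysrq-W and AltSysrq-T dumps requested in this thread are normally produced by holding Alt and SysRq and pressing the letter on the console keyboard. Where the kernel exposes `/proc/sysrq-trigger`, the same letters can be written to that file by root. The sketch below assumes that file exists on the running kernel (not confirmed anywhere in this thread for 2.4.21-20.EL); the function names are hypothetical.

```python
# Assumption: the running kernel provides this file; verify before relying on it.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def send_sysrq(key, trigger_path=SYSRQ_TRIGGER):
    """Write a single SysRq command letter to the trigger file."""
    if len(key) != 1 or not key.isalpha():
        raise ValueError("SysRq key must be a single letter, got %r" % (key,))
    with open(trigger_path, "w") as fh:
        fh.write(key.lower())


def capture_round(keys=("m", "w", "t"), trigger_path=SYSRQ_TRIGGER):
    """Request one round of dumps, one letter at a time.

    The default letters mirror the AltSysrq-M, -W and -T requests
    made in this thread. Returns the letters that were sent.
    """
    for key in keys:
        send_sysrq(key, trigger_path)
    return list(keys)
```

The dumps themselves go to the kernel log, so they would be collected afterwards with `dmesg` or from the syslog files. A script like this only helps while the machine still schedules user processes; once the system no longer responds, the keyboard combination is the remaining option.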
Comment 9 Larry Woodman 2004-10-20 20:28:47 UTC
Don, can you grab this kernel and give it a try? http://people.redhat.com/~lwoodman/.RHEL3/kernel-smp-2.4.21-22.prune_icachefix.EL.i686.rpm Larry
Comment 10 Don Lewis 2004-10-22 12:15:30 UTC
Larry: We have installed your prune_icachefix kernel and run our cpio test. The system never became bogged down by the kswapd daemon. The system load seemed a bit high for the actual work being done, around 1.4. When can we expect a production fix and release of this kernel? Thanks for your help on this matter. Don Lewis
Comment 11 Roberto Bourgonjen 2004-10-27 13:30:55 UTC
I have also tried the prune_icachefix kernel. A simple Perl script that fetches images from a webcam and analyzes them with GD, and that runs very steadily on the 2.4.21-20.ELsmp kernel, now eats memory like crazy, expanding to over 5 GB within half an hour. So I am afraid there are some really serious side effects to your solution.
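A report of memory growth like this is easier to act on with numbers sampled over time. The sketch below reads the `VmRSS` line from `/proc/<pid>/status`. It assumes that field is present on the kernel being tested, and it measures resident memory only, which may not be the same figure as the one quoted above. All function names are hypothetical.

```python
def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     12345 kB"
            return int(line.split()[1])
    return None


def read_vm_rss_kb(pid):
    """Read and parse the status file of a live process."""
    with open("/proc/%d/status" % pid) as fh:
        return parse_vm_rss_kb(fh.read())


def growth_kb(samples):
    """Difference between the last and first sample, 0 if fewer than two."""
    if len(samples) < 2:
        return 0
    return samples[-1] - samples[0]
```

Sampling `read_vm_rss_kb(pid)` once a minute on both kernels and comparing `growth_kb` for each run would show whether the growth is specific to the test kernel.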
Comment 12 Ernie Petrides 2004-10-28 23:39:45 UTC
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-23.EL).
Comment 13 Joe Goyette 2004-11-05 20:32:32 UTC
Hi there. Is the patch for this issue generally available? I too have the same problem with kswapd taking down my systems (I built several new servers for a critical project with RHEL AS 3.0 U3). The link to the patch earlier in this thread is no longer valid and I need to get my systems operational. THANKS!!! Joe
Comment 14 Ernie Petrides 2004-11-06 01:40:09 UTC
The fix is in the latest U4 kernel, which is in beta test right now (and is available in the RHN beta channel). However, there will be another respin next week, so the -23.EL kernel is not exactly what will be released in Update 4. I would advise waiting until the final U4 is released (beginning of December).
Comment 15 Joe Goyette 2004-11-10 16:04:27 UTC
Unfortunately I can't wait a month for this. When I look at the RHN AS3 Update 4 beta channel, these are the only kernel packages I see:

glibc-kernheaders-2.4-9.1.87.i386.rpm
kernel-2.6.9-1.648_EL.i586.rpm
kernel-2.6.9-1.648_EL.src.rpm
kernel-smp-2.6.9-1.648_EL.i586.rpm
kernel-utils-2.4-13.1.37.i386.rpm

Are those the correct kernel packages? Thanks again, Joe
Comment 16 Ernie Petrides 2004-11-10 21:23:12 UTC
Joe, you're looking in the wrong channel. The kernel version is 2.4.21-23.EL, and 2.4.21-24.EL will be built tonight (but won't be available in RHN for about a week).
Comment 17 Joe Goyette 2004-11-16 21:35:49 UTC
Ah! I found the right location this time. Thanks for your help with this. Joe
Comment 18 Alexander Suvorov 2004-12-09 21:14:03 UTC
Ernie, under our internal policy we must build a custom kernel. We have the same problem that has been described here. Could you please give us a direct link to the patch that fixes it?
Comment 19 Alexander Suvorov 2004-12-09 21:25:37 UTC
Another option: could you please give me a tip on how to find, in RHN, the src.rpm of the kernel with the fix for the kswapd problem? I looked but did not see any way to find the src.rpm :(
Comment 20 Ernie Petrides 2004-12-09 22:02:48 UTC
The relevant RPM is kernel-source-2.4.21-27.EL.i386.rpm, which can be found in the i386 subdirectory of the following URL: ftp://partners.redhat.com/a61d109e2483b0bf579b0b5f90a5ea8c/2.4.21-27.EL/ The kernel (along with the rest of U4) is scheduled for release on 20-Dec-2004, at which time you will be able to find it in the main RHN channel(s).
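Since several kernel builds are mentioned in this thread, a small check of the running kernel's release string can help decide whether a machine already carries the fix. This is a sketch under stated assumptions: it takes 2.4.21-23.EL as the first fixed build because that is where the fix was reported as committed, and it compares only the numeric parts of the release string. The function names are hypothetical.

```python
import re

# Matches the leading "major.minor.patch-build" part of a release string,
# for example "2.4.21-20" in "2.4.21-20.ELsmp".
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_release(release):
    """Return (major, minor, patch, build) as integers."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % (release,))
    return tuple(int(part) for part in match.groups())


# Assumption: first build carrying the fix, per the comments in this report.
FIRST_FIXED = parse_release("2.4.21-23.EL")


def has_fix(release, first_fixed=FIRST_FIXED):
    """True if `release` is numerically at or above `first_fixed`."""
    return parse_release(release) >= first_fixed
```

On a live system the release string would come from `platform.release()` or `uname -r`. The comparison is only meaningful within the RHEL 3 2.4.21 series. It cannot recognise one-off test builds such as 2.4.21-22.prune_icachefix.EL, which carries a fix despite its lower build number.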
Comment 21 John Flanagan 2004-12-20 20:56:40 UTC
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html