Bug 133183

Summary: cpio with many files flips kswapd, system hangs
Product: Red Hat Enterprise Linux 3
Reporter: Roberto Bourgonjen <otrebor>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA
Severity: high
Priority: medium
Version: 3.0
CC: dlewis, joe, petrides, redhat-bugzilla, riel, say, tao
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-12-20 20:56:40 UTC
Attachments:
  captured sysrq data during various times (flags: none)
  Screen Captures of TOP (flags: none)

Description Roberto Bourgonjen 2004-09-22 09:08:01 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)
Gecko/20040803

Description of problem:
When reading a tape with 1,261,806 files (through cpio), some large but
mostly small, at a certain point (about halfway, I'd say) kswapd
jumps to 100 percent CPU, the system freezes completely, and kswapd
NEVER recovers. The system is even too slow to take keyboard input, so
a hard reboot is the only option.

While reading the first quarter of the tape, _all_ of the memory (4 GB)
is used (as cache).
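
One way to watch the cache fill up while cpio runs might look like the
sketch below. It assumes Python 3 is available and that /proc/meminfo
contains "MemTotal" and "Cached" lines in the usual "Name: N kB" form;
the helper names parse_meminfo, cached_fraction and read_meminfo are made
up for illustration and are not part of any existing tool.

```python
import re

# Matches lines such as "MemTotal:  4123456 kB"; any other line is skipped.
_KB_LINE = re.compile(r"^(\w+):\s+(\d+)\s+kB\s*$")


def parse_meminfo(text):
    """Return {field: kilobytes} for every 'Name: N kB' line in text."""
    fields = {}
    for line in text.splitlines():
        match = _KB_LINE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def cached_fraction(fields):
    """Fraction of total RAM currently reported as page cache."""
    return fields["Cached"] / fields["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the meminfo file (path is an assumption)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling cached_fraction(read_meminfo()) every few seconds during the
restore would show how quickly the page cache approaches the full 4 GB.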

I've tried changing vm.pagecache without effect.

RedHat 9.2 (from which I upgraded) and 2.6.8 kernel on the same RHEL
machine do not have this problem. Neither does Fedora (also with 2.6.8
kernel).

The problem always occurs (tried it 5 times).

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Read a tape with 1 million files on a machine with 4 GB RAM.
    

Additional info:

Comment 1 Larry Woodman 2004-09-23 21:06:53 UTC
Roberto, for starters, can you get me several top, AltSysrq-M and
AltSysrq-W outputs when your system is in this state?

Thanks, Larry Woodman


Comment 2 Roberto Bourgonjen 2004-09-24 14:08:30 UTC
Larry, I don't think I can manage that. The server is back at the
coloc, running Fedora 2 now (already had one server crash, but that is
another story). I will see if I have time to run this on my
development server (which has only 3 GB of memory AND slightly slower
hard disks, which might make a difference), but that'll be next week
at the earliest.

Seems bug #124058 might be related.

Comment 3 Larry Woodman 2004-10-20 18:56:20 UTC
Roberto, when you get a chance, can you test out the fix for this problem?

It's located at:

>>>>http://people.redhat.com/~lwoodman/.RHEL3/kernel-smp-2.4.21-22.prune_icachefix.EL.i686.rpm


Thanks, Larry



Comment 4 Don Lewis 2004-10-20 19:08:58 UTC
Created attachment 105537
captured sysrq data during various times

This is the sysrq information as time-stamped on the system.

Comment 5 Don Lewis 2004-10-20 19:10:57 UTC
Created attachment 105540
Screen Captures of TOP

This is the screen capture of TOP and some other comments.

Comment 6 Don Lewis 2004-10-20 19:15:54 UTC
I have exactly the same kswapd issue and have documentation attached.

Please look at this as soon as possible, as the system is not usable.

All I have to do is restore many files from tape using cpio, or copy
files with scp or rcp across the network.

Comment 7 Larry Woodman 2004-10-20 19:53:02 UTC
Don, is this with the latest kernel I posted?

Also, please get me a few AltSysrq-T outputs so I can see where kswapd
is hanging out.

Larry


Comment 8 Don Lewis 2004-10-20 20:10:35 UTC
Larry:

I don't know about the kernels that you have posted.  I have been around
Unix a long time, but Red Hat is somewhat new to me.  I only have what
up2date provided: kernel 2.4.21-20.ELsmp.

Can you give me more instructions on the AltSysrq-T?

I will have to start the test up again, as the system has been
rebooted.
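
For reference, besides pressing Alt-SysRq-T on the console, a SysRq
command can usually be triggered from a root shell by writing its letter
to /proc/sysrq-trigger. The sketch below assumes that file exists on the
running kernel, that the sysrq facility is enabled, and that Python 3 is
available; trigger_sysrq is a hypothetical helper name.

```python
def trigger_sysrq(command, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq command character to the trigger file.

    The default path is an assumption about the running kernel; writing
    to it normally requires root.
    """
    if len(command) != 1 or not command.isalnum():
        raise ValueError("SysRq command must be a single letter or digit")
    with open(trigger_path, "w") as handle:
        handle.write(command)
```

Run as root, trigger_sysrq("t") would request the task dump that
AltSysrq-T produces; the output is expected in the kernel log (for
example via dmesg), not on the script's standard output.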

Comment 9 Larry Woodman 2004-10-20 20:28:47 UTC
Don, can you grab this kernel and give it a try?

>>>>http://people.redhat.com/~lwoodman/.RHEL3/kernel-smp-2.4.21-22.prune_icachefix.EL.i686.rpm


Larry


Comment 10 Don Lewis 2004-10-22 12:15:30 UTC
Larry:

We have installed your prune_icachefix kernel and run our cpio test.

The system never became bogged down by the kswapd daemon.

The system load seemed a bit high for the actual work being done,
around 1.4.

When can we expect a production fix and release of this kernel?

Thanks for your help on this matter.

Don Lewis


Comment 11 Roberto Bourgonjen 2004-10-27 13:30:55 UTC
I have also tried the prune_icachefix kernel. A simple Perl script that
fetches images from a webcam and analyzes them with GD, which runs very
steadily on the 2.4.21-20.ELsmp kernel, now eats memory like crazy,
expanding to over 5 GB within half an hour. So I am afraid there are
some really serious side effects to your solution.

Comment 12 Ernie Petrides 2004-10-28 23:39:45 UTC
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-23.EL).


Comment 13 Joe Goyette 2004-11-05 20:32:32 UTC
Hi there. Is the patch for this issue generally available? I too have
the same problem with kswapd taking down my systems (built several new
servers for a critical project with RHEL AS 3.0 U3). The link to the
patch earlier in this thread is no longer valid, and I need to get my
systems operational.

THANKS !!!

Joe

Comment 14 Ernie Petrides 2004-11-06 01:40:09 UTC
The fix is in the latest U4 kernel, which is in beta test right
now (and is available in the RHN beta channel).  However, there
will be another respin next week, so the -23.EL kernel is not
exactly what will be released in Update 4.  I would advise
waiting until the final U4 is released (beginning of December).


Comment 15 Joe Goyette 2004-11-10 16:04:27 UTC
Unfortunately I can't wait a month for this... when I look at the RHN
AS3 Update 4 beta channel, these are the only kernel packages I see:

glibc-kernheaders-2.4-9.1.87.i386.rpm
kernel-2.6.9-1.648_EL.i586.rpm
kernel-2.6.9-1.648_EL.src.rpm
kernel-smp-2.6.9-1.648_EL.i586.rpm
kernel-utils-2.4-13.1.37.i386.rpm

Are those the correct kernel packages?

Thanks again,

Joe

Comment 16 Ernie Petrides 2004-11-10 21:23:12 UTC
Joe, you're looking in the wrong channel.  The kernel version is 2.4.21-23.EL,
and 2.4.21-24.EL will be built tonight (but won't be available in RHN for about
a week).
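
A quick way to check whether a channel listing like the one in comment 15
contains a kernel of the wanted version could be sketched as follows. It
assumes file names follow the usual name-version-release.arch.rpm pattern
and that Python 3 is available; split_rpm_filename and kernels_matching
are made-up helper names, not part of rpm or RHN.

```python
def split_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its four parts."""
    if not filename.endswith(".rpm"):
        raise ValueError("not an RPM file name: %r" % filename)
    stem, arch = filename[: -len(".rpm")].rsplit(".", 1)
    name, version, release = stem.rsplit("-", 2)
    return name, version, release, arch


def kernels_matching(filenames, wanted_version):
    """Return the kernel / kernel-smp packages with the given version."""
    matches = []
    for filename in filenames:
        name, version, _release, _arch = split_rpm_filename(filename)
        if name in ("kernel", "kernel-smp") and version == wanted_version:
            matches.append(filename)
    return matches
```

Applied to the five file names from comment 15 with wanted_version set to
"2.4.21", kernels_matching returns an empty list, which is consistent
with that listing coming from the wrong channel.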


Comment 17 Joe Goyette 2004-11-16 21:35:49 UTC
Ah! I found the right location this time. Thanks for your help with this.

Joe

Comment 18 Alexander Suvorov 2004-12-09 21:14:03 UTC
Ernie,

Per our internal policy, we must build a custom kernel.
We have the same problem that has been described here.
Could you please give us a direct link to the patch that fixes this
problem?


Comment 19 Alexander Suvorov 2004-12-09 21:25:37 UTC
Alternatively, could you please tell me how to find the src.rpm of the
kernel with the kswapd fix in RHN?

I looked, but did not see any way to find the src.rpm :(


Comment 20 Ernie Petrides 2004-12-09 22:02:48 UTC
The relevant RPM is kernel-source-2.4.21-27.EL.i386.rpm, which can be
found in the i386 subdirectory of the following URL:

  ftp://partners.redhat.com/a61d109e2483b0bf579b0b5f90a5ea8c/2.4.21-27.EL/

The kernel (along with the rest of U4) is scheduled for release on
20-Dec-2004, at which time you will be able to find it in the main
RHN channel(s).


Comment 21 John Flanagan 2004-12-20 20:56:40 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html