From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040803 Firefox/0.9.3

Description of problem:
We've seen a dramatic drop in throughput for our database server when going from RHAS 2.1 to RHAS 3.0 on the same hardware. After some investigation, we found that the OS was swapping heavily during the test run, even though there should be more than enough physical memory. This did not happen when the same servers were running RHAS 2.1.

REPRODUCTION:
We have done some experiments using a simple test program (see below). It allocates a large buffer, fills it, and then accesses it randomly at full speed. On a machine with 1 GB of memory, this ought to work fine with a 750 MB buffer. But if we first copy some large files between local file systems in order to use up memory for disk cache, and THEN start the test program (after waiting a minute for flushing), we see problems.

The program quickly steals from the cache until (according to vmstat) the cache is down to about 250 MB. After that, the cache is freed only very slowly, and the test program starts swapping heavily. It takes several minutes to free up another 35 MB. Also, if we do the file copying while the test is running, it actually steals memory for cache from the active process, which again kills performance.

In our database testing, it looks like some amount of cache is permanently reserved, with the result that the server *never* gets all the memory it needs and keeps swapping. We had to reduce the server's memory usage by at least 200 MB to avoid swapping.

The problem is seen on RHAS 3.0 Update 2 as well as on Update 3 (beta) (kernels 2.4.21-15.ELsmp and 2.4.21-17.ELsmp respectively). In the latter case, on one occasion vmstat showed the OS swapping *out* but not in, on the order of several hundred KB per second, for 10 minutes after the cache had shrunk to its "final" size.

The same tests run on RHAS 2.1 did not show this problem; the test program gets all the memory it needs and there is very little swapping.
All tests were run on machines with 1 GB of memory, using 750 MB for the test program. C source code follows:

---
/* Trivial memory user. Argument: number of MB to allocate */
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    int max = atoi (argv[1]) * 1048576 / sizeof(int);
    int *num = malloc (max * sizeof(int));
    int i;

    for (i = 0; i < max; i++) {
        num[i] = i;
    }
    srand ((int)getpid());
    for (;;) {
        i = (int) rand() % max;
        num[i] = rand();
    }
    return 0;
}
---

Version-Release number of selected component (if applicable):
kernel 2.4.21-17.ELsmp and 2.4.21-15.ELsmp

How reproducible: Always

Steps to Reproduce: See Description
Actual Results: See Description
Expected Results: See Description
Additional info:
Yngve, can you get several "Alt-SysRq-M" outputs when you see the system in the state that you are describing? Larry Woodman
Created attachment 103678 [details] Alt-SysRq-M outputs This is output from Alt-SysRq-M taken repeatedly starting just before the test program was started, while the test program ran and until the system started to calm down, i.e. when the disk cache size was starting to destabilize. We observed frantic swapping, as reported by vmstat. The test program was run with argument "750".
Sorry, I meant "stabilize", not "destabilize" in the previous comment. Freudian slip, I guess :-)
I have been working on a patch that helps the system reclaim pagecache memory more effectively when the pagecache is over pagecache.maxpercent. What this patch does is reactivate anonymous inactive dirty pages when the active pagecache pages exceed pagecache.maxpercent. This will further prevent the system from swapping when the majority of memory is in the pagecache.

************************************************************************
@@ -292,7 +310,14 @@ int launder_page(zone_t * zone, int gfp_
 	BUG_ON(!PageInactiveDirty(page));
 	del_page_from_inactive_dirty_list(page);
-	add_page_to_inactive_laundry_list(page);
+
+	/* if pagecache is over max dont reclaim anonymous pages */
+	if (cache_ratio(zone) > cache_limits.max && page_anon(page) && free_min(zone) < 0) {
+		add_page_to_active_list(page, INITIAL_AGE);
+		return 0;
+	} else {
+		add_page_to_inactive_laundry_list(page);
+	}
 	/* store the time we start IO */
 	page->age = (jiffies/HZ)&255;
 	/*
************************************************************************

Please try out the appropriate kernel and let me know how it works ASAP:
>>>http://people.redhat.com/~lwoodman/.RHEL3pagecachefix/

Thanks, Larry Woodman
We did some tests with this patched kernel and the trivial test program in the original report, and unfortunately the behaviour has not improved; it may actually be slightly worse. BTW: Will this get copied to the Issue Tracker case that was opened for this report?
Yngve, can you rerun the test after a reboot and "echo 1 10 15 > /proc/sys/vm/pagecache" Larry
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.11.EL).
Yngve, can you grab and test the latest kernel for me? I have made several changes to minimize swapping when file caching is involved. If you still have problems with this kernel, please get several Alt-SysRq-M outputs when the system is swapping heavily. The latest i686 smp kernel is here:
>>>http://people.redhat.com/~lwoodman/.for_sun/

Thanks for your help, Larry Woodman
Very interesting. Running this kernel, we are almost back to the performance of RHAS 2.1. These are TPC-B figures from our database server product running on the new kernel:

Kernel version      1 client    4 clients   16 clients
2.4.9-e.34:         124 tps     231 tps     245 tps
2.4.21-20.EL:        65 tps      94 tps     102 tps
2.4.21-22.ELsmp:    107 tps     215 tps     224 tps

This means we are still about 10% away from the performance of 2.1, but this is beginning to look acceptable. Two questions:

1: There is a mystery here. We still see about the same amount of swapping, but it seems that the pages we need aren't getting swapped out as often as before. Do you have any comments on that?

2: What is the status of these optimizations? Will they make it into an official release, or are they more of the experimental sort?
We have conducted somewhat more thorough testing, and we see the same pattern as above. Part of the remaining 10% degradation compared with 2.1 may stem from us running an SMP kernel on a uniprocessor machine. So, in addition to the two questions above, I'd like to ask you for a non-SMP version of 2.4.21-22.EL.
Yngve,

1.) Can you explain more about the "swapping mystery" you are seeing? I am not following what you are trying to say.
2.) This kernel is the actual RHEL3-U4 beta kernel; everything in this kernel will stay.
3.) >>>http://people.redhat.com/~lwoodman/.for_sun/ now contains a UP kernel.
1: We were still seeing fairly intensive swapping with this kernel when running the usemem program quoted in the original bug report, as compared to the behaviour on RHAS 2.1 U3. However, swapping "died down" far more quickly than it did on previous RHAS 3.0 kernels. More importantly, we are not seeing excessive swapping during actual test runs of our database server, so this is probably not a concern for us.

2: Excellent. What is the projected release date for U4? We are probably unable to have RHAS 3.0 as a supported platform until U4 arrives, so this date is of importance to us.

3: Thanks. We'll be doing some more testing runs to confirm the data we have, and we also need to go hunting for the remaining 10% performance drop versus 2.1 (perhaps it is because we build on 2.1 and run on both 2.1 AND 3.0?), but I think we are fairly close to calling this issue "resolved".
*** Bug 137984 has been marked as a duplicate of this bug. ***
RHEL3 U4 is expected to be released in 2-3 weeks. In the meantime, the RHN beta channel contains the 2.4.21-25.EL kernel, which includes all known VM-related fixes in U4.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html
Swapping is still present, though to a much lesser degree. I have a 10-slot Counter-Strike server and a 20-slot TeamSpeak server going on a P4 1.4 GHz with 384 MB RAM. Here is my free output:

             total       used       free     shared    buffers     cached
Mem:        382472     374256       8216          0      47784     237396
-/+ buffers/cache:      89076     293396
Swap:       522072        680     521392

I have not tried the echo commands yet. Will try them and report back if it is fixed.
I rebooted and tried "echo 1 10 15 > /proc/sys/vm/pagecache". Minor swapping is still occurring:

             total       used       free     shared    buffers     cached
Mem:        382472     376464       6008          0      49464     229372
-/+ buffers/cache:      97628     284844
Swap:       522072        732     521340