144307 – Kernel swapping instead of releasing cache

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Larry Woodman
QA Contact:
Docs Contact:
URL:	https://bugzilla.redhat.com/bugzilla/...
Whiteboard:
Depends On:
Blocks:

Reported:	2005-01-05 20:17 UTC by William
Modified:	2007-11-30 22:07 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 19:09:50 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
gnubik's slabinfo output (4.97 KB, text/plain) 2005-04-07 00:37 UTC, gnubik	no flags	Details
gnubik's sysrq output (107.33 KB, text/plain) 2005-04-07 00:51 UTC, gnubik	no flags	Details
gnubik's 2nd slabinfo (5.00 KB, text/plain) 2005-04-07 19:17 UTC, gnubik	no flags	Details

Description William 2005-01-05 20:17:19 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5)
Gecko/20041217

Description of problem:
The system is not releasing cache memory completely.  This had been
addressed by an erratum, but the behavior is still showing itself,
only more slowly.  Before the latest erratum, swap usage would be
substantially higher.



Version-Release number of selected component (if applicable):
kernel-2.4.21-27.0.1.EL

How reproducible:
Always

Steps to Reproduce:
Just run the system

Additional info:

15:13:39  up 2 days,  5:15,  1 user,  load average: 0.01, 0.02, 0.00
46 processes: 43 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    2.0%    0.0%    0.0%   0.0%     0.0%    0.0%   98.0%
Mem:   382472k av,  378084k used,    4388k free,       0k shrd,   11928k buff
                    295576k actv,   55460k in_d,    8288k in_c
Swap:  514072k av,   16156k used,  497916k free                  234384k cached


             total       used       free     shared    buffers     cached
Mem:        382472     377748       4724          0      12012     233980
-/+ buffers/cache:     131756     250716
Swap:       514072      16156     497916

Comment 1 Rik van Riel 2005-01-05 21:14:42 UTC

William,

could you please let us know if/when the behaviour in kernel-2.4.21-27.0.1.EL
leads to a performance degradation for your workload.  If there is no
performance degradation, the 16MB in swap are essentially cosmetic and I
wouldn't worry about them.

OTOH, if you do run into a performance degradation, please let us know.

Comment 2 Larry Woodman 2005-01-05 22:24:45 UTC

William, the system does not release all of the cache memory completely.  By
design the system will reclaim very old anonymous pages before newer pagecache
pages and these pages get swapped out.

Larry Woodman

Comment 3 William 2005-01-13 17:29:12 UTC

After a week the system was up to 50 megs of swap usage.

Comment 4 Larry Woodman 2005-01-21 15:08:11 UTC

It's not unusual or wrong for a system with 384MB of memory to need to
swap out ~50MB during the course of a week.  Is this system exhibiting
a problem, or do you simply want it not to swap at all?

Larry Woodman

Comment 5 William 2005-01-21 16:30:25 UTC

This seems to be normal behavior for RHEL 3.  That's apparent.
Frankly, when I have over 150 megs in the file cache I see no reason
for swapping at all.  I just run with swapoff -a on my servers..no
problems with the systems.

Comment 6 Larry Woodman 2005-01-21 16:52:00 UTC

If you lower /proc/sys/vm/pagecache.maxpercent (third parameter) the
system will attempt to reclaim pagecache pages even more aggressively
before swapping.  The default is 30, which means that the system will
trim the pagecache so only 30% of physical memory is allowed in the
pagecache before the system starts swapping.  If you lower it to 20,
the system will remove even more memory from the pagecache before
swapping.

If it gets set too low, the page reclaim code will eat up lots of
CPU time reclaiming pagecache pages.  Try out some lower values and
see if this helps your system behave more like the way you want it to.
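
As a rough illustration of the tunable described above, here is a small
Python sketch that parses the three space-separated values of
/proc/sys/vm/pagecache and builds a new string with a lower third value.
The function names are hypothetical, and the starting string "1 15 30" is
an assumed example: this thread only states that the third value defaults
to 30.  The file exists only on RHEL 3 era kernels, so the sketch works on
strings and leaves the actual write (as root) to the reader.

```python
def parse_pagecache(text):
    """Split the contents of /proc/sys/vm/pagecache into three integers."""
    values = [int(v) for v in text.split()]
    if len(values) != 3:
        raise ValueError("expected three values, got %r" % (text,))
    return values


def with_max_percent(text, new_max):
    """Return the tunable string with only the third value replaced."""
    if not 0 <= new_max <= 100:
        raise ValueError("percentage out of range")
    first, second, _old_max = parse_pagecache(text)
    return "%d %d %d" % (first, second, new_max)


# Assumed current setting; only the third value (30) comes from the thread.
current = "1 15 30\n"
print(with_max_percent(current, 20))
```

The resulting string is what would be written back to
/proc/sys/vm/pagecache to lower the third value while leaving the first
two untouched.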

Larry Woodman

Comment 7 William 2005-01-21 17:01:34 UTC

Nod i noticed that..which is why i reset it to defaults and just run
with swapoff -a..<G>

Comment 8 Larry Woodman 2005-01-26 14:32:51 UTC

William, can you get me several AltSysrq-M outputs when the system is
swapping and you don't think it should be?  This will tell me where the
memory is and whether the kernel really should be swapping as it's
designed to, or whether you are running into a bug.

Thanks, Larry Woodman

Comment 9 William 2005-01-26 14:37:18 UTC

I am sure this kernel is doing what it should be doing.  I just do not
think it should swap at all when there is over 50% committed to file
cache...:)  I am waiting for RHEL4 to come out.  I am running Fedora
on another machine and it acts like I feel it should..:)

Comment 10 Larry Woodman 2005-01-26 18:12:16 UTC

William, what applications are you running?  The system should not
swap if there is more than 30% of memory in the pagecache.

Larry Woodman

Comment 11 William 2005-01-26 22:17:08 UTC

hlds (Counter-Strike 1.6) and TeamSpeak.  I have Webmin and SSH running
for administration.  I run ntp for time synchronization against my
internal home time server..:)

Comment 12 William 2005-01-30 23:25:06 UTC

 18:26:45  up 2 days,  8:57,  1 user,  load average: 0.15, 0.10, 0.03
46 processes: 44 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    7.2%    0.0%    0.1%   0.2%     0.1%    0.0%   92.2%
Mem:   382464k av,  377900k used,    4564k free,       0k shrd,   53936k buff
                    284324k actv,   54520k in_d,    7720k in_c
Swap:  522072k av,   16656k used,  505416k free                  189480k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 2926 hlds      15   0  115M 109M  6244 S     7.3 29.2   9:43   0 hlds_i686
    1 root      15   0   508  508   452 S     0.0  0.1   0:03   0 init
    2 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    3 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kapmd
    4 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    7 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 bdflush
    5 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kswapd
    6 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kscand
    8 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kupdated
    9 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
   13 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
   68 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 khubd
  892 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
  928 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
 1383 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 eth0
 1427 root      15   0   612  612   528 S     0.0  0.1   0:00   0 syslogd
 1431 root      22   0   468  468   408 S     0.0  0.1   0:00   0 klogd
 1457 rpc       15   0   564  564   492 S     0.0  0.1   0:00   0 portmap
 1565 root      15   0  1572 1572  1324 S     0.0  0.4   0:00   0 sshd
 1598 root      15   0   640  640   560 S     0.0  0.1   0:00   0 crond
 1607 daemon    15   0   584  584   508 S     0.0  0.1   0:00   0 atd
 1617 root      23   0   432  432   376 S     0.0  0.1   0:00   0 mingetty
 1618 root      23   0   432  432   376 S     0.0  0.1   0:00   0 mingetty
 1619 root      23   0   432  432   376 S     0.0  0.1   0:00   0 mingetty
 1620 root      23   0   432  432   376 S     0.0  0.1   0:00   0 mingetty
 1621 root      23   0   432  432   376 S     0.0  0.1   0:00   0 mingetty
 2126 root      15   0  5052 5052  1972 S     0.0  1.3   0:00   0 miniserv.pl
 2735 tss       34  19  2280 2276  1636 S N   0.0  0.5   0:01   0 server_linux
 2736 tss       15   0  2280 2276  1636 S     0.0  0.5   0:00   0 server_linux
 2737 tss       15   0  2280 2276  1636 S     0.0  0.5   0:08   0 server_linux
 2738 tss       15   0  2280 2276  1636 S     0.0  0.5   0:17   0 server_linux
 2739 tss       15   0  2280 2276  1636 S     0.0  0.5   0:29   0 server_linux
 2740 tss       15   0  2280 2276  1636 S     0.0  0.5   0:00   0 server_linux
 2741 tss       15   0  2280 2276  1636 S     0.0  0.5   0:00   0 server_linux
 2742 tss       15   0  2280 2276  1636 S     0.0  0.5   0:00   0 server_linux
 2743 tss       15   0  2280 2276  1636 S     0.0  0.5   0:00   0 server_linux


How do I get that Alt output you were asking about?  I am still rather
new to advanced Linux commands.  Frankly, with 190 megs in file cache I
do not think it should be swapping at all.

Comment 13 Larry Woodman 2005-02-07 14:52:48 UTC

William, first of all I don't consider this system to be running
abnormally at all.  There is 512MB of memory and it has swapped out
~16MB.  At one time during its life the demand for anonymous memory
was high enough that the pagecache was driven below 30% and it
needed to swap out 3% of the total memory.  Is there a performance
problem, or are you just concerned about where all your memory is going?

As far as getting AltSysrq data is concerned:
1.) Log in as root.
2.) echo 1 > /proc/sys/kernel/sysrq
3.) Hold the Alt and SysRq keys down and press m, t, p, w, etc.,
    or write one key at a time to the trigger file, for example
    echo m > /proc/sysrq-trigger (repeat for t, p, w).
4.) The diagnostics will print on the console and in /var/log/messages.
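
The steps above can also be scripted.  The following is a minimal Python
sketch, assuming it runs as root on the affected machine; the function
name and its parameters are hypothetical and exist only so the paths can
be overridden.  It enables SysRq and then writes each key letter to the
trigger file in a separate write, one command character at a time.

```python
def request_sysrq(keys, enable_path="/proc/sys/kernel/sysrq",
                  trigger_path="/proc/sysrq-trigger"):
    """Enable SysRq, then write each key letter to the trigger file."""
    # Same effect as: echo 1 > /proc/sys/kernel/sysrq
    with open(enable_path, "w") as f:
        f.write("1\n")
    # One open/write per key, e.g. "m" for the memory report.
    for key in keys:
        with open(trigger_path, "w") as f:
            f.write(key)
```

Calling request_sysrq("m") as root requests the memory report; the output
still has to be collected from the console or /var/log/messages.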

Finally, the kernel is designed to trim the pagecache down to
/proc/sys/vm/pagecache.maxpercent (third value) before it swaps at
all.  The default is 30, so if you set it lower (15) it will reclaim
even more pagecache pages before swapping.  This can be accomplished
via "echo 1 15 15 > /proc/sys/vm/pagecache".


Larry Woodman

Comment 14 William 2005-02-07 15:17:07 UTC

I figured that is how the RHEL 3 kernel was supposed to act.  That
being said, I do not agree with it..:)  I lowered my third value to 5.
Thanks for taking time to talk about this issue.

Comment 15 Larry Woodman 2005-02-07 15:20:58 UTC

Be careful lowering pagecache.maxpercent this far.  When it's set
too low, the system will spend lots of time recycling pagecache
pages.  Let me know how it goes.

Thanks, Larry Woodman

Comment 16 William 2005-02-07 15:29:50 UTC

I used to run with swapoff, but then my hlds application started
crashing.  So I am just looking to reduce the pagecache..<G>

Comment 17 Larry Woodman 2005-03-01 17:32:25 UTC

William, what's the status of this bug?  Are you satisfied with RHEL3
the way it is when you lower /proc/sys/vm/pagecache.maxpercent?

Larry Woodman

Comment 18 William 2005-03-04 13:12:47 UTC

Not really..but it is as you said..the CPU gets chewed up recovering pages if I
chop it too much.  I have converted the machine in question to RHEL 4.  I have
one other machine using 3.

Comment 19 gnubik 2005-04-07 00:37:03 UTC

Created attachment 112795 [details]
gnubik's slabinfo output

Comment 20 gnubik 2005-04-07 00:39:29 UTC

We have been struggling with this bug since early versions of RHEL 3 kernels.
We are currently running update 4 (2.4.21-27.ELsmp) with vm.pagecache set to
1 15 15.  This seemed to help a little a while ago, but we're still having a
serious problem with this.

Applications running:  Apache 2.0.46, Sun JDK 1.4.1_03, nfs client
Hardware:  Dual Xeon, 4GB ram

The cached data seems to be related to nfs client.  If the volume is unmounted, 
much of the memory is freed.

Here is meminfo:

        total:    used:    free:  shared: buffers:  cached:
Mem:  4127768576 4101242880 26525696        0 139051008 1587433472
Swap: 4294959104 401453056 3893506048
MemTotal:      4031024 kB
MemFree:         25904 kB
MemShared:           0 kB
Buffers:        135792 kB
Cached:        1171092 kB
SwapCached:     379136 kB
Active:        2895244 kB
ActiveAnon:    2252316 kB
ActiveCache:    642928 kB
Inact_dirty:    627504 kB
Inact_laundry:   95008 kB
Inact_clean:     93428 kB
Inact_target:   742236 kB
HighTotal:     3211208 kB
HighFree:         8164 kB
LowTotal:       819816 kB
LowFree:         17740 kB
SwapTotal:     4194296 kB
SwapFree:      3802252 kB
Committed_AS: 20046828 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

Attaching slabinfo and AltSysrq-M output

Comment 21 gnubik 2005-04-07 00:51:53 UTC

Created attachment 112796 [details]
gnubik's sysrq output

Comment 22 Larry Woodman 2005-04-07 13:48:02 UTC

gnubik, I'm not sure exactly what the problem is here, but your system has used
most of its memory for active anonymous regions of user address space.  That's
what the "aa" stands for in the AltSysrq-M output.  When a system has used most
of its RAM for anonymous user memory regions, it must swap when it reclaims
memory because it's not caching much file system data.  In this case it has only
used 200MB of swap space while there is over 2.2GB of active anonymous memory
and probably another GB or more of inactive anonymous memory.

>>>aa:0 ac:0 id:0 il:0 ic:0 fr:2898
>>>aa:24051 ac:59548 id:35867 il:5583 ic:5244 fr:1316
>>>aa:528017 ac:110703 id:122584 il:18351 ic:18437 fr:343
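
To make the arithmetic behind the quoted lines explicit, here is a short
Python sketch that sums the per-zone counters and converts the "aa" total
to kB.  It assumes the counters are page counts and that pages are 4 KiB,
the usual size on i686; the function name is hypothetical.  Only "aa"
(active anonymous) is explained in this thread, so reading the other
fields by analogy with the /proc/meminfo names (ActiveCache, Inact_dirty,
Inact_laundry, Inact_clean, free) is an assumption.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages


def sum_zone_counters(lines):
    """Sum the name:count pairs (aa, ac, id, il, ic, fr) across all lines."""
    totals = {}
    for line in lines:
        for name, pages in re.findall(r"(\w+):(\d+)", line):
            totals[name] = totals.get(name, 0) + int(pages)
    return totals


# The three per-zone lines quoted from the AltSysrq-M output.
zones = [
    "aa:0 ac:0 id:0 il:0 ic:0 fr:2898",
    "aa:24051 ac:59548 id:35867 il:5583 ic:5244 fr:1316",
    "aa:528017 ac:110703 id:122584 il:18351 ic:18437 fr:343",
]
totals = sum_zone_counters(zones)
active_anon_kb = totals["aa"] * PAGE_KB
print(totals["aa"], "pages =", active_anon_kb, "kB")  # 552068 pages = 2208272 kB
```

Under those assumptions the active anonymous total comes to roughly 2.2GB,
in line with the figure given above.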


It's likely that the real cause of this isn't the kernel and the way it reclaims
memory, but the applications that are running and the amount of anonymous
memory they allocate and reference.  Can you get me either a "top" or "ps aux"
output when the system is in this state so I can see what is running and the
VSZ and RSS of those applications?  In this case the pagecache is only
about 25% of RAM and anonymous memory accounts for close to 75%, which is why
the system has to swap.  Do these numbers match what you would expect to see on
your system?

Thanks, Larry Woodman

Comment 23 gnubik 2005-04-07 16:21:39 UTC

We are in fact running apps with this much memory usage.  The JVM process is
configured with a 2GB heap.  Apache is also using a significant amount of
memory.  But the behavior still seems very strange.  First of all, swap will
continue to grow, and has been observed at 800+ MB in use, with no known change
to the app memory allocation.  Second, when we unmount and remount the single
NFS filesystem on these systems, "cached" memory drops dramatically, and this
frees up a huge amount of physical RAM and sometimes swap.  This makes our
problems go away for a while until they creep back.  (I'll need to wait for a
maintenance window to get you a good before-n-after of this one.)

For now, here is ps aux output from an example system

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1516  512 ?        S    Apr02   0:08 init [3] 
root         2  0.0  0.0     0    0 ?        SW   Apr02   0:00 [migration/0]
root         3  0.0  0.0     0    0 ?        SW   Apr02   0:00 [migration/1]
root         4  0.0  0.0     0    0 ?        SW   Apr02   0:00 [migration/2]
root         5  0.0  0.0     0    0 ?        SW   Apr02   0:00 [migration/3]
root         6  0.0  0.0     0    0 ?        SW   Apr02   0:00 [keventd]
root         7  0.0  0.0     0    0 ?        SWN  Apr02   0:00 [ksoftirqd/0]
root         8  0.0  0.0     0    0 ?        SWN  Apr02   0:00 [ksoftirqd/1]
root         9  0.0  0.0     0    0 ?        SWN  Apr02   0:00 [ksoftirqd/2]
root        10  0.0  0.0     0    0 ?        SWN  Apr02   0:00 [ksoftirqd/3]
root        13  0.0  0.0     0    0 ?        SW   Apr02   0:01 [bdflush]
root        11  0.0  0.0     0    0 ?        SW   Apr02   0:41 [kswapd]
root        12  0.0  0.0     0    0 ?        SW   Apr02   6:50 [kscand]
root        14  0.0  0.0     0    0 ?        SW   Apr02   0:07 [kupdated]
root        15  0.0  0.0     0    0 ?        SW   Apr02   0:00 [mdrecoveryd]
root        26  0.0  0.0     0    0 ?        SW   Apr02   0:04 [kjournald]
root        83  0.0  0.0     0    0 ?        SW   Apr02   0:00 [khubd]
root       330  0.0  0.0     0    0 ?        SW   Apr02   0:00 [kjournald]
root       331  0.0  0.0     0    0 ?        SW   Apr02   0:23 [kjournald]
root       332  0.0  0.0     0    0 ?        SW   Apr02   0:00 [kjournald]
root       333  0.0  0.0     0    0 ?        SW   Apr02   0:00 [kjournald]
root       334  0.0  0.0     0    0 ?        SW   Apr02   0:00 [kjournald]
root       335  0.0  0.0     0    0 ?        SW   Apr02   0:10 [kjournald]
root       336  0.0  0.0     0    0 ?        SW   Apr02   0:11 [kjournald]
root      1109  0.0  0.0  1592  576 ?        S    Apr02   0:00 syslogd -m 0
root      1113  0.0  0.0  1528  468 ?        S    Apr02   0:00 klogd -x
root      1123  0.0  0.0  1512  452 ?        S    Apr02   0:24 irqbalance
rpc       1140  0.0  0.0  1656  600 ?        S    Apr02   0:00 portmap
rpcuser   1159  0.0  0.0  1688  716 ?        S    Apr02   0:00 rpc.statd
root      1170  0.0  0.0  1572  404 ?        S    Apr02   0:01 mdadm --monitor -
root      1186  0.0  0.0 83568  600 ?        S    Apr02   0:00 /sbin/auditd
root      1214  0.0  0.0     0    0 ?        SW   Apr02   0:26 [rpciod]
root      1215  0.0  0.0     0    0 ?        SW   Apr02   0:00 [lockd]
root      1280  0.0  0.0  3656 1572 ?        S    Apr02   0:05 /usr/sbin/sshd
ntp       1297  0.0  0.0  2572 2568 ?        SL   Apr02   0:16 ntpd -U ntp -p /v
root      1317  0.0  0.0  6212 2560 ?        S    Apr02   0:09 sendmail: accepti
smmsp     1326  0.0  0.0  5996 2272 ?        S    Apr02   0:00 sendmail: Queue r
root      1336  0.0  0.0  1564  468 ?        S    Apr02   0:00 gpm -t imps2 -m /
root      1345  0.0  0.0  1600  640 ?        S    Apr02   0:00 crond
xfs       1368  0.0  0.0  5352 2928 ?        S    Apr02   0:00 xfs -droppriv -da
daemon    1377  0.0  0.0  1592  576 ?        S    Apr02   0:00 /usr/sbin/atd
root      1386  0.0  0.0  1500  424 tty1     S    Apr02   0:00 /sbin/mingetty tt
root      1387  0.0  0.0  1500  424 tty2     S    Apr02   0:00 /sbin/mingetty tt
root      1388  0.0  0.0  1500  424 tty3     S    Apr02   0:00 /sbin/mingetty tt
root      1389  0.0  0.0  1500  424 tty4     S    Apr02   0:00 /sbin/mingetty tt
root      1390  0.0  0.0  1500  424 tty5     S    Apr02   0:00 /sbin/mingetty tt
root      1391  0.0  0.0  1500  424 tty6     S    Apr02   0:00 /sbin/mingetty tt
root     20033  0.0  0.0  7592 1916 ?        S    Apr03   0:12 cupsd
root     18773  0.0  0.0  7008 2180 ?        S    Apr05   0:00 sshd: root@pts/2
root     18782  0.0  0.0  4252 1348 pts/2    S    Apr05   0:00 -bash
root     10492  0.0  0.0  7020 3300 pts/2    S    Apr05   0:00 perl /usr/local/r
root     10495  0.0  0.0  4200 1000 pts/2    S    Apr05   0:00 sh -c /usr/local/
root     10496 37.7 54.8 2359924 2212428 pts/2 S  Apr05 761:13 /usr/local/jdk/bi
root     10517  0.0  0.1  7560 4164 ?        S    Apr05   0:37 /usr/local/apache
nobody   10518  0.0  0.0  6732 3320 ?        S    Apr05   0:00 /usr/local/apache
nobody   22951  0.1  0.2 691004 11908 ?      S    07:58   0:07 /usr/local/apache
nobody   23023  0.1  0.2 689980 11760 ?      S    08:01   0:07 /usr/local/apache
nobody   23092  0.1  0.2 689980 11704 ?      S    08:05   0:06 /usr/local/apache
nobody   23163  0.1  0.2 689980 11560 ?      S    08:14   0:05 /usr/local/apache
root     23233  0.0  0.0  6856 2064 ?        S    08:19   0:00 sshd: sitescop [p
root     23237  0.0  0.0  6856 2064 ?        S    08:19   0:00 sshd: sitescop [p
sitescop 23239  0.0  0.0  6868 2268 ?        S    08:19   0:00 sshd: sitescop@pt
sitescop 23240  0.0  0.0  4380 1460 pts/0    S    08:19   0:00 -bash
sitescop 23279  0.0  0.0  7004 2280 ?        S    08:20   0:00 sshd: sitescop@pt
sitescop 23280  0.0  0.0  4384 1464 pts/1    S    08:20   0:00 -bash
nobody   23319  0.1  0.2 689980 11344 ?      S    08:22   0:05 /usr/local/apache
nobody   23388  0.1  0.2 691004 11756 ?      S    08:24   0:05 /usr/local/apache
nobody   23458  0.1  0.2 689980 11064 ?      S    08:33   0:03 /usr/local/apache
nobody   23527  0.1  0.2 688956 11040 ?      S    08:34   0:03 /usr/local/apache
nobody   23594  0.1  0.2 691004 11436 ?      S    08:34   0:04 /usr/local/apache
nobody   23662  0.1  0.2 689980 11304 ?      S    08:37   0:03 /usr/local/apache
nobody   23729  0.1  0.2 689980 11384 ?      S    08:39   0:03 /usr/local/apache
nobody   23797  0.1  0.2 689980 11268 ?      S    08:40   0:03 /usr/local/apache
nobody   23864  0.1  0.2 689980 11192 ?      S    08:41   0:02 /usr/local/apache
nobody   23931  0.1  0.2 689980 10980 ?      S    08:43   0:03 /usr/local/apache
nobody   24000  0.1  0.2 689980 10992 ?      S    08:44   0:02 /usr/local/apache
nobody   24067  0.1  0.2 691004 11308 ?      S    08:45   0:02 /usr/local/apache
nobody   24134  0.1  0.2 689980 10944 ?      S    08:46   0:03 /usr/local/apache
nobody   24201  0.1  0.2 688956 11020 ?      S    08:47   0:02 /usr/local/apache
nobody   24268  0.1  0.2 689980 10928 ?      S    08:47   0:02 /usr/local/apache
nobody   24335  0.1  0.2 688956 10920 ?      S    08:48   0:02 /usr/local/apache
nobody   24404  0.1  0.2 689980 10840 ?      S    08:52   0:02 /usr/local/apache
nobody   24473  0.1  0.2 689980 10816 ?      S    08:54   0:01 /usr/local/apache
nobody   24545  0.1  0.2 688956 10756 ?      S    09:01   0:01 /usr/local/apache
nobody   24612  0.1  0.2 689980 10504 ?      S    09:02   0:00 /usr/local/apache
nobody   24688  0.1  0.2 689980 10544 ?      S    09:05   0:01 /usr/local/apache
nobody   24755  0.1  0.2 689980 10388 ?      S    09:06   0:00 /usr/local/apache
nobody   24823  0.2  0.2 689980 10416 ?      S    09:09   0:00 /usr/local/apache
root     24892  0.0  0.0  6868 2164 ?        S    09:14   0:00 sshd: root@pts/3
root     24894  0.0  0.0  4264 1344 pts/3    S    09:14   0:00 -bash
root     24941  0.0  0.0  2868  808 pts/3    R    09:14   0:00 ps aux

Comment 24 Larry Woodman 2005-04-07 17:48:28 UTC

OK, first of all /proc/sys/vm/pagecache.maxpercent(3rd value) is set to 30 by
default in the RHEL3-U4 kernel(2.4.21-27.EL).  This means that the system will
attempt to reclaim pagecache pages rather than anonymous pages/swap if more than
30% of ram is in the pagecache.  Since /proc/meminfo shows "Cached: 1171092 kB",
less than 30% of ram is in the pagecache therefore the system will reclaim
anonymous memory and swap as well as reclaiming pagecache pages.  So, if you
lower pagecache.maxpercent to say 20, the system should revert back to
reclaiming pagecache pages only but at the cost of CPU time.  
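
The comparison in the paragraph above can be worked through with the
numbers from the /proc/meminfo snapshot in comment 20.  This Python sketch
is a toy model of the rule as described in this thread, not the kernel's
actual reclaim code; the function names are hypothetical, and treating
"Cached" alone as the pagecache size follows the paragraph above rather
than any kernel definition.

```python
def pagecache_percent(cached_kb, mem_total_kb):
    """Share of RAM reported as Cached, in percent."""
    return 100.0 * cached_kb / mem_total_kb


def expected_reclaim(cached_kb, mem_total_kb, max_percent):
    """Toy model: above the third pagecache value, reclaim pagecache only."""
    if pagecache_percent(cached_kb, mem_total_kb) > max_percent:
        return "pagecache only"
    return "pagecache and anonymous (swap)"


cached_kb, mem_total_kb = 1171092, 4031024  # Cached and MemTotal, comment 20
print(round(pagecache_percent(cached_kb, mem_total_kb), 1))  # 29.1
print(expected_reclaim(cached_kb, mem_total_kb, 30))
print(expected_reclaim(cached_kb, mem_total_kb, 20))
```

At about 29.1% the snapshot sits just under a limit of 30, which is why
swapping is expected there, while a limit of 20 would put the same
snapshot over the threshold.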

As far as unmounting file systems is concerned, there are 3 reasons that the
system will eliminate pagecache pages: 1.) memory exhaustion and reclamation,
2.) deleting a file that has pagecache pages, and 3.) unmounting the file
system, which invalidates every pagecache page that was used to cache that file
system.  So it certainly will free up a bunch of memory, but never swap space.
However, when you do this the system must go back and re-read every file page
into the pagecache, since you just invalidated it.

For starters, can you lower /proc/sys/vm/pagecache to 1 15 20 and let me know
how this works on your system?  Another possibility is reconfiguring the JVM
process with a smaller heap, or does it need to be that big just to run?

Finally the slabcache accounts for about 1/3 of lowmem, can you get me a
/proc/slabinfo output when the system is in this state so I can see who is
consuming so much memory?  This can also reduce the total amount of memory
available for both the pagecache and the anonymous memory, thereby causing
swapping to occur.

Thanks for your help, Larry Woodman

Comment 25 gnubik 2005-04-07 19:17:37 UTC

Created attachment 112826 [details]
gnubik's 2nd slabinfo

Comment 26 gnubik 2005-04-07 19:18:28 UTC

Comment on attachment 112826 [details]
gnubik's 2nd slabinfo

Thanks for your quick replies...

We've actually got pagecache set to 1 1 15 already, and 1171092 is more than 15%
of RAM...  We've been trying to use this setting to resolve this issue since
RHEL3-U1 with no luck.  Fortunately we're not hitting problems with CPU time.

We're actually trying to increase heap for additional app layer caching and
greater user capacity, not the other way around, unfortunately.

As far as the NFS file system goes... if there is any way to control how these
live in the pagecache, it would be awesome.  We've tried noac option but no
luck.  There is no noticeable performance change with a freshly mounted
filesystem and little cached vs. long running mount with lots of cache, so if
we can just keep this data out of the cache and the java heap out of swap, we'd
have a winner

Already attached one slabinfo.  Here is another, although the system hasn't
been running as long in this snapshot... 1.1GB cached, 200M swapped

Comment 27 William 2005-04-07 19:38:48 UTC

Here's something.  Have you tried running without swap at all?  I do not have
anywhere near the load you do..but just a thought..:)

Also, have you looked into RHEL 4?  I have not had any of these swapping issues
with it (especially after putting swappiness down to 20).

Comment 28 Larry Woodman 2005-04-07 20:08:59 UTC

Can you try the latest RHEL3-U5 beta kernel in this environment?  I made a
change so that mapped file pages don't get treated like they are anonymous pages.

Larry Woodman

Comment 29 gnubik 2005-04-07 21:35:00 UTC

Well unfortunately we haven't totally reproduced this in a load test 
environment yet.  And we're not comfortable with a beta kernel in production at 
this point...  when is the final U5 kernel scheduled to drop?  any other tuning 
possibilities?

Comment 30 rodrigo 2005-08-02 14:56:14 UTC

Hi, I have an environment with Linux r9test 2.4.21-32.0.1.EL #1 SMP (U5).

vm.pagecache=1 3 7

Test case: 
----------
Run "dd" against the file system to use all the buffer cache.  After that,
start Oracle DB and run some scripts to use the DB.
We see that the system starts swapping.
Can we avoid using the buffer cache?

We are working closely with Oracle Support on this issue, but they are claiming
that it is an OS bug.

Server:  HP RX4640, 32GB RAM

Thanks
Rodrigo

Comment 31 RHEL Program Management 2007-10-19 19:09:50 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet that criteria, it is now being closed.
 
For more information of the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
