Bug 89226 - (VM)Kernel prefers swapping instead of releasing cache memory
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: athlon Linux
Priority: medium
Severity: medium
Assigned To: Larry Woodman
Reported: 2003-04-21 10:35 EDT by Erik Reuter
Modified: 2007-04-18 12:53 EDT (History)
Last Closed: 2004-09-30 11:40:49 EDT

Attachments
none (361 bytes, text/plain)
2003-11-18 16:21 EST, vincent mulligan
Tarball of bug-reproducing example code (3.11 KB, application/octet-stream)
2003-11-25 19:19 EST, Chris Petersen
Fix braindead swapping (6.23 KB, patch)
2004-08-25 05:23 EDT, Marc-Christian Petersen
Fix braindead swapping (6.23 KB, text/plain)
2004-08-25 05:24 EDT, Marc-Christian Petersen
02 - vm.vm_cache_scan_ratio (2.45 KB, patch)
2004-08-25 05:28 EDT, Marc-Christian Petersen
03 - vm.vm_passes (6.03 KB, patch)
2004-08-25 05:30 EDT, Marc-Christian Petersen
04 - vm.vm_gfp_debug (2.21 KB, patch)
2004-08-25 05:32 EDT, Marc-Christian Petersen
05 - vm.vm_vfs_scan_ratio (3.54 KB, patch)
2004-08-25 05:34 EDT, Marc-Christian Petersen
06 - Remove old and obsolete VM documentation (5.13 KB, patch)
2004-08-25 05:36 EDT, Marc-Christian Petersen
07 - Update VM docu to Documentation/sysctl/vm.txt (26.83 KB, patch)
2004-08-25 05:38 EDT, Marc-Christian Petersen
08 - just reorder 1 variable in mm/vmscan.c (1.16 KB, patch)
2004-08-25 05:41 EDT, Marc-Christian Petersen
09 - vm.pagecache - Change '1 15 100' to '1 5 10' (299 bytes, patch)
2004-08-25 05:42 EDT, Marc-Christian Petersen
10 - O(1) scheduler: Introduce sysctl knobs for max-timeslice, min-timeslice and child-penalty (Part 1) (8.15 KB, patch)
2004-08-25 05:44 EDT, Marc-Christian Petersen
O(1) scheduler: Introduce 'desktop' boot parameter (lowered max-timeslice) (Part 2) (2.82 KB, patch)
2004-08-25 05:47 EDT, Marc-Christian Petersen

Description Erik Reuter 2003-04-21 10:35:36 EDT
When I've been using RedHat9 for a while, all the available memory gets used up,
both by applications and by the cache, which is a good thing, as it speeds up
the system.

The problem when all memory is used, is that Linux seems to prefer swapping over
releasing memory from the cache. The system (X especially) appears sluggish when
this happens.

Here's the output from free from my system as of right now:

             total       used       free     shared    buffers     cached
Mem:       1289520    1276304      13216          0      28984     484872
-/+ buffers/cache:     762448     527072
Swap:      2040212      66512    1973700


It's using the swap, even though there's around 0.5GB memory being used by cache.
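
The figures behind that observation can be checked with a short script. This is
a minimal sketch and not part of the original report: it parses the free output
quoted above (values in kB) and reports how much memory is held by buffers and
cache while swap is in use. The function names parse_free and summarise are
illustrative only.

```python
# Sketch: summarise the old-style `free` output quoted in the report (kB).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       1289520    1276304      13216          0      28984     484872
-/+ buffers/cache:     762448     527072
Swap:      2040212      66512    1973700
"""


def parse_free(text):
    """Return {'mem': {...}, 'swap': {...}} keyed by the header column names."""
    lines = text.splitlines()
    columns = lines[0].split()
    result = {}
    for line in lines[1:]:
        fields = line.split()
        if fields[0] == "Mem:":
            result["mem"] = dict(zip(columns, map(int, fields[1:])))
        elif fields[0] == "Swap:":
            # The Swap row only has total/used/free; zip stops at the shortest.
            result["swap"] = dict(zip(columns, map(int, fields[1:])))
    return result


def summarise(stats):
    """Split 'used' memory into application use and reclaimable cache."""
    mem, swap = stats["mem"], stats["swap"]
    reclaimable = mem["buffers"] + mem["cached"]
    return {
        "app_used_kb": mem["used"] - reclaimable,
        "reclaimable_kb": reclaimable,
        "swap_used_kb": swap["used"],
        "cache_share": reclaimable / mem["total"],
    }


summary = summarise(parse_free(FREE_OUTPUT))
print(summary)
```

The app_used_kb value matches the "-/+ buffers/cache" row that free prints, so
the script is only restating what free already shows: roughly 0.5GB of buffers
and cache alongside about 66MB of swap in use.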

According to the RedHat9 manual, the parameter /proc/sys/vm/pagecache should
help me control how much memory is used for caching the filesystem. Is this
correct? If so, I can't adjust it, as it is not available in the /proc structure.

If the above parameter cannot alleviate the problem, is there some other
solution that can be used? This is pretty frustrating, as my system has 1.2GB of
memory in total.


Version-Release number of selected component (if applicable):
kernel-2.4.20-9

How reproducible:
Always
Comment 1 John Bass 2003-05-19 18:06:20 EDT
This is a fundamental bug in the VM/Cache design, as a single process writing
heavily to a filesystem or disk can purge processes from dram as the cache
aggressively uses all of real dram memory. The result is a paging frenzy that can
be perceived as a system lockup, with the disk busy and response times of
10-30 minutes from pressing enter/new-line in an xterm to the next shell prompt.

Active processes which are purged from memory remain locked on the paging queue
with pages being stolen as rapidly as they are faulted back in. There are no
fairness or priority controls preventing pages from being stolen at a rate
that prevents execution. The problem actually gets significantly worse when
more dram is added to the system.

Normal tasks, like initializing a database, restoring compressed backups, and
other write intensive jobs effectively crash the machine, while it's locked into
a paging frenzy that will not end in any reasonable time period without power
cycling the machine.
Comment 2 Erik Reuter 2003-05-20 13:50:22 EDT
Is there some way to make the kernel not use up 98% of available memory for disk
caching?

If more free memory was left available for applications, the problem would be
less significant.

My experience as an end-user running X is that the system gets, as you write,
bogged down over time when running into these race conditions (large
resource-consuming applications take a lot longer to start after the system has
gobbled up the available memory for caching).

Is anybody looking into this other than me posting here?
Comment 3 Arjan van de Ven 2003-05-22 05:31:34 EDT
first of all try the erratum kernel; it has some minor vm bugs fixed that could
cause the wrong page to be swapped out.

In addition, massive writes shouldn't evict all memory anymore; the 2.4.20 rmap
VM has code to prevent that.
Comment 4 Erik Reuter 2003-05-22 16:15:48 EDT
Ran up2date today, I'm on Kernel 2.4.20-13.9 now.

It still seems to do a lot of swapping. Here's some output from free:

             total       used       free     shared    buffers     cached
Mem:       1289496    1265892      23604          0      98460     376552
-/+ buffers/cache:     790880     498616
Swap:      2040212      83644    1956568

After starting and stopping an application (JBoss app server) a couple of times,
free looks like this:

             total       used       free     shared    buffers     cached
Mem:       1289496    1270864      18632          0      99376     377240
-/+ buffers/cache:     794248     495248
Swap:      2040212     115208    1925004

The cached value looks more or less unchanged, swap usage has increased by
around 30MB, and it will continue to rise as I start/stop the application some
more.

Here's free after starting OpenOffice, The GIMP and Evolution afterwards:

             total       used       free     shared    buffers     cached
Mem:       1289496    1277144      12352          0     101532     346948
-/+ buffers/cache:     828664     460832
Swap:      2040212     133152    1907060

It's releasing cache now, but swap still rises as cache falls.

It seems to me the kernel is keeping too little memory free for application
startup overhead, so a race condition occurs where the kernel cannot free memory
fast enough from cache to satisfy the need of the applications.
Comment 5 Mike Hearn 2003-05-28 14:14:11 EDT
I'm also seeing this on a purely desktop system: 256MB RAM, 2 swap partitions
on two separate disks.

Doing something that involves disk load is enough to kill the system's
responsiveness for some time. Installing an RPM is a nasty one for some reason.
When this happens I can see for instance Nautilus redrawing the screen a line at
a time with constant disk activity throughout. This can happen when simply
switching desktops, but often happens when logging out (but not in).

This has seriously dropped the interactive performance of this system, which is
really annoying :( Here's the output of free about 2 minutes after the last swap
frenzy.

[mike@excalibur Downloads]$ free -m
             total       used       free     shared    buffers     cached
Mem:           249        243          5          0         14         92
-/+ buffers/cache:        136        112
Swap:          847        183        663

I'm using more swap than buffers! Does anybody know when this problem might be
fixed? I don't run any particularly disk heavy programs, just the usual desktop
apps.
Comment 6 acount closed by user 2003-05-28 19:13:58 EDT
rh_9 has several performance problems:

o memory management at 2.4.20-13.x is not good enough
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90868
o there is a general bug with UTF-8
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=69900
o all X programs use Xft and the RENDER extension is not accelerated
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=89754

For me, avoiding UTF-8 made the system lighter.
For example: now slocate.cron doesn't disturb my X programs.

LANG=" " is a provisional fix at:

# cat /etc/sysconfig/i18n
LANG=
SUPPORTED="en_US.UTF-8:en_US:en:gl_ES.UTF-8:gl_ES:gl:es_ES.UTF-8:es_ES:es"
SYSFONT="latarcyrheb-sun16"
Comment 7 John Bass 2003-05-31 16:04:21 EDT
I had another one of these failures last night, which is a good example of
why the current VM/Disk cache design is just plain WRONG. Production machine
is a 64MB PII 333MHz machine as a Linux router also providing a RedHat mirror
serviced using TUX for http/ftp service plus rsyncd. Crond runs the normal
scripts for log management and the like, plus mrtg to provide some graphs
for router performance and load. The machine is installed as a minimal RH9
install, plus named, mrtg, tux, and rsyncd. There is no X or GUI desktop
system installed.

The filesystem buffers and cache normally take well over half the real
memory at nearly all times. This triggers paging when the RSS total of the
DNS + rsyncd + tux + crond + perl exceeds about 20MB for this system.

In nearly all cases, the choice to page out active process's working sets
is the choice to do 2 I/O's including the read for faulting it back in. This
choice should NEVER be made in favor of tightly holding on to disk buffer
cache or filesystem cache memory of questionable value. This choice should
NEVER be made for a low priority task. This choice should NEVER be made AT ALL
until the aggregate RSS approaches the real memory size, since the cost to
recover a file to cache is roughly the same or less in real time and disk load.
In single disk systems, the extra cost to seek to the swap file area may be
significantly higher.
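
The I/O accounting in that argument can be written out as a toy cost model.
This is only a sketch under stated assumptions, not a description of what the
kernel computes: it assumes a swapped-out page costs one write plus one read if
it is touched again, that a clean cache page costs nothing to drop and one read
if the file data is reused, and it ignores seek distance. Both function names
are made up for illustration.

```python
def swap_out_cost(p_refault):
    """Expected I/Os for evicting an anonymous page.

    One write to swap now, plus a read later with probability p_refault.
    """
    return 1 + p_refault


def drop_clean_cache_cost(p_reread):
    """Expected I/Os for dropping a clean cache page.

    No write is needed; a read happens only if the data is used again.
    """
    return p_reread


# A page in an active working set is almost certain to be faulted back in,
# so evicting it costs about 2 I/Os, as the comment says.
print(swap_out_cost(1.0))

# Even a cache page that is certain to be reread costs at most 1 I/O.
print(drop_clean_cache_cost(1.0))
```

Under these assumptions, dropping a clean cache page is never more expensive
than swapping out a page that will be touched again, which is the point the
comment is making.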

With active downloads from the server, disk latency rises significantly,
causing the paging latency to rise to the point that with paging delays the
completion of the mrtg/perl task exceeds 5 minutes. As a result additional
crond tasks, including multiple mrtg's stack up in the run queue, increasing
the aggregate RSS causing more paging. This continues for another 20 minutes
till we have a half dozen mrtg tasks running and the machine is devoting 70%
of its I/O load to paging without managing to complete the first mrtg perl
task which triggered the melt down. Crond was shut down, and the machine ran
another 5 hours without completing any of the mrtg tasks, with response times
to a CR in the ssh session remaining in the several minute range, and time to
complete a "ps -laxf" about 10 minutes. Finally, killing all the MRTG perl tasks
took another 10 minutes before they managed to complete and the system was
responsive again. They NEVER finished .... without intervention, swap would
have been exceeded regardless of how much was allocated, and processes would
start dying due to allocation failures.

It took quite some time for the filesystem cache to dwindle down to 6MB, even
with this crippling paging I/O load.

In theory, using "FREE" memory to cache the filesystem is a good thing. But
somebody really screwed up here by insisting that somehow caching the filesystem
in the majority of DRAM (files that are very likely to NEVER be used again in
the near term) is much more important than memory for active running processes.

In practice there does not need to be any "free" dram, and what SHOULD happen is
that ANY page fault allocating DRAM should be taken from the disk cache and/or
buffer pool down to a relatively small tunable percentage of real memory.

The disk cache and filesystem cache are there to minimize I/O, not to provoke
I/O in the form of extensive VM paging. By hogging DRAM, the current cache
management makes ANY Linux server system unpredictably unstable in production.

At all times, a server's resources MUST scale linearly with load, or the drop
in efficiency will trigger resource queue stackup with significant hysteresis
that is very likely to be non-recoverable as long as requests enter the queues.

To manage this effectively, priority MUST be directly given to active tasks
for all resources such that the tasks can complete without stacking up queues
and increasing the new workload of the system.  Linux violates this
significantly in a number of areas where previous UNIX systems do not. All
kernel resource management algorithms MUST if at ALL possible become more
efficient under load, and seldom, if ever, trigger more work than would
otherwise be required as compared to a sequential batch execution engine.

To do this properly, write behind disk caching should NEVER be scheduled before
any read request in the queues. There are processes waiting for the reads, and
NONE waiting for the writes (at least until write behinds fill memory, at
which point they get triggered and complete as pairs with reads).

Where at all possible, disk queue scheduling should be priority driven based
on the priority of the task that invokes the I/O .... right down to allocation
of pre-I/O resources such as disk buffers and cache space.

The filesystem designs must promote EFFECTIVE aggregation of disk I/O under
load to actually reduce the per request disk queue latencies and maximize
disk throughput.  Increased disk seeks under load must be offset by increased
utilization per seek and the corresponding rotational loss.

Access to DRAM for RSS MUST be priority driven and fair share distributed.

Processes with inferior priority MUST NOT be allowed to consume memory and
other resources such that high priority and otherwise interactive tasks
are always on the short end of the stick and unable to effectively use
their high priority status to complete quicker.

There are huge performance costs for flushing DRAM caches .... Linux needs to
work hard at effectively minimizing the cache footprint of the kernel, and
minimizing low priority context switches to tasks that may do little more than
fault out of L1/L2 cache very active higher priority tasks, if not out of
real DRAM too.

Lastly, the design of all cron scheduled tasks should include a serialization
lock to prevent multiples from stacking up in the run queues and memory.
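
One way such a serialization lock could look is sketched below. It is a
minimal illustration, not taken from mrtg or crond: it assumes a Unix system
and uses an advisory flock on a lock file, so a second instance started while
the first is still running exits immediately instead of queueing up. The lock
file name and the helper names try_lock and run_exclusively are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail at once instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusively(path, job):
    """Run job() only if no other instance holds the lock; report whether it ran."""
    lock = try_lock(path)
    if lock is None:
        return False
    try:
        job()
        return True
    finally:
        lock.close()  # closing the file releases the advisory lock


# Hypothetical lock file location for a periodic statistics job.
lock_path = os.path.join(tempfile.gettempdir(), "periodic-job-example.lock")
ran = run_exclusively(lock_path, lambda: print("collecting statistics"))
print("job ran" if ran else "previous run still active, skipping")
```

Skipping the run rather than waiting for the lock is the design choice that
matters here: a waiting process would still occupy memory and a slot in the
run queue, which is the pile-up the comment describes.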

John Bass
Owner/DMS Design
Performance by Design
Comment 8 Rik van Riel 2003-06-01 08:05:21 EDT
The Red Hat Linux 9 kernel should only swap out process pages if the active list
for cache pages is less than 15% of (active cache + active anon).

I would appreciate it if somebody with a misbehaving VM could show me the
contents of /proc/meminfo so I've got a better idea of exactly how things are
going wrong.
Comment 9 John Bass 2003-06-01 19:08:11 EDT
Hi Rik, catching your /proc/meminfo file might be a bit problematic as it can
take 20-30 minutes just to log into a VM thrashing machine, and probably the same
period to capture the file to a disk that already has a queue service time in
the seconds.  I think my last post was pretty clear about the 64MB PII-333MHz
setup and the work load that thrashed it. The assertion that paging occurs only
when active cache + active anon is down to 15% of mem can be verified and explored
other ways .... consider the vmstat data on the same 64MB PII-333MHz machine
under normal use with a "vmstat 30" trace running:

  procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 1  0  0  12720  12224   8636  11984    0    0     0     5  134    11  0  1 99
 0  0  0  12720  12224   8636  11984    0    0     0     0  141    11  0  0 100
 0  0  0  12720  12224   8636  11984    0    0     0     0  173    12  0  1 99
 0  0  0  12720  12224   8636  11984    1    0     1     0  150    15  0  0 99
 0  0  0  12720  12224   8636  11984    1    0     1     0  130    14  0  0 100
 0  0  0  12716  14456   6380  12144    1   83    25   156  158    35 20  3 77
 0  0  0  12716  14456   6380  12144    0    0     0    14  159    12  0  1 99
 0  0  0  12716  14456   6380  12144    3    0     3     0  189    17  0  1 99
 0  0  0  12716  14456   6408  12188    4    0     6     2  262    19  0  2 98
 0  0  0  12716  14456   6408  12188    0    0     0     1  176    13  0  2 98
 2  0  0  12716  14428   6408  12188    1    0     1     0  169    14  0  1 99
 0  0  0  12716  14052   6432  12216    0    0     0     7  185    14  0  2 98
 0  0  0  12716  14052   6432  12216    0    0     0     5  192    12  0  1 98
 0  0  0  12716  14052   6432  12248    3    0     4     0  160    28  0  1 99
 0  0  0  12716  14052   6432  12248    1    0     1     0  180    14  0  1 99
 0  0  0  12752  17400   4412  11004   16   82    32   154  192    47 21  3 77
 0  0  0  12752  17400   4412  11004    1    0     1    14  139    13  0  0 100
 0  0  0  12752  17400   4412  11004    0    0     0     0  159    13  0  1 99
 0  0  0  12752  17400   4420  11004    1    0     1     0  171    12  0  0 100
 0  0  0  12752  17368   4436  11164    3    0     8     1  162    19  0  1 99
 0  0  0  12752  17344   4436  11164    1    0     1     0  140    13  0  0 100
 0  0  0  12752  17344   4436  11164    0    0     0     0  130    12  0  1 99
 0  0  0  12752  17340   4436  11164    0    0     0     0  148    12  0  0 100
 0  0  0  12752  16752   4488  11388    0    0     8     7  129    15  0  1 99
 0  0  0  12752  16752   4504  11388    0    0     0     6  181    12  0  1 99
 0  0  0  12748  17132   4688  10852    1   63     9   125  203    34 21  3 76
 0  0  0  12748  17132   4692  10964    1    0     5    14  152    19  0  1 99
 0  0  0  12748  17132   4692  10964    0    0     0     0  170    12  0  1 99
 0  0  0  12748  17132   4692  10964    0    0     0     0  129    12  0  0 100
 0  0  0  12748  17132   4700  10964    0    0     0     1  132    14  0  0 100
 0  0  0  12748  17132   4700  10964    0    0     0     0  138    13  0  0 100
 0  0  0  12748  17132   4700  10964    0    0     0     0  133    11  0  0 100
 0  0  0  12748  17132   4700  10964    1    0     1     0  141    14  0  1 99
 0  0  0  12748  17132   4700  10964    0    0     0     0  129    11  0  0 100
 0  0  0  12748  17132   4700  10964    0    0     0     0  127    11  0  0 100
 0  0  0  12728  16648   4948  11004    0   22     7    96  155    36 21  3 76
 0  0  0  12728  16648   4948  11004    0    0     0    16  132    12  0  0 100
 0  0  0  12728  16648   4948  11004    0    0     0     0  162    12  0  1 99
 0  0  0  12728  16648   4948  11004    0    0     0     0  136    14  0  0 100
 0  0  0  12728  16648   4948  11004    0    0     0     0  138    14  0  0 99
 0  0  0  12728  16648   4948  11004    0    0     0     0  152    12  0  1 99
 0  0  0  12728  16648   4948  11004    0    0     0     0  140    12  0  1 99
 0  0  0  12728  16648   4948  11004    0    0     0     0  140    12  0  1 99
 0  0  0  12728  16648   4948  11004    0    0     0     0  135    12  0  0 100
 0  0  0  12728  16648   4948  11028    0    0     1     0  167    16  0  1 99
 0  0  0  12760  17416   5076  10324    0  157    15   234  223    36 20  3 76
 0  0  0  12760  17416   5076  10324    1    0     1    14  268    32  0  2 98
 0  0  0  12760  17248   5124  10420    0    0     3     7  328    20  0  4 96
 0  0  0  12760  17248   5124  10420    0    0     0     5  213    15  0  1 98
 0  0  0  12760  17248   5132  10420    0    0     0     1  183    18  0  1 99
 0  0  0  12760  17248   5132  10420    0    0     0     0  167    13  0  1 99
 0  0  0  12760  17248   5132  10420    0    0     0     0  151    12  0  1 99
 0  0  0  12760  17248   5132  10420    0    0     0     0  147    11  0  1 99
 0  0  0  12760  17248   5132  10420    0    0     0     0  139    14  0  0 100
 0  0  0  12760  17248   5132  10420    0    0     0     0  140    13  0  0 99
 0  0  0  12760  16948   5372  10556    0   40     8   108  148    33 21  2 77
 0  0  0  12760  16948   5372  10556    0    0     0    14  134    11  0  0 100


We can clearly see the impact the mrtg has on the system every 5 minutes (every
10 samples) where it nearly always forces page outs ... since the numbers are
normalized to per second figures, a 40 block/sec average with a 30 second quantum
implies 1,200 blocks or 1.2MB was flushed to swap. The 157 block/sec number
implies that 4,710 blocks, or 4.7MB was flushed to swap .... all the while
the filesystem cache is around 10MB and the buffer cache above 5MB which for
this machine is certainly well above the 15% figure.
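
The conversion from a per-second vmstat rate to a per-sample total is simple
multiplication, sketched here for the two rates quoted. Like the comment, the
sketch assumes one block is 1 kB; the function name is illustrative.

```python
def swapped_out_blocks(so_per_second, interval_seconds):
    """Total blocks written to swap during one vmstat sample interval."""
    return so_per_second * interval_seconds


# The two "so" rates quoted above, from a "vmstat 30" trace.
for rate in (40, 157):
    total = swapped_out_blocks(rate, 30)
    print(f"{rate} blocks/s over 30 s = {total} blocks (about {total / 1000:.1f} MB)")
```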

One doesn't need to look very hard to see this, or provoke it. Consider the
normal state for this machine:

# free
             total       used       free     shared    buffers     cached
Mem:         61412      60680        732          0       3172      30404
-/+ buffers/cache:      27104      34308
Swap:       200772      12840     187932
# vmstat 30
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  0  0  12840    804   2888  30496    6   16    13     5   13    21  7 11 13
 0  0  0  12840    804   2748  30500    7    1     8     6  155    16  0  1 99
 0  0  0  12840   1616   2864  29112    4    1    18     9  143    20  0  1 98
 0  0  0  12840   1136   2824  29296   21    0    26     6  193    45  0  1 99
 0  0  0  12840   1128   2736  29384    1    0     1     0  142    13  0  1 99
 0  0  0  12840   1128   2736  29384    1    0     1     0  141    15  0  0 100
 0  0  0  12840   1128   2736  29384    0    0     0     0  131    13  0  0 100
 0  0  0  12840   1092   2736  29384    2    0     2     0  147    18  0  0 99
 0  0  0  12840   6804   2456  17872    6   49   212    85  169    67 19  2 79
 0  0  0  12840  11520   2700  19136    5    1    37    40  154    30  2  1 97
 0  0  0  12840  11520   2704  19136    3    0     3    14  167    23  0  1 99
 0  0  0  12840  11520   2704  19136    0    0     0     0  147    13  0  1 99
 0  0  0  12840  11380   2708  19196    7    0     9     0  154    22  0  1 99
 1  0  0  12648    772   3536  29064    8   38   381   293  232   118  1  3 96
 1  0  0  12648    672   2040  32632   34   55  3883  3745  435   397  2 18 80

Note the transition in free .... which is an indication of the working set
size that caused the impulse.

The typical number I see for filesystem/buffer cache on this machine is
frequently well above 25MB, and depends largely on the amount of downloads
in the recent past. And as you can see, we have already started significant
page in and out traffic with the cache consuming over half of real memory.
This is WRONG.


Bind certainly has the largest VM allocation of all the processes, but remains
trimmed to a fairly small working set:

# ps -laxf
F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
1    25  1318     1  25   0 35836 1896 rt_sig S    ?          1:32 /usr/sbin/named -u named
1     0 12356     1  15   0  1440   36 pipe_w S    ?          0:00 CROND
4   512 12361 12356  15   0  8772 1564 lock_p D    ?          0:03  \_ /usr/bin/perl /usr/bin/mrtg /home/mrtg/cwx.cfg --logging /va
4   512 12629 12356  15   0  5736    4 pipe_w S    ?          0:00  \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg



as compared with the mrtg cron task that triggers the periodic paging.

Now, as I noted in the longer post, the problem isn't fatal until there is
significant disk traffic by other applications, such as active TUX/rsyncd
file serving, which radically impacts the paging rate.

Here for example, is a snapshot of the vmstat during the meltdown the other
day that happens to still be in an active window on my desktop:

# vmstat 60
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 7 30  3 182764    528   1568   6384    7   18    11     3    7    20  7 12 38
 0 30  0 159168    636   1728   5060  385  279   406   293  281   372  0  2 98
 5 26  0 153432    532   1716   5108  368  306   401   312  283   377  0  2 98
 6 24  0 159576    520   1424   5656  389  463   449   470  323   357  0  2 98
 1 30  1 160256    556   1508   6052  425  413   487   425  369   403  1  2 97
 5 25  2 158912    548   1416   5296  458  372   503   379  302   373  0  2 98
 7 33  2 165512    520   1520   6116  400  482   472   495  341   384  1  2 97
 0 30  0 164280    520   1492   6164  437  461   503   485  324   429  3  2 95
 5 24  0 164352    652   1660   6092  343  361   389   381  300   349  3  2 95
26  4  0 165420    768   1504   5560  363  380   402   393  329   351  1  2 96
 0 29  0 160300    876   1508   4600  377  290   396   298  297   366  0  1 98
 8 20  0 166380    640   1544   4676  371  413   390   418  326   326  0  2 98
22  8  0 163624    524   1344   4652  394  324   420   330  304   365  1  1 98
 0 29  0 168948    584   1364   4640  396  415   419   418  309   330  0  2 98
 3 27  0 160072    552   1356   5268  368  317   405   323  292   351  1  2 98
23  6  1 153780    576   1588   6124  357  335   404   353  303   349  1  1 98


The 15% figure for cache here of total real memory ... and after subtracting
the memory consumed by the kernel for other reasons and core processes which
activate frequently ... bind, cron, .... etc, the 15% figure is a much higher
REAL percentage of usable process memory. Here the machine has stacked up
a little over a half dozen mrtg/perl/sendmail tasks, plus has multiple active
tux/rsync clients driving a base I/O load which actively impacts the paging
rate and the filesystem caching is actively contributing to the paging rate.

As said in the first post, I frequently see archival operations and rpm updates
drive the cache percentage high and trigger substantial paging .... critical
meltdowns in the past have been invoked not by mrtg on this machine, but
by network rpm updates, in particular rpm --rebuilddb. As this machine is a
mirror server, it frequently sees large sustained file accesses during mirror
update and RedHat net installs.

The server has a twin, with 512MB of DRAM, which while harder to provoke into
vm thrashing, does do so at times, but manages to typically recover on its own.
That machine, also a mirror server, frequently has over 300MB in filesystem
cache, and starts paging just as easily.
Comment 10 John Bass 2003-06-02 19:02:54 EDT
Ok - staging a work load to demonstrate active paging to disk with high cache
values .... I simply tar'ed /var/ftp/pub/mirrors to /dev/null to create filesystem
I/O:



Comment 11 John Bass 2003-06-02 19:10:03 EDT
# tar cf /dev/null /var/ftp/pub/mirrors
# vmstat 30&
# while cat /proc/meminfo
> do
> sleep 10
> done

wait a few minutes for cron to start mrtg and we get

       total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 62119936   765952        0 12951552 24379392
Swap: 205590528 13516800 192073728
MemTotal:        61412 kB
MemFree:           748 kB
MemShared:           0 kB
Buffers:         12648 kB
Cached:          16592 kB
SwapCached:       7216 kB
Active:          30248 kB
ActiveAnon:       9712 kB
ActiveCache:     20536 kB
Inact_dirty:      2740 kB
Inact_laundry:    3348 kB
Inact_clean:       692 kB
Inact_target:     7404 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:           748 kB
SwapTotal:      200772 kB
SwapFree:       187572 kB
 1  0  0  13200    764  12920  16276    2    7   384   141  526   356  7 15 77
        total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 62095360   790528        0 13824000 22847488
Swap: 205590528 13516800 192073728
MemTotal:        61412 kB
MemFree:           772 kB
MemShared:           0 kB
Buffers:         13500 kB
Cached:          15124 kB
SwapCached:       7188 kB
Active:          30488 kB
ActiveAnon:      10036 kB
ActiveCache:     20452 kB
Inact_dirty:      2836 kB
Inact_laundry:    2352 kB
Inact_clean:      1052 kB
Inact_target:     7344 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:           772 kB
SwapTotal:      200772 kB
SwapFree:       187572 kB
        total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 56987648  5898240        0  8478720 14479360
Swap: 205590528 13975552 191614976
MemTotal:        61412 kB
MemFree:          5760 kB
MemShared:           0 kB
Buffers:          8280 kB
Cached:           7340 kB
SwapCached:       6800 kB
Active:          26084 kB
ActiveAnon:      16936 kB
ActiveCache:      9148 kB
Inact_dirty:      3560 kB
Inact_laundry:    3356 kB
Inact_clean:       620 kB
Inact_target:     6724 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:          5760 kB
SwapTotal:      200772 kB
SwapFree:       187124 kB
 0  4  2  13452   4608   9080  11516   14   97   388   191  477   191 24 12 64
        total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 52834304 10051584        0  9420800 19005440
Swap: 205590528 13639680 191950848
MemTotal:        61412 kB
MemFree:          9816 kB
MemShared:           0 kB
Buffers:          9200 kB
Cached:          11644 kB
SwapCached:       6916 kB
Active:          22704 kB
ActiveAnon:       8976 kB
ActiveCache:     13728 kB
Inact_dirty:      3688 kB
Inact_laundry:    2788 kB
Inact_clean:       496 kB
Inact_target:     5932 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:          9816 kB
SwapTotal:      200772 kB
SwapFree:       187452 kB
        total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 56217600  6668288        0 10932224 20115456
Swap: 205590528 13512704 192077824
MemTotal:        61412 kB
MemFree:          6512 kB
MemShared:           0 kB
Buffers:         10676 kB
Cached:          12520 kB
SwapCached:       7124 kB
Active:          24840 kB
ActiveAnon:       8756 kB
ActiveCache:     16084 kB
Inact_dirty:      2920 kB
Inact_laundry:    3300 kB
Inact_clean:       492 kB
Inact_target:     6308 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:          6512 kB
SwapTotal:      200772 kB
SwapFree:       187576 kB
        total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 61509632  1376256        0 13250560 20168704
Swap: 205590528 13496320 192094208
MemTotal:        61412 kB
MemFree:          1344 kB
MemShared:           0 kB
Buffers:         12940 kB
Cached:          12584 kB
SwapCached:       7112 kB
Active:          26312 kB
ActiveAnon:       8532 kB
ActiveCache:     17780 kB
Inact_dirty:      2164 kB
Inact_laundry:    4260 kB
Inact_clean:       492 kB
Inact_target:     6644 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:          1344 kB
SwapTotal:      200772 kB
SwapFree:       187592 kB
 0  0  0  13180    848  13932  13024   16   26   279   106  500   349  7 15 78
        total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 61857792  1028096        0 14266368 20537344
Swap: 205590528 13496320 192094208
MemTotal:        61412 kB
MemFree:          1004 kB
MemShared:           0 kB
Buffers:         13932 kB
Cached:          13024 kB
SwapCached:       7032 kB
Active:          28016 kB
ActiveAnon:       8836 kB
ActiveCache:     19180 kB
Inact_dirty:      2192 kB
Inact_laundry:    3536 kB
Inact_clean:       836 kB
Inact_target:     6916 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:          1004 kB
SwapTotal:      200772 kB
SwapFree:       187592 kB


hope that helps
Comment 12 Erik Reuter 2003-06-05 13:03:01 EDT
Here's output from my /proc/meminfo.

There's some free memory in the dump (180MB) because I had just closed an application,
but this is after working for a couple of hours during which the cache never dropped
below approx. 300MB, and there has been plenty of swapping activity.


        total:    used:    free:  shared: buffers:  cached:
Mem:  1320443904 1136025600 184418304        0 34238464 498208768
Swap: 2089177088 220901376 1868275712
MemTotal:      1289496 kB
MemFree:        180096 kB
MemShared:           0 kB
Buffers:         33436 kB
Cached:         317408 kB
SwapCached:     169124 kB
Active:         941812 kB
ActiveAnon:     655812 kB
ActiveCache:    286000 kB
Inact_dirty:      2316 kB
Inact_laundry:   78156 kB
Inact_clean:     13832 kB
Inact_target:   207220 kB
HighTotal:      393200 kB
HighFree:        44488 kB
LowTotal:       896296 kB
LowFree:        135608 kB
SwapTotal:     2040212 kB
SwapFree:      1824488 kB
Comment 13 John Bass 2003-06-05 14:51:52 EDT
Here is the 64MB router/mirror server doing an rsync mirror update
with 36MB tied up in buffers and cache and the system is paging heavily
with 10-20 second command response times.

# vmstat 30
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 1 19  3  99428    644   5136  10856    5   14    12     9   12    22  6  9  3
 3 22  0 100320    644   5096   9492  229  102   302  1020 1396  1093  4 15 81
 5 20  2 101736    532   4272  11048  240  238   354   917 1066   829  4 11 85

]# cat /proc/mem*
        total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 62193664   692224        0  4874240 29519872
Swap: 205590528 102973440 102617088
MemTotal:        61412 kB
MemFree:           676 kB
MemShared:           0 kB
Buffers:          4760 kB
Cached:          11208 kB
SwapCached:      17620 kB
Active:          31456 kB
ActiveAnon:      23832 kB
ActiveCache:      7624 kB
Inact_dirty:      3632 kB
Inact_laundry:    3008 kB
Inact_clean:       724 kB
Inact_target:     7764 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:           676 kB
SwapTotal:      200772 kB
SwapFree:       100212 kB
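
To put a number on how much of this machine's RAM sits in buffers and cache, the dump above can be parsed mechanically. The following is a minimal Python sketch, not part of the original report. `parse_meminfo` and `cache_share` are hypothetical helper names, and counting SwapCached together with Buffers and Cached is an assumption (it matches the byte-valued "cached:" column of the first line, where 29519872 bytes = 28828 kB = 11208 kB + 17620 kB).

```python
import re

# Excerpt copied from the /proc/meminfo dump above (2.4-kernel field names).
MEMINFO_SAMPLE = """\
MemTotal:        61412 kB
MemFree:           676 kB
Buffers:          4760 kB
Cached:          11208 kB
SwapCached:      17620 kB
SwapTotal:      200772 kB
SwapFree:       100212 kB
"""


def parse_meminfo(text):
    """Return {field: kB} for every 'Name:  value kB' line in text."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+) kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def cache_share(fields):
    """Fraction of MemTotal held by Buffers + Cached + SwapCached (assumption)."""
    cached_kb = fields["Buffers"] + fields["Cached"] + fields["SwapCached"]
    return cached_kb / fields["MemTotal"]


info = parse_meminfo(MEMINFO_SAMPLE)
swap_used_kb = info["SwapTotal"] - info["SwapFree"]
print(f"buffers+cache share of RAM: {cache_share(info):.1%}")
print(f"swap in use: {swap_used_kb} kB")
```

On this sample the share comes out a little above half of RAM while roughly 100 MB of swap is in use, which is the imbalance the comment describes.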
Comment 14 John Bass 2003-06-05 15:08:00 EDT
A couple more notes on the previous post: the rsync mirror update triggered
another mrtg stackup from aggressive paging due to excessive cache/buffer use.

It will be interesting to see if this one recovers, or dies from congestive
paging failure. In any case, having the vast majority of memory tied up in
buffers and cache while paging to death is just plain WRONG.

 vmstat 30
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0 12  0  89712    668   3204  10332    5   14    12     9   13    22  6  9  3
 4 10  0  90120    644   3032  10708  332  124   394   426  810   648  1  6 92
 2 13  1  89992    692   3000  11140  298  107   418   539  863   777  6  7 87
 4  9  0  84552    664   3056  10636  349  109   391   493  818   704  3  7 90

[root@cwx mirrors]# cat /proc/mem*
        total:    used:    free:  shared: buffers:  cached:
Mem:  62885888 62238720   647168        0  3371008 32702464
Swap: 205590528 80367616 125222912
MemTotal:        61412 kB
MemFree:           632 kB
MemShared:           0 kB
Buffers:          3292 kB
Cached:           9188 kB
SwapCached:      22748 kB
Active:          33456 kB
ActiveAnon:      25788 kB
ActiveCache:      7668 kB
Inact_dirty:      3000 kB
Inact_laundry:    2264 kB
Inact_clean:       564 kB
Inact_target:     7856 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        61412 kB
LowFree:           632 kB
SwapTotal:      200772 kB
SwapFree:       122288 kB
[root@cwx mirrors]# ps -laxf
F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  15   0  1372  452 schedu S    ?          0:08 init
1     0     2     1  15   0     0    0 contex SW   ?          6:09 [keventd]
1     0     3     1  15   0     0    0 schedu SW   ?          0:00 [kapmd]
1     0     4     1  34  19     0    0 ksofti SWN  ?          0:02 [ksoftirqd_CPU0]
1     0     9     1  15   0     0    0 bdflus SW   ?          0:09 [bdflush]
1     0     5     1  15   0     0    0 schedu SW   ?          3:10 [kswapd]
1     0     6     1  15   0     0    0 schedu SW   ?          0:00 [kscand/DMA]
1     0     7     1  15   0     0    0 schedu SW   ?          0:00 [kscand/Normal]
1     0     8     1  15   0     0    0 schedu SW   ?          0:00 [kscand/HighMem]
1     0    10     1  15   0     0    0 schedu SW   ?          0:00 [kupdated]
1     0    11     1  25   0     0    0 md_thr SW   ?          0:00 [mdrecoveryd]
1     0    15     1  15   0     0    0 end    SW   ?          1:55 [kjournald]
1     0    73     1  25   0     0    0 end    SW   ?          0:00 [khubd]
1     0   647     1  15   0     0    0 end    SW   ?          0:00 [kjournald]
1     0   648     1  15   0     0    0 end    SW   ?          0:01 [kjournald]
1     0  2361     1  15   0  1452  284 schedu S    ?          1:30 syslogd -m 0
5     0  2365     1  15   0  1380  140 do_sys S    ?          0:20 klogd -x
5    32  2383     1  17   0  1644  232 schedu S    ?          0:00 portmap
5    29  2402     1  25   0  1616  320 schedu S    ?          0:00 rpc.statd
5     0  2439     1  24   0  1368  176 schedu S    ?          0:00
/usr/sbin/apmd -p 10 -w 5 -W -P /etc/sysconfig/apm-scripts/apmsc
5     0  2525     1  16   0  3516  168 schedu S    ?          0:13 /usr/sbin/sshd
5     0 13317  2525  16   0  6760    0 schedu SW   ?          0:00  \_
/usr/sbin/sshd
5   510 13319 13317  15   0  6800    0 schedu SW   ?          0:00  |   \_
/usr/sbin/sshd
0   510 13320 13319  18   0  4316    0 schedu SW   pts/0      0:00  |       \_ -bash
1     0 30392  2525  15   0  6896    4 schedu S    ?          0:10  \_
/usr/sbin/sshd
4     0 30394 30392  15   0  4400    0 wait4  SW   pts/1      0:02  |   \_ -bash
0     0 19323 30394  21   0  4100    0 wait4  SW   pts/1      0:00  |       \_ su -
4     0 19324 19323  15   0  4360    0 wait4  SW   pts/1      0:00  |          
\_ -bash
0     0 27601 19324  15   0  4124    0 wait4  SW   pts/1      0:00  |          
    \_ sh fast
0     0 27602 27601  23   0  4172    0 wait4  SW   pts/1      0:00  |          
        \_ sh xx g
4     0 27606 27602  15   0  4572    0 schedu SW   pts/1      0:05  |          
            \_ rsync -v -a --delete --stats --bwlim
5     0 27689 27606  15   0  4572  864 lock_p D    pts/1      0:52  |          
                \_ rsync -v -a --delete --stats --b
1     0 17887  2525  15   0  6900  296 schedu S    ?          1:08  \_
/usr/sbin/sshd
4     0 17906 17887  15   0  4364  604 wait4  S    pts/2      0:04  |   \_ -bash
4     0 28108 17906  15   0  3224 1276 -      R    pts/2      0:00  |       \_
ps -laxf
5     0 20129  2525  20   0  6764    0 schedu SW   ?          0:00  \_
/usr/sbin/sshd
5   513 20131 20129  15   0  6840    0 schedu SW   ?          0:00      \_
/usr/sbin/sshd
0   513 20132 20131  16   0  4332    0 schedu SW   pts/3      0:00          \_ -bash
5     0  2539     1  15   0  2064  228 schedu S    ?          0:00 xinetd
-stayalive -pidfile /var/run/xinetd.pid
5    38  2555     1  15   0  2400 2396 schedu SL   ?          0:22 ntpd -U ntp
5     0  2578     1  15   0  5956  516 schedu S    ?          0:34 sendmail:
rejecting connections on daemon MTA: load average: 17
1    51  2587     1  15   0  5744  108 pause  S    ?          0:01 sendmail:
Queue runner@01:00:00 for /var/spool/clientmqueue
1     0  2597     1  15   0  1420   12 schedu S    ?          0:00 gpm -t ps/2
-m /dev/mouse
5     0  2928     1  15   0  1640  240 pause  S    ?          0:00 [TUX date]
1     0  2929     1  15   0     0    0 schedu SW   ?          0:04 [TUX logger]
1     0  2930     1  25   0  1648  104 wait4  S    ?          0:00 [TUX manager]
5    99  2931  2930  15   0  1648  160 end    S    ?          4:24  \_ [TUX
worker 0]
1    99  2932  2931  15   0  1648  164 end    S    ?          1:05      \_
[async IO 0/1]
1    99  2933  2931  15   0  1648  164 end    S    ?          0:20      \_
[async IO 0/2]
1    99  2934  2931  15   0  1648  164 end    S    ?          0:07      \_
[async IO 0/3]
1    99  2935  2931  15   0  1648  164 end    S    ?          0:01      \_
[async IO 0/4]
1    99  2936  2931  15   0  1648  164 end    S    ?          0:00      \_
[async IO 0/5]
1    99  2937  2931  15   0  1648  164 end    S    ?          0:00      \_
[async IO 0/6]
1    99  2938  2931  15   0  1648  164 end    S    ?          0:00      \_
[async IO 0/7]
1    99  2939  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/8]
1    99  2940  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/9]
1    99  2941  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/10]
1    99  2942  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/11]
1    99  2943  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/12]
1    99  2944  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/13]
1    99  2945  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/14]
1    99  2946  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/15]
1    99  2947  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/16]
1    99  2948  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/17]
1    99  2949  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/18]
1    99  2950  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/19]
1    99  2951  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/20]
1    99  2952  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/21]
1    99  2953  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/22]
1    99  2954  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/23]
1    99  2955  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/24]
1    99  2956  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/25]
1    99  2957  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/26]
1    99  2958  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/27]
1    99  2959  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/28]
1    99  2960  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/29]
1    99  2961  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/30]
1    99  2962  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/31]
1    99  2963  2931  25   0  1648  164 end    S    ?          0:00      \_
[async IO 0/32]
4     0  2968     1  21   0  1360  124 schedu S    tty3       0:00
/sbin/mingetty tty3
4     0  2969     1  21   0  1360    0 schedu SW   tty4       0:00
/sbin/mingetty tty4
4     0  2970     1  21   0  1360    0 schedu SW   tty5       0:00
/sbin/mingetty tty5
4     0  2971     1  21   0  1360    0 schedu SW   tty6       0:00
/sbin/mingetty tty6
1     0  2973     1  15   0     0    0 end    SW   ?          0:11 [kjournald]
1     0 27219     1  15   0     0    0 end    SW   ?          0:00 [kjournald]
1     0 27262     1  25   0  3592    0 schedu SW   ?          0:00 rpc.rquotad
5     0 27266     1  15   0     0    0 schedu SW   ?          0:00 [nfsd]
1     0 27267     1  15   0     0    0 schedu SW   ?          0:00 [nfsd]
1     0 27268     1  15   0     0    0 schedu SW   ?          0:00 [nfsd]
1     0 27274     1  25   0     0    0 schedu SW   ?          0:00 [lockd]
1     0 27275     1  25   0     0    0 end    SW   ?          0:00 [rpciod]
1     0 27269     1  15   0     0    0 schedu SW   ?          0:00 [nfsd]
1     0 27270     1  15   0     0    0 schedu SW   ?          0:00 [nfsd]
1     0 27271     1  15   0     0    0 schedu SW   ?          0:00 [nfsd]
1     0 27272     1  15   0     0    0 schedu SW   ?          0:00 [nfsd]
1     0 27273     1  15   0     0    0 schedu SW   ?          0:00 [nfsd]
1     0 27281     1  25   0  1644    0 schedu SW   ?          0:00 rpc.mountd
1    25  1318     1  25   0 37400 1332 rt_sig S    ?         13:54
/usr/sbin/named -u named
1     0 12674     1  15   0  1428   40 schedu S    ?          0:01 crond
1     0 27886 12674  15   0  1440    0 pipe_w SW   ?          0:00  \_ CROND
4   512 27891 27886  15   0  8792 1088 -      R    ?          0:02  |   \_
/usr/bin/perl /usr/bin/mrtg /home/mrtg/cwx.cfg --logging
4   512 27966 27886  15   0  5732    0 pipe_w SW   ?          0:00  |   \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
1     0 27914 12674  15   0  1440    0 pipe_w SW   ?          0:00  \_ CROND
4   512 27917 27914  15   0  8792 1052 lock_p D    ?          0:02  |   \_
/usr/bin/perl /usr/bin/mrtg /home/mrtg/cwx.cfg --logging
4   512 27987 27914  15   0  5736    0 pipe_w SW   ?          0:00  |   \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
1     0 27942 12674  15   0  1440    0 pipe_w SW   ?          0:00  \_ CROND
4   512 27946 27942  15   0  8792 1100 lock_p D    ?          0:02  |   \_
/usr/bin/perl /usr/bin/mrtg /home/mrtg/cwx.cfg --logging
4   512 28010 27942  16   0  5728    0 pipe_w SW   ?          0:00  |   \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
1     0 28018 12674  19   0  1436    0 pipe_w SW   ?          0:00  \_ CROND
4     0 28022 28018  15   0  8524  920 lock_p D    ?          0:02  |   \_
/usr/bin/perl /usr/bin/mrtg /etc/mrtg/mrtg.cfg
1     0 28020 12674  15   0  1440    0 pipe_w SW   ?          0:00  \_ CROND
4   512 28024 28020  15   0  8576  916 lock_p D    ?          0:02  |   \_
/usr/bin/perl /usr/bin/mrtg /home/mrtg/users.cfg --loggi
4   512 28056 28020  15   0  5732    0 pipe_w SW   ?          0:00  |   \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
1     0 28021 12674  15   0  1440    0 pipe_w SW   ?          0:00  \_ CROND
4   512 28025 28021  15   0  8796 2100 pipe_w S    ?          0:02  |   \_
/usr/bin/perl /usr/bin/mrtg /home/mrtg/cwx.cfg --logging
4   512 28054 28021  15   0  5732    0 pipe_w SW   ?          0:00  |   \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
1     0 28048 12674  18   0  1436    0 pipe_w SW   ?          0:00  \_ CROND
4     0 28051 28048  15   0  8532 1100 lock_p D    ?          0:01  |   \_
/usr/bin/perl /usr/bin/mrtg /etc/mrtg/mrtg.cfg
1     0 28049 12674  15   0  1440    0 pipe_w SW   ?          0:00  \_ CROND
4   512 28052 28049  15   0  8580 1136 lock_p D    ?          0:02  |   \_
/usr/bin/perl /usr/bin/mrtg /home/mrtg/users.cfg --loggi
4   512 28061 28049  15   0  5736    0 pipe_w SW   ?          0:00  |   \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
1     0 28050 12674  15   0  1440    0 pipe_w SW   ?          0:00  \_ CROND
4   512 28053 28050  15   0  8748 2044 lock_p D    ?          0:02  |   \_
/usr/bin/perl /usr/bin/mrtg /home/mrtg/cwx.cfg --logging
4   512 28068 28050  19   0  5736    0 pipe_w SW   ?          0:00  |   \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
1     0 28080 12674  15   0  1436  360 wait4  S    ?          0:00  \_ CROND
4   512 28106 28080  15   0  5740 2364 end    D    ?          0:00  |   \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
1     0 28081 12674  15   0  1440  336 pipe_w S    ?          0:00  \_ CROND
4   512 28086 28081  15   0  8772 6880 schedu S    ?          0:01      \_
/usr/bin/perl /usr/bin/mrtg /home/mrtg/cwx.cfg --logging
4   512 28107 28081  15   0  5732 2320 pipe_w S    ?          0:00      \_
/usr/sbin/sendmail -FCronDaemon -i -odi -oem mrtg
4     0 18814     1  20   0  1356    0 schedu SW   tty1       0:00
/sbin/mingetty tty1
4     0 18816     1  21   0  1356    0 schedu SW   tty2       0:00
/sbin/mingetty tty2
Comment 15 Christopher McCrory 2003-06-11 17:32:15 EDT
>> The problem when all memory is used, is that Linux seems to prefer swapping
>> over releasing memory from the cache. The system (X especially) appears
>> sluggish when this happens.


I started seeing this also after kernel-smp-2.4.18-27.7.x ->
kernel-smp-2.4.20-18.7 update  (RH73)

After some probing I found this difference:

chrismcc]$  uname -a ; cat /proc/sys/vm/bdflush
Linux eeyore 2.4.18-27.7.xsmp #1 SMP Fri Mar 14 05:52:30 EST 2003 i686 unknown
30      500     0       0       2560    15360   60      20      0

chrismcc]$  uname -a ; cat /proc/sys/vm/bdflush
Linux piglet 2.4.20-18.7smp #1 SMP Thu May 29 07:49:23 EDT 2003 i686 unknown
30      500     0       0       500     3000    60      20      0


As a test I did:
/sbin/sysctl -w vm.bdflush="30 500 0 0 2560 15360 60 20 0"

[chrismcc@kanga chrismcc]$  uname -a ; cat /proc/sys/vm/bdflush
Linux kanga 2.4.20-18.7smp #1 SMP Thu May 29 07:49:23 EDT 2003 i686 unknown
30      500     0       0       2560    15360   60      20      0


And... tada, all was well again

Could this be the cause of the above problems?  Or different?
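
The difference between the two bdflush lines above is easy to miss by eye. Here is a small Python sketch (an illustration, not something from the report) that compares the two settings position by position. The helper names `parse_bdflush` and `diff_positions` are made up for this example, and fields are referred to by position only because the thread does not name them.

```python
def parse_bdflush(line):
    """Split a /proc/sys/vm/bdflush line into a list of integers."""
    return [int(token) for token in line.split()]


def diff_positions(old, new):
    """Return (1-based position, old value, new value) for each differing field."""
    return [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(old, new))
        if a != b
    ]


# Values copied from the two `cat /proc/sys/vm/bdflush` outputs above.
old = parse_bdflush("30      500     0       0       2560    15360   60      20      0")
new = parse_bdflush("30      500     0       0       500     3000    60      20      0")

changes = diff_positions(old, new)
for position, old_value, new_value in changes:
    ratio = old_value / new_value
    print(f"field {position}: {old_value} -> {new_value} (old/new = {ratio:.2f})")
```

Only the fifth and sixth fields differ, and both shrink by the same factor. Whether that reflects a deliberate retuning or a change of unit between the two kernels is not something the thread establishes.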






Comment 16 Bruce A. Locke 2003-07-11 13:25:35 EDT
This behaviour is nearly making Red Hat 9 unusable as a platform for a squid proxy.

I am seeing Squid's constant disk reads and writes cause buffer and cache sizes
to grow slowly over time, which in turn causes large parts of squid to be swapped out.

Squid is installed on a server with 1GB of RAM.  The maximum VSZ I have seen so
far is around 550MB, roughly half of the system's RAM.  With half the
RAM of the system supposedly "available", I don't consider it unreasonable to
expect that the active application not have half its pages swapped out and not
have to access swap constantly to get those pages back.

The following is output from "vmstat 1 2000":

   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 1  0  0  79768   9348 341788 476640    0    1     6    23    7    13  0  1 24
 0  0  0  79776   9320 341560 476880    8    8     8     8  628   244  0  3 97
 0  0  0  79776   9208 341560 477008    0    0    16     0  575   245  2  3 95
 0  0  0  79784   9072 341352 477344    8    8     8     8  575   170  1  2 97
 0  0  0  79784   8988 341352 477416   12    0    12     0  438   158  1  1 98
 0  0  0  79904   8988 341344 477432    0  132     0   380  410   169  0  2 98
 0  0  0  79904   8972 341344 477512    8    0    80     0  405   189  0  1 99
 0  0  0  79904   8884 341360 477588    8    0     8     0  850   299  0  6 94
 0  0  0  79904   8776 341488 477664   40    0    52     0  972   542  2 10 88
 0  0  0  79936   8636 341600 477736   56   72    56    80 1297   574  3  9 88
 0  0  1  79936   8672 341820 477564   24    0    24   912  949   480  3  6 91
 0  0  0  79944   8668 341836 477580    0    8    12   944 1034   469  1  3 96
 0  0  0  79944   8672 341836 477580    0    0     0     0  316    94  1  3 96
 0  0  0  79960   8676 341672 477748    0  128     0   204  343   109  0  1 99
 0  0  0  79960   8692 341680 477736    4    0    28     0  741   333  1  3 95
 0  0  0  79960   8708 341532 477880    4    0    12    28  630   278  1  4 95
 0  0  0  79960   8708 341624 477784    4    0     4   444  291   152  0  1 99
 0  0  0  79960   8708 341624 477784    0    0     0     0  357   155  0  2 98
 0  0  0  79960   8708 341628 477780    0    0     8     0  542   242  1  1 98
 0  0  0  79960   8708 341628 477768   12    0    16    16 1254   438  2  8 89
 
The following is /proc/meminfo:

        total:    used:    free:  shared: buffers:  cached:
Mem:  1054982144 1045995520  8986624        0 349908992 562728960
Swap: 534601728 82034688 452567040
MemTotal:      1030256 kB
MemFree:          8776 kB
MemShared:           0 kB
Buffers:        341708 kB
Cached:         477592 kB
SwapCached:      71948 kB
Active:         757492 kB
ActiveAnon:     108844 kB
ActiveCache:    648648 kB
Inact_dirty:        84 kB
Inact_laundry:  151652 kB
Inact_clean:     21916 kB
Inact_target:   186228 kB
HighTotal:      131008 kB
HighFree:         1076 kB
LowTotal:       899248 kB
LowFree:          7700 kB
SwapTotal:      522072 kB
SwapFree:       441960 kB

The output of "ps -eo pid,user,args,vsz,rss | grep squid" is:
31695 root     /usr/local/squid  5588  552
31697 squid    (squid)          522940 108052
31699 squid    (unlinkd)         1344    8

I have tried many different values for /proc/sys/vm/bdflush and
/proc/sys/vm/kswapd.  I can only seem to slow it down by making bdflush run much
more often but the "leak" is still there.
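
One rough way to see how much of squid is actually resident is to compare RSS against VSZ in the ps output above. The sketch below is only an illustration in Python. `resident_fractions` is a hypothetical helper, and treating RSS/VSZ as a proxy for "not swapped out" is an assumption, since VSZ also counts address space that was never touched.

```python
# Rows copied from the `ps -eo pid,user,args,vsz,rss | grep squid` output above.
PS_SAMPLE = """\
31695 root     /usr/local/squid  5588  552
31697 squid    (squid)          522940 108052
31699 squid    (unlinkd)         1344    8
"""


def resident_fractions(ps_text):
    """Return (pid, args, vsz_kb, rss_kb, rss/vsz) per row; the last two columns are VSZ and RSS."""
    rows = []
    for line in ps_text.splitlines():
        tokens = line.split()
        if len(tokens) < 5:
            continue  # not a full pid/user/args/vsz/rss row
        vsz_kb, rss_kb = int(tokens[-2]), int(tokens[-1])
        args = " ".join(tokens[2:-2])
        fraction = rss_kb / vsz_kb if vsz_kb else 0.0
        rows.append((int(tokens[0]), args, vsz_kb, rss_kb, fraction))
    return rows


rows = resident_fractions(PS_SAMPLE)
for pid, args, vsz_kb, rss_kb, fraction in rows:
    print(f"{pid:>6} {args:<18} vsz={vsz_kb:>7} kB rss={rss_kb:>7} kB resident={fraction:.1%}")
```

For the main squid process this gives roughly a fifth of its virtual size resident, on a machine where Buffers and Cached together hold about 800 MB.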

Comment 17 Bruce A. Locke 2003-07-11 13:50:35 EDT
I forgot to mention the kernel version in use:
2.4.20-18.9smp
Comment 18 Rik van Riel 2003-07-11 14:02:47 EDT
I will be backporting the latest -rmap updates to this kernel
Comment 19 Erik Reuter 2003-07-12 17:20:23 EDT
I have performed some tests with :

vm.bdflush = 30 500 0 0 2560 15360 60 20 0

as proposed. The system appears to swap less aggressively than with the default
settings of:

vm.bdflush = 30 500 0 0 500 3000 60

But, as the tests below show (run with the proposed bdflush setting), when I
launch a bunch of applications to deplete my 1.2GB of memory, a
situation occurs where both swap and cache are rising and the system becomes
non-responsive / sluggish.

The test snapshots were taken using "free -s1"; I've cut and pasted to make
things easier to read:

test mem snapshot 1:

Free	Cached	swap used
16968	539068	58008
16064	539444	58052
16064	539632	58088
16064	539712	58088
16296	540084	58132

Test mem snapshot 2 (a couple of minutes after snapshot 1):

Free	Cached	swap used
17712	547380	58544
11548	549976	58588
12916	551820	58632
12448	552660	58668
10848	554228	58712
10096	554516	58772
11352	549580	58832
11352	549848	58892

meminfo (a little after snapshot 2):

        total:    used:    free:  shared: buffers:  cached:
Mem:  1320443904 1303855104 16588800        0 104771584 628047872
Swap: 2089177088 61177856 2027999232
MemTotal:      1289496 kB
MemFree:         16200 kB
MemShared:           0 kB
Buffers:        102316 kB
Cached:         553584 kB
SwapCached:      59744 kB
Active:         960188 kB
ActiveAnon:     479156 kB
ActiveCache:    481032 kB
Inact_dirty:        48 kB
Inact_laundry:  184728 kB
Inact_clean:     25988 kB
Inact_target:   234188 kB
HighTotal:      393200 kB
HighFree:         1024 kB
LowTotal:       896296 kB
LowFree:         15176 kB
SwapTotal:     2040212 kB
SwapFree:      1980468 kB

So, a little better with the proposed settings, but the race condition is still
evident (and noteworthy). Btw, I'm using RH9 kernel 2.4.20-18.9
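
The trend in snapshot 2 can be summarised with a few lines of Python. This is a sketch added for illustration. `parse_samples` is a hypothetical helper, and the three columns are assumed to be kB values for Free, Cached and swap used, as labelled in the comment.

```python
# Rows copied from "Test mem snapshot 2" above: Free, Cached, swap used.
SNAPSHOT_2 = """\
17712 547380 58544
11548 549976 58588
12916 551820 58632
12448 552660 58668
10848 554228 58712
10096 554516 58772
11352 549580 58832
11352 549848 58892
"""


def parse_samples(text):
    """Return a list of (free, cached, swap_used) integer tuples."""
    return [
        tuple(int(token) for token in line.split())
        for line in text.splitlines()
        if line.strip()
    ]


samples = parse_samples(SNAPSHOT_2)
first, last = samples[0], samples[-1]
delta_free, delta_cached, delta_swap = (b - a for a, b in zip(first, last))

# True when swap usage never drops from one sample to the next.
swap_never_shrinks = all(b[2] >= a[2] for a, b in zip(samples, samples[1:]))

print(f"free:   {delta_free:+d} kB")
print(f"cached: {delta_cached:+d} kB")
print(f"swap:   {delta_swap:+d} kB (never shrinks: {swap_never_shrinks})")
```

Over these eight samples, cache and swap usage both end higher than they started while free memory ends lower, which is the pattern the comment points out.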
Comment 20 Christopher McCrory 2003-07-21 13:43:27 EDT
Did the latest errata kernel (kernel-2.4.20-19.7) address any issues from this bug?

Comment 21 acount closed by user 2003-07-21 22:04:54 EDT
In theory, yes.

--cut--
* Sat Jul 12 2003 Rik van Riel <riel@redhat.com>

- upgrade to latest -rmap to fix #89226, #90668, etc.
--end--

We will see.
Comment 22 Bruce A. Locke 2003-07-22 21:23:56 EDT
2.4.20-19 does appear to have some improvement.  I've noticed on my workstation
machines that less swap is now in use.

For my squid server it initially looked OK.  As cache and buffer were rising and
squid grew, free would dip down to around 8000K, then some cache would be freed
and free would be back up to 12000K.  This lasted for about 20 minutes and then
it started eating into swap again.

While the rate of increasing swap usage seems noticeably slower, the kernel still
seems very eager to swap.  And now it appears to be a little more "bursty" about
it: it tends to swap out in 1MB chunks early on and then do continuous
swap-ins until the next chunk it writes out (sorry, I forgot to grab a vmstat
capture of this happening but will grab one if you think it's needed).


[blocke@komodo blocke]$ uname -a
Linux komodo.newpaltz.edu 2.4.20-19.9smp #1 SMP Tue Jul 15 17:04:18 EDT 2003
i686 i686 i386 GNU/Linux

[blocke@komodo blocke]$ free
             total       used       free     shared    buffers     cached
Mem:       1030248    1021516       8732          0     274520     472864
-/+ buffers/cache:     274132     756116
Swap:       522072      90516     431556

[blocke@komodo blocke]$ vmstat
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 1  0  0  90516   8600 274544 472972    1    4    16   108  213   152  0  1 98

[blocke@komodo blocke]$ ps -eo pid,user,args,vsz | grep squid
 1700 root     /usr/local/squid  5592
 1702 squid    (squid)          614780
 1704 squid    (unlinkd)         1344
 2818 blocke   grep squid        3576

This may very well be me misunderstanding the amount of memory that squid needs,
but I still believe the problem is that the kernel doesn't like giving up cache
and buffer when it probably should.  (User error or kernel problem?)

So in summary: I've seen some improvement on desktop/workstation workloads and a
very minor improvement on the squid case.


Comment 23 Gary Windham 2003-07-29 13:51:08 EDT
I am running a production Cyrus IMAP server on an SMP Redhat 2.4.20-19.7.x
kernel, and have been experiencing these same problems.  The system has 6GB RAM
and everything is fine until all free memory becomes used for filesystem cache.
Once this happens, the VM appears to prefer swapping to reclaiming filesystem
cache pages.  Kswapd starts consuming a large amount of CPU time, and the load
average jumps dramatically (1min. loadavg of 30-40 on a system that usually has
a loadavg of 3-4 during its busiest times).  Naturally, everything becomes
extremely sluggish.  Here is a sample output from "vmstat 5" when this situation
is occurring:

 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  3  2  73012  11712 222168 4948160  0   1    81    10   53    96   7  27  66
 1  2  0  73028  10508 223760 4948220  0   3   897  1880 1963  2471  19  63  18
 3 15  5  73072  10552 225284 4946696  0  10   362  2612 2211  2243  22  65  13
17  0  1  73160  12672 225900 4943812  1  22   590  2908 2822  3013  23  49  28
26  0  3  73912  13644 222472 4946896  0 179   626  9333 7817  3856  33  65   2
10  3  2  74008  12036 223616 4947184  6  22   735  1862 3113  3403  36  64   0
18  0  3  74272  11024 215976 4955500  3  82   790  2757 2180  2591  41  58   0
22  8  3  74240  11168 216184 4955956  0   8   473  1843 2300  2576  26  73   1 
 6  4  2  74252  10808 214828 4955836  5  24   808  2976 2875  3261  41  59   0
12  3  2  74324  12608 207488 4961412  1  55   366  2170 2095  2488  22  77   1
22  8  6  74324  10832 208636 4961360  1   2   422  1191 2001  2169  27  72   0
11 12  4  74828  10692 208568 4961264  0 116   579  6286 5889  3216  36  62   2
14  4  3  74956  11192 208416 4960296  6  27   532  1864 3033  2125  24  76   0
16 10  3  75268  10712 210336 4958820  0  62   350  4056 4143  2207  25  74   1
21  7  4  75556  10616 212176 4956108  0  71  1020  5248 5562  3337  24  75   1
21 19  5  75652  10960 215536 4954084  6  26   623  5127 3362  3212  33  67   0
20  8  6  75712  12744 215940 4951696 10  33   695  4195 2677  2840  26  74   1
21  7  2  75720  11300 217988 4950940  3  10   689  4242 3293  3740  38  62   0
13 13  3  75760  10932 221180 4946608  0   8   647  3536 2593  2893  34  66   0
19 13  3  75768  10876 223072 4944824  6   5   287  2181 2158  2360  21  79   1
14  8  5  75768  11156 225752 4941700  6   7   968  4850 3730  4495  31  68   0   

I have reverted to running the 2.4.18-18.7.x SMP kernel and am not experiencing
these problems.  Here's some output from "vmstat 5" when the system was even
busier than above (this output will only show 4GB RAM, as I compiled this kernel
before I had the full 6GB, and therefore didn't enable PAE):

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0  10740 383308 2628568  0   0   578  2728 3315  4361  31   6  63
 4  0  0      0  10804 383572 2628176  0   0   155  2076 2947  3718  15   5  80 
 4  1  3      0  10696 384068 2627920  0   0   420  2189 3978  4993  36   5  58 
 1  1  0      0  19460 384556 2621132  0   0   587  2898 2837  3500  26   7  67 
 0  0  0      0  16940 384972 2624160  0   0   463  2055 3411  4259  24   7  68 
 2  0  0      0  15560 385440 2626144  0   0   387  2703 3186  3885  22   9  69 
 1  0  0      0  11380 385152 2627020  0   0   521  4151 3859  5448  26   8  66 
 1  2  0      0  10988 385604 2625608  0   0   568  2163 2984  3888  34   4  62
 1  1  0      0  10648 386224 2625080  0   0  1379  2239 3085  4090  18   6  76 
 0  0  0      0  10692 386480 2626148  0   0   369  1105 2667  3458  26   7  67 
 3  2  0      0  10880 387136 2622072  0   0   316  3516 3594  4498  26   7  67 
 2  2  1      0  10680 387684 2621396  0   0   317  7511 3633  4692  31   7  62 
 1  0  0      0  10744 385524 2623248  0   0  1843  2921 4476  5306  33   7  60 
 0  0  0      0  10712 386028 2621188  0   0  2853  2749 4020  4248  25   7  68 
 0  0  0      0  11156 386684 2619760  0   0   539  3435 3287  4593  35   6  59 
 0  0  0      0  13300 387380 2616384  0   0   519  3978 4265  4917  30   7  63 
 5  0  0      0  12176 383456 2617320  0   0   198  2043 2536  3310  30   5  65 
 1  2  0      0  11328 383976 2618960  0   0  1198  2037 3024  4148  22   5  73 
 0  0  0      0  10716 384456 2614516  0   0   484  2512 2836  4154  27   5  68 
 1  0  1      0  10704 384808 2614580  0   0   237  2068 3106  4081  41   5  54 
 2  0  1      0  10636 385180 2613144  0   0   335  2064 2806  3819  24   5  70

It's difficult for me to reboot between these kernels, as this is a production
system; but if there's any other data I could capture, that would assist in
analysis of this problem, I'll try to do so.
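
The contrast between the two kernels shows up clearly if the "so" (swap-out) and "sy" (system CPU) columns are averaged. The Python sketch below does that for the first five rows of each table above, only to keep the example short. `parse_vmstat` and `summarize` are hypothetical helper names, and the first row of each run is dropped because vmstat's first report is an average since boot rather than a 5-second sample.

```python
from statistics import mean

COLUMNS = "r b w swpd free buff cache si so bi bo in cs us sy id".split()

# First five rows of each "vmstat 5" table above (a subset, not the full capture).
KERNEL_2_4_20 = """\
 1  3  2  73012  11712 222168 4948160  0   1    81    10   53    96   7  27  66
 1  2  0  73028  10508 223760 4948220  0   3   897  1880 1963  2471  19  63  18
 3 15  5  73072  10552 225284 4946696  0  10   362  2612 2211  2243  22  65  13
17  0  1  73160  12672 225900 4943812  1  22   590  2908 2822  3013  23  49  28
26  0  3  73912  13644 222472 4946896  0 179   626  9333 7817  3856  33  65   2
"""
KERNEL_2_4_18 = """\
 0  0  0      0  10740 383308 2628568  0   0   578  2728 3315  4361  31   6  63
 4  0  0      0  10804 383572 2628176  0   0   155  2076 2947  3718  15   5  80
 4  1  3      0  10696 384068 2627920  0   0   420  2189 3978  4993  36   5  58
 1  1  0      0  19460 384556 2621132  0   0   587  2898 2837  3500  26   7  67
 0  0  0      0  16940 384972 2624160  0   0   463  2055 3411  4259  24   7  68
"""


def parse_vmstat(text):
    """Parse numeric vmstat rows into dicts, skipping the since-boot first row."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) == len(COLUMNS) and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(COLUMNS, map(int, tokens))))
    return rows[1:]


def summarize(rows):
    """Mean swap-out rate and mean system CPU percentage over the rows."""
    return {"so": mean(r["so"] for r in rows), "sy": mean(r["sy"] for r in rows)}


summary_2_4_20 = summarize(parse_vmstat(KERNEL_2_4_20))
summary_2_4_18 = summarize(parse_vmstat(KERNEL_2_4_18))
print("2.4.20:", summary_2_4_20)
print("2.4.18:", summary_2_4_18)
```

On this subset the 2.4.20 rows show steady swap-out activity and system CPU time several times higher than the 2.4.18 rows, which show no swap-out at all.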
Comment 24 Christopher McCrory 2003-07-30 13:04:55 EDT
For me:
2.4.20-19.7smp helped a little, but it still started swapping, just not as hard.

That was on a MySQL DB server, several web servers, and a devel server, all RH7.3.


Good news:
the devel server was reinstalled with RHEL3 beta, problem gone :)

Comment 25 Hrunting Johnson 2003-07-31 09:42:27 EDT
We run 2.4.20-19.9 without swap and we basically see the same issue.  When 
memory gets to around the 60% used state (by used, I mean, total minus free 
minus cache), processes go into what looks like a spinlock craze trying to get 
memory.  This is despite 800MB of RAM (on a 2GB machine) being listed as cache 
(which to me means "available").  If swap is enabled, it just goes into swap 
hell.  Our application's activity on the system is similar to squid.

It's almost as if the kernel *must* keep about 40% of memory available for 
cache.
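
The two figures in this comment fit together arithmetically. The Python sketch below works through them under stated assumptions: "2GB" is taken as 2048 MB, "around 60%" as exactly 0.60, and "800MB" as exactly 800 MB. `used_excluding_cache` is a hypothetical helper implementing the definition given above (total minus free minus cache).

```python
def used_excluding_cache(total, free, cache):
    """'Used' as defined in the comment: total minus free minus cache."""
    return total - free - cache


# Assumptions for illustration; the comment only gives approximate figures.
TOTAL_MB = 2048        # "2GB machine"
CACHE_MB = 800         # "800MB of RAM ... listed as cache"
USED_FRACTION = 0.60   # "around the 60% used state"

used_mb = USED_FRACTION * TOTAL_MB
free_mb = TOTAL_MB - used_mb - CACHE_MB
cache_fraction = CACHE_MB / TOTAL_MB

print(f"used (excluding cache): {used_mb:.0f} MB")
print(f"cache share of RAM:     {cache_fraction:.1%}")
print(f"left over as free:      {free_mb:.0f} MB")
```

Under these assumptions the cache accounts for about 39% of RAM and only around 19 MB remains truly free, so any further allocation has to come out of cache or, failing that, from swapping or stalling.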
Comment 26 Erik Reuter 2003-08-02 12:42:02 EDT
Been working for a couple of hours... Found myself spending more and more time
waiting while switching between applications. This is what the different memory
stats are right now (running with the default vm parameters for RHL9, latest
kernel update):

free:
             total       used       free     shared    buffers     cached
Mem:       1289496    1280104       9392          0      24396     779656
-/+ buffers/cache:     476052     813444
Swap:      2040212     285496    1754716

vmstat:
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 2  0  0 285700   9324  24404 779656    3   30    46   171  186  1206  6  8 86

/proc/meminfo:
        total:    used:    free:  shared: buffers:  cached:
Mem:  1320443904 1310916608  9527296        0 24989696 987979776
Swap: 2089177088 292765696 1796411392
MemTotal:      1289496 kB
MemFree:          9304 kB
MemShared:           0 kB
Buffers:         24404 kB
Cached:         779656 kB
SwapCached:     185168 kB
Active:         968668 kB
ActiveAnon:     350752 kB
ActiveCache:    617916 kB
Inact_dirty:      5200 kB
Inact_laundry:  187108 kB
Inact_clean:     28196 kB
Inact_target:   237832 kB
HighTotal:      393200 kB
HighFree:         1276 kB
LowTotal:       896296 kB
LowFree:          8028 kB
SwapTotal:     2040212 kB
SwapFree:      1754308 kB
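
As a cross-check of the figures above, the "-/+ buffers/cache" row that free prints can be recomputed by hand. This is only a minimal Python sketch; the function name app_used_kb is made up for illustration, and the inputs are the kB values from the "Mem:" row of the free output in this comment.

```python
def app_used_kb(total_kb, free_kb, buffers_kb, cached_kb):
    """Memory held by processes: total minus free, buffers and page cache."""
    return total_kb - free_kb - buffers_kb - cached_kb

# Values copied from the "Mem:" row of the free output above (in kB).
total, free, buffers, cached = 1289496, 9392, 24396, 779656

used = app_used_kb(total, free, buffers, cached)
share = 100.0 * used / total

print(f"used by applications: {used} kB ({share:.1f}% of RAM)")
print(f"buffers + cache:      {buffers + cached} kB")
```

The result matches the 476052 kB that free reports, i.e. well under half of RAM is held by applications while roughly 285 MB sits in swap.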
Comment 27 Erik Reuter 2003-09-21 08:45:21 EDT
Any news regarding this bug?
Comment 28 Rik van Riel 2003-09-21 12:08:54 EDT
Two things:

1) could you try "echo 1 5 > /proc/sys/vm/pagecache" ... to make sure the kernel
   really evicts most of the page cache before swapping

2) Davej, could you add the inode reclaim fixes into the 2.4.20-* kernels ?
Comment 29 Kyle Bateman 2003-09-25 23:27:41 EDT
I have this same problem on about 5 production servers.  I have been wrestling
with it for the last couple of months and just now finally found this bug
thread.  I'm on this version now:

Linux chi.actarg.com 2.4.20-20.9 #1 Mon Aug 18 11:45:58 EDT 2003 i686 i686 i386

And still am having problems.  I did a:

grep -r zzyzx /remote_nfs_volume/*

and watched the cache with "free".  If left to run, the nfs caching will grow
continually, swapping out about every process on the machine.  The apps become
very sluggish and the cache does not seem to release easily.  Unfortunately, I
upgraded the whole network to RH9 before understanding there was a problem so
now I'm crippled across the network.

I tried echo 1 5 > /proc/sys/vm/pagetable_cache

(I'm assuming that's what is meant in #28) but the system still seems to prefer
swapping out processes as opposed to releasing cache.

Is there a fix for this in the works?
Comment 30 Dave Jones 2003-09-28 21:00:11 EDT
> Davej, could you add the inode reclaim fixes into the 2.4.20-* kernels ?

Not without spending a significant amount of time untangling the various vm
related patches in that tree. It's based on an older rmap version, with various
updates (some of which may or may not be in later rmaps).

At a guess it's at least a day or so of work.

I don't have time to do this anytime soon, so don't hold your breath for it..
Comment 31 Kyle Bateman 2003-09-29 14:24:04 EDT
I'm holding my breath for something :)

Is there something I can do as a workaround?  For example, is there a way to
limit the amount of memory the kernel uses for caching?  That way, I could keep
the memory more available for processes.
Comment 32 vincent mulligan 2003-11-18 16:21:18 EST
Created attachment 96045 [details]
none

Is there a fix for this yet - is it an issue in AS3. We see this activity on
our servers to the point that they become unusable. Last messages on the
console show kswapd as the top process
Comment 33 Rik van Riel 2003-11-18 16:28:36 EST
I'd appreciate it if you could post a screen of top(1) and a screen of
vmstat 1 during a trouble period, so we can debug what's happening
with the RHEL3 kernel, as well as the exact version number of the
kernel you are using.
Comment 34 Christopher McCrory 2003-11-18 16:48:11 EST
> is it an issue in AS3

I think there should be a '?' in there

I am migrating to RHEL3 (from 7.3) and am NOT seeing this anymore on
2.4.21-4.ELsmp



> We see this activity on our servers to the point that they become
> unusable

On RH7.3 you might try this:
/sbin/swapoff -a

(works for me)
Comment 35 Chris Petersen 2003-11-25 19:19:09 EST
Created attachment 96201 [details]
Tarball of bug-reproducing example code
Comment 36 Chris Petersen 2003-11-25 19:22:18 EST
The block device cache is causing kswapd thrashing, usually bringing
the system to a halt.

This problem has been reproduced on kernels as recent as 2.4.21-4EL.

In our application we deal with large (multi-GB) files on multi-CPU
4GB platforms (mostly 2.4.7-10).  While handling these files, the
block device cache allocates all remaining available memory (3.5G) up 
to the 4G physical limit.

Once the block device cache has pegged the physical memory limit,
it doesn't seem to manage its allocation of that memory well enough
to prevent unnecessary page-swapping.  Ultimately, thrashing takes
over and the SYSTEM COMES TO A HALT.

After the application closes all files and exits, the cache maintains
its allocation of this memory until either: 1) the file is removed,
or 2) somebody requests more memory.  In the former case, used memory
(top, /proc/meminfo) drops instantly to the amount used by all
processes (sum of ps use).  In the latter, memory use remains pegged
and swapping typically remains a problem.  There doesn't appear to be
a timeout on the cache's allocation.

THIS IS BROKEN.

This problem is most noticeable when the (cached) files causing the
problem are on a local disk.

Below is an example of a pseudo-idle system (only running 'du')
which is affected by the thrashing problem.  Both CPUs are 99% system,
kswapd is 99.9%, load average exceeds 4 and growing, and virtually
all memory is consumed, although only 717,140K is reported to be
used by "all" processes (using a sum of 'ps -aux' memory use).

  5:31pm  up 53 days, 11:28, 19 users,  load average: 4.64, 3.14, 2.14
160 processes: 157 sleeping, 2 running, 0 zombie, 1 stopped
CPU0 states:  0.1% user, 99.0% system,  0.0% nice,  0.2% idle
CPU1 states:  0.1% user, 99.2% system,  0.0% nice,  0.0% idle
Mem:  3928460K av, 3828808K used,   99652K free,       0K shrd,   26148K buff
Swap: 4194224K av,  696384K used, 3497840K free                 2715008K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    5 root      17   0     0    0     0 RW   99.9  0.0 218:52 kswapd

I have seen situations where the load average exceeds 12.0 (!), and
others on a 4-CPU 64-bit 6GB machine (running 2.4.21[-4.EL]) where
all four CPUs are at 100% system, and page-swapping.

THIS PROBLEM IS READILY REPRODUCIBLE.

I have a test program (fst) which can reproduce the problem; with an
additional memory reclamation program (reclaim).  A tarball of these
has been attached.

fst can be used to generate large files (with seek behavior typical
of our application, as seeking seems to aggravate the problem).  When
using fst (on a 4GB system), specify 'num_blks' to be 2,000,000 to
4,000,000, with mode = 1 (seek-updating enabled):

    fst 3000000 fst.out 1

This will create a file with 3,000,000 blocks of random size between
1-2048 bytes.  Midway through creating fst.out, the block device cache
should have allocated all of memory.  If thrashing doesn't immediately
occur you can run multiple fst's to aggravate the problem.
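
For anyone without the tarball, a comparable workload can be sketched in a few lines of Python. To be clear, this is not the attached fst: the function name write_random_blocks is hypothetical, and what "seek-updating" does in fst is an assumption here (every 16th block, seek back to an earlier block and overwrite one byte in place).

```python
import os
import random

def write_random_blocks(path, num_blks, seek_update=True, seed=None):
    """Append num_blks blocks of 1-2048 random bytes to path.

    With seek_update set, occasionally seek back to an earlier block and
    overwrite its first byte (an assumed stand-in for fst's mode 1).
    Returns the final file size in bytes.
    """
    rng = random.Random(seed)
    offsets = []                      # start offset of every block written
    with open(path, "w+b") as f:
        for i in range(num_blks):
            f.seek(0, os.SEEK_END)
            offsets.append(f.tell())
            f.write(os.urandom(rng.randint(1, 2048)))
            if seek_update and i % 16 == 15:
                f.seek(rng.choice(offsets))
                f.write(b"\0")        # in-place update, file size unchanged
    return os.path.getsize(path)
```

Calling write_random_blocks("fst.out", 3000000) would produce a file on the order of 3 GB (blocks average about 1 KB), so try a small block count first.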

reclaim can be used to illustrate that, with fst still running (and
pegged), it is possible to manually reclaim/free the memory used by
the block device cache, thereby eliminating the issues with kswapd,
bdflush, kupdated, etc.  But given that fst's still running, memory
usage creeps back up, as expected.

This seems to be a fairly fundamental and substantial problem.  Over
time rogue memory use by the block device cache simply creeps up and
up toward the physical limit.  And it becomes a problem more readily.

Can anyone provide a means to mitigate or eliminate this problem?
We've toyed with altering parameters to bdflush and the like, with
no success.
Comment 37 Rik van Riel 2003-11-25 21:38:18 EST
Chris, thank you for your test program.

I'll be visiting family over the next week, but once I return I'll run
it and I'll try to improve the VM's behaviour when faced with your
test program.
Comment 38 Lou 2003-11-29 17:29:31 EST
I am also seeing the same problem after I just ran up2date, although
up2date only upgrades me to the following (others of you are on higher
versions, maybe because you're running EE; or do I need to upgrade manually?):

Linux localhost.localdomain 2.4.20-8 #1 Thu Mar 13 17:18:24 EST 2003
i686 athlon i386 GNU/Linux

I am running RH9 Workstation with 768MB RAM.  Very frustrating because
my system is so slow now when I top off the RAM.

Comment 39 Jim Laverty 2004-01-27 17:01:23 EST
Could this be related?

http://marc.theaimsgroup.com/?l=linux-kernel&m=107368165419559&w=2
Comment 40 Chris Petersen 2004-01-30 16:48:32 EST
This problem has been shown to be eliminated in (at least)
RedHat's 2.4.20-24.7 or later (available as an RPM from 
updates.redhat.com); and in (at least) 2.4.23 from kernel.org.
Comment 41 Erik Reuter 2004-01-31 04:38:16 EST
I've upgraded to Fedora and been running with this for a while. I
haven't noticed the problem for some time now. I'm solely using Fedora
as a desktop environment (as I did RH9) for my development tasks,
which is where I first stumbled upon the issue.

My kernel is 2.4.22-1.2149.nptl

So, for me it's either solved or reduced beyond the point of notice.
Comment 42 Jim Laverty 2004-02-03 11:35:51 EST
I have been able to reproduce this on a fresh install of Red Hat 9
using the latest Red Hat release of the 2.4.20-28.9smp kernel.  If I
blast NFS reads/writes to a single NFS mount point, I can reproduce
this in under 3 minutes on a Dell 1750 with 4GB RAM and a 2GB swap
partition.  

I am getting ready to try this with the stock 2.4.24 kernel and
Andrea's 2.4.23aa1 patch.
Comment 43 Jim Laverty 2004-02-12 13:55:22 EST
I have been able to reproduce this issue with the 2.4.24smp kernel on
Red Hat 9 on a Dell 450 workstation, with different behavior than I
experienced on the 2.4.20-xx kernels.  To sum it up, 2.4.24smp behaves
much better relative to this caching issue.  I was not able to get
this behavior on the 2.6.2smp kernel, using an identically configured
Dell 450.

I have been able to bring the 2.4.20-xx kernels to their knees in less
than 15 minutes, using the same scenarios below.  The results listed
below are based on 24 hours of testing and the system is still running
well.  The swap space consumption does still grow over time, however
not at the previous rates.  The amount of swap space consumed is
dramatically lower than the amount consumed in the past 2.4.20-xx
kernels using these same tests.

I loaded X and 3 bash sessions, plus the normal run of the mill
daemons running which are mostly idle.  I hacked together a little C
code to generate a large file (100GB) over NFS, in 512 byte increments
sequentially vs. using dd.  This caused the system to start consuming
swap space, however it does take much longer to reach this state using
2.4.24 than the 2.4.20-xx kernels.

Here is a clip of top from the first scenario:

  11:36:57  up 22:51,  6 users,  load average: 0.45, 0.53, 0.51
93 processes: 92 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states:   0.2% user   6.2% system    0.0% nice   0.0% iowait  93.0% idle
CPU1 states:   0.0% user   1.1% system    0.0% nice   0.0% iowait  98.3% idle
CPU2 states:   5.3% user  25.0% system    0.0% nice   0.0% iowait  69.0% idle
CPU3 states:   0.0% user   0.1% system    0.0% nice   0.0% iowait  99.4% idle
Mem:  2069312k av, 2019632k used,   49680k free,       0k shrd,   49292k buff
       108740k active,            1781024k inactive
Swap: 2096440k av,    1104k used, 2095336k free                 1771100k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 6766 jim       17   0   576  576   504 S    30.9  0.0   5:18   2 cacheSmasher
 1052 root       9   0     0    0     0 SW    2.9  0.0   2:56   0 rpciod
    7 root       9   0     0    0     0 SW    1.1  0.0   0:28   1 kswapd
 3540 root       9   0  1136 1136   876 S     0.3  0.0   3:51   0 top
 3022 root       9  -1  140M  12M  4180 S <   0.1  0.6   2:33   1 X
 6933 root      10   0  1128 1128   876 R     0.1  0.0   0:02   0 top
    1 root       9   0   464  464   416 S     0.0  0.0   0:04   2 init
    2 root       9   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    3 root      19  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd_CPU0
    4 root      18  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd_CPU1
    5 root      19  19     0    0     0 SWN   0.0  0.0   0:00   2 ksoftirqd_CPU2
    6 root      18  19     0    0     0 SWN   0.0  0.0   0:00   3 ksoftirqd_CPU3
    8 root       9   0     0    0     0 SW    0.0  0.0   0:00   2 bdflush
    9 root       9   0     0    0     0 SW    0.0  0.0   0:01   2 

Next I loaded Mozilla and the swap space increased by 700k, which
actually is not so bad.  However I do not have enough apps and daemons
loaded to consume even half of the 2GB of RAM in the system.

 12:54:35  up 1 day, 9 min,  6 users,  load average: 0.49, 0.75, 0.80
99 processes: 97 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states:   3.1% user  14.4% system    0.0% nice   0.0% iowait  81.3% idle
CPU1 states:   3.3% user  16.3% system    0.0% nice   0.0% iowait  79.2% idle
CPU2 states:   0.0% user   4.1% system    0.0% nice   0.0% iowait  95.3% idle
CPU3 states:   0.2% user   0.2% system    0.0% nice   0.0% iowait  99.1% idle
Mem:  2069312k av, 2019268k used,   50044k free,       0k shrd,   37384k buff
       113240k active,            1827660k inactive
Swap: 2096440k av,    1832k used, 2094608k free                 1790920k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 6766 jim       16   0   576  576   504 R    31.8  0.0  31:09   0 cacheSmasher
 1052 root       9   0     0    0     0 SW    2.7  0.0   6:16   3 rpciod
 3022 root      12  -1  147M  18M  4600 S <   1.7  0.9   3:20   3 X
 6939 jim       9   0 52020  50M 14984 S     1.1  2.5   0:24   1 mozilla-bin
 3130 root       9   0 21712  21M 19288 S     0.9  1.0   2:24   0 kdeinit
 3110 root       9   0 16888  16M 15600 S     0.5  0.8   5:30   0 kdeinit
 3171 root      12   0 19600  19M 17652 S     0.5  0.9   1:20   0 kdeinit

In the third scenario, I loaded some apps which log locally and consume
between 20-90MB of RAM, then slowly grow over time.  The swap space
moved only up to around 3200k; in the previous kernel it would have
spiked very fast.  Even though this is running much better and
consuming very little swap space, I would not expect anything to be in
swap with this much memory available and so little actual memory being
consumed by apps.  

 13:33:42  up 1 day, 48 min, 15 users,  load average: 2.17, 1.90, 1.42
193 processes: 190 sleeping, 3 running, 0 zombie, 0 stopped
CPU0 states:  18.4% user  18.1% system   13.0% nice   0.0% iowait  63.4% idle
CPU1 states:  17.1% user  15.4% system   13.0% nice   0.0% iowait  66.3% idle
CPU2 states:  18.2% user   8.3% system   14.4% nice   0.0% iowait  72.5% idle
CPU3 states:  11.1% user   9.4% system    8.0% nice   0.0% iowait  78.4% idle
Mem:  2069312k av, 2018996k used,   50316k free,       0k shrd,   37768k buff
       114676k active,            1853104k inactive
Swap: 2096440k av,    3240k used, 2093200k free                 1458848k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 7662 jim      19   9 83552  16M 10344 R N  50.7  0.8   6:16   2 iomon
 6766 root      13   0   576  576   504 R    45.6  0.0  46:45   2 cacheSmasher
 1052 root       9   0     0    0     0 SW    4.5  0.0   8:20   3 rpciod
 7543 jim       9   0 69444  67M  1812 S     3.1  3.3   1:09   3 logandgrow
 3540 root       9   0  1308 1308   960 S     2.5  0.0   4:57   0 top
 6933 root      11   0  1312 1312   960 R     2.5  0.0   1:08   1 top
 7535 jim       9   0 12920  12M  1028 S     2.5  0.6   1:24   2 logandgrow
 7545 jim       9   0 86640  84M  1812 S     1.5  4.1   1:16   0 logandgrow
 7570 jim       9   0 78692  76M  1812 S     1.5  3.8   0:33   3 logandgrow


  Now I will try to throttle the system over the holiday weekend, to
see how stable it is with high usage over a long period of time.  If I
produce results which are contrary to these results, I will post them
next week.
Comment 44 Jim Laverty 2004-02-12 16:26:28 EST
Additional note:

The swap space used just reached 348,452k, so I terminated the heavy
NFS I/O.  Unlike 2.4.20-xx, cache now is being freed for my apps as
they grow though the swap space is still slowly growing.



Comment 45 Jim Laverty 2004-02-13 11:15:44 EST
I can confirm that the swap space does free up slowly over time using
the 2.4.24smp kernel, as does the cache space.   The system is
currently maintaining 40-50MB of free memory, where in the past it
averaged between 4-10MB free.  To sum it up again, the caching seems
much better at this point.
Comment 46 martynas 2004-03-05 13:39:54 EST
I tried the updated kernel from Red Hat, 2.4.20-30.9smp, on my Red Hat 9
system, but it eats all ram for caching :( So the problem still exists with
the updated kernel. Or am I wrong?
Could anybody say how to fix this problem, or when Red Hat will fix
it? I use 5 Red Hat 9 systems as servers.

Comment 47 Chris Petersen 2004-03-05 14:42:52 EST
The kernel(s) "eat all ram for caching" by design.

The issue described here has been that older (than 2.4.20-20-ish
RedHat) kernels have difficulty managing low-memory situations,
such that kswapd et al. get a lot of CPU time (even though it doesn't
actually swap pages in the end).

Later kernels still give all free memory to the block device cache 
(why shouldn't it?), don't have weird swapping issues, and the BDC 
gives memory back when needed.
Comment 48 Jim Laverty 2004-03-05 14:50:10 EST
I would recommend rolling your own 2.4.24 kernel (kernel.org) or
newer, the caching in it works fine.  I haven't seen any indication of
a back patch being planned for the 2.4.20 series kernels.
Comment 49 martynas 2004-03-05 15:58:09 EST
So why doesn't Red Hat build a > 2.4.20 kernel RPM for Red Hat 9? Could
somebody explain this to me? :)

Comment 50 Gary Mansell 2004-03-17 12:10:43 EST
I have come across this same problem, but even if I set
vm.pagecache to 2 10 20, I still get more than 20% of memory used
as cache -

Here is /proc/meminfo -

[grma@shane 59] ~ > cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  525836288 445038592 80797696        0 31404032 187092992
Swap: 2146787328 30904320 2115883008
MemTotal:       513512 kB
MemFree:         78904 kB
MemShared:           0 kB
Buffers:         30668 kB
Cached:         181156 kB
SwapCached:       1552 kB
Active:         347540 kB
ActiveAnon:     204324 kB
ActiveCache:    143216 kB
Inact_dirty:     32452 kB
Inact_laundry:   27108 kB
Inact_clean:      5700 kB
Inact_target:    82560 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       513512 kB
LowFree:         78904 kB
SwapTotal:     2096472 kB
SwapFree:      2066292 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB

My system seems to grind to a halt due to there being no memory
available to the applications because it is all being used for cache.

Any ideas what I can do to prevent this?

Comment 51 Gary Mansell 2004-03-18 11:38:47 EST
Can someone please explain why Red Hat seems to be doing nothing about a
MAJOR issue with their RHEL 3 flagship product?

I have been doing some research into this and it is definitely a
problem that people are coming across.

It appears that there is no work around so where is the updated kernel
- this problem has been around for a long time now. Where is the
support that we pay £000's for??
Comment 52 Rik van Riel 2004-03-18 12:56:40 EST
OK, apparently it is still an issue in RHEL3.

Larry, this bug may have some useful info for you...
Comment 53 martynas 2004-03-19 17:32:58 EST
I think, all redhat 9 users are waiting for fix of this bug too (not 
only RHEL).
If the latest 2.4.x kernels are without this problem, so Redhat team 
should build new kernel rpm.

 
Comment 54 Loren Siebert 2004-04-28 02:12:17 EDT
I certainly still see this behavior on RHEL3.0 with 2.4.21-9.0.3.ELsmp. Same data as 
everybody else has been posting for over a year now on this thread, so I won't bother. For 
me, it's mysql4.0.x that triggers the swapping, but RH doesn't support mysql4, so my 
trouble ticket got rejected when I complained about it. But now, after months of searching 
around, I see this thread, and realize that it's the kernel, not mysql, that has the trouble. 
Redhat, this is not a problem that only a few people see. Please, can you recommend a 
workaround until RHEL3 has a kernel with better memory management?
 
Comment 55 Marc-Christian Petersen 2004-04-29 09:45:00 EDT
Hi Loren,

ok, Redhat won't hear this but you have 2 possible choices:

1. Use an -aa VM enabled kernel (e.g. mainline 2.4)
2. Use Kernel 2.6.6*

where (1) isn't a recommended workaround at all :)

Anyway, I tried to fix this bug in the past days w/o much success yet
but I won't give up. Any hints, comments or suggestions really
appreciated.

ciao, Marc
Comment 56 Loren Siebert 2004-04-30 19:51:31 EDT
I turned off swap on one of my machines and it's running fine so far, without the
occasional delays I'd usually see when si/so would kick in. I can do this because I have
4GB of RAM and I know I am not going to use more than 3.2GB of it. Today's Slashdot
carries a story on this issue:
http://developers.slashdot.org/article.pl?sid=04/04/30/1238250&mode=thread&tid=106&tid=185

From what I gather in user comments, 2.6.6 helps by offering a "swappiness" param that
you can set to zero (or really low) to encourage reclaiming cache rather than paging to
disk. But for 2.4.X, many folks report success with "swapoff -a".

Comment 57 Marc-Christian Petersen 2004-05-03 09:35:22 EDT
Hi Loren,

well, "swapoff -a" isn't a solution, it's a workaround at best, and it's
questionable whether it's a good one. Either we should make
/proc/sys/vm/pagecache behave correctly or introduce something like
/proc/sys/vm/swappiness which works correctly ;) and *imho* the
default of "1 15 100" for pagecache is plain wrong. I had good
experience with "1 10 10" or even lower values. Even a _really_ working
drop-behind (on/off via sysctl) would make some sense.

ciao, Marc
Comment 58 Arns H 2004-05-05 08:11:06 EDT
can someone explain which kernel we have to use on RH9 to fix this
problem ?
Comment 59 Rik van Riel 2004-05-05 08:56:53 EDT
RHL9 is EOL, but it should be possible to run the Taroon (RHEL3)
kernel with it. The .src.rpms for that kernel are available from
ftp.redhat.com.
Comment 60 Arjan van de Ven 2004-05-05 08:59:07 EDT
and the binary from
ftp://ftp.redhat.de/pub/SAP/RHEL3/certified/kernel-2.4.21-9.0.1.EL
Comment 61 Marc-Christian Petersen 2004-05-05 09:07:49 EDT
... and that kernel has the same problems as mentioned above.

Anyway, for all you experiencing the above problems, try setting
pagecache to 1 10 10 (echo 1 10 10 >/proc/sys/vm/pagecache) and it
will work at least better than before.
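
For those who prefer to script the setting, the echo above could be wrapped roughly as follows. This is only a sketch in Python: write_vm_tunable is a made-up helper name, the 0-100 range check assumes the values are percentages, and the file only exists on kernels that carry the pagecache tunable (writing to it needs root).

```python
import os

def write_vm_tunable(values, path="/proc/sys/vm/pagecache"):
    """Write space-separated integer percentages to a VM tunable file.

    Returns False (and writes nothing) if the file does not exist, which is
    the case on kernels that lack the tunable.
    """
    if not all(isinstance(v, int) and 0 <= v <= 100 for v in values):
        raise ValueError("each value must be an integer percentage (0-100)")
    if not os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write(" ".join(str(v) for v in values) + "\n")
    return True

if __name__ == "__main__":
    # Needs root on a real system; the values are the ones suggested above.
    try:
        ok = write_vm_tunable([1, 10, 10])
        print("written" if ok else "no /proc/sys/vm/pagecache on this kernel")
    except OSError as err:  # e.g. permission denied when not root
        print("could not write tunable:", err)
```

Checking for the file first avoids silently creating nothing useful on a kernel without the knob; the setting does not survive a reboot either way.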

ciao, Marc
Comment 63 Arns H 2004-05-11 07:37:30 EDT
there is no /proc/sys/vm/pagecache file. Did the filename change?

[root@synstd2 vm]# ll
total 0
-rw-r--r--    1 root     root            0 mai 11 11:19 bdflush
-rw-r--r--    1 root     root            0 mai 11 11:19 kswapd
-rw-r--r--    1 root     root            0 mai 11 11:19 max_map_count
-rw-r--r--    1 root     root            0 mai 11 11:19 max-readahead
-rw-r--r--    1 root     root            0 mai 11 11:19 min-readahead
-rw-r--r--    1 root     root            0 mai 11 11:19 overcommit_memory
-rw-r--r--    1 root     root            0 mai 11 11:19 page-cluster
-rw-r--r--    1 root     root            0 mai 11 11:19 pagetable_cache


kernel : 2.4.20-31.9
Comment 64 Marc-Christian Petersen 2004-05-21 07:41:07 EDT
Hi Arns,

well, either that kernel does not have RMAP (unlikely ;) or the
tunable is not yet added. It was added in rmap 15c. You can check
whether you have rmap or not by "ls -lsa mm/rmap.c" in your kernel
source directory. Maybe there's an update for your redhat to get a
newer kernel?! Dunno.

ciao, Marc
Comment 65 David Johnston 2004-05-21 12:23:41 EDT
Arns is right; I'm running Red Hat's kernel
(kernel-2.4.20-31.9.i686.rpm, via up2date just before the
end-of-life).  I installed their source rpm, too, and 
/usr/src/linux-2.4/mm/rmap.c exists.  /proc/sys/vm/pagecache does not.

Additionally, this bug is marked as an Athlon bug.  I get the same
results on an Intel PIII.
Comment 66 Bruce A. Locke 2004-05-21 12:38:43 EDT
This bug is now pretty much worthless.  It has become a discussion
forum with a mishmash of several different issues.

Some points:

- Red Hat 9 is _DEAD_.  No one with the power to fix your "issues"
cares about it anymore.

- RHEL 3, while based off of RH 9, is not the same and there are
patches in the RHEL 3 kernel tree that were never in RH 9 (from what I
see)

- The VM (rmap patches, etc) has changed over time so earlier issues
mentioned in this bug have been fixed for some people and behaviours
may have changed for others.

If you are having problems with the RH 9 VM the fix is simple...  Stop
using RH 9.  Fedora Core 1's kernel "fixed" all the issues I had with
 squid (mentioned above). 

If you are using RHEL 3 and you are seeing VM issues then file a
separate bug report with detailed information.  Chances are it is a
completely different bug that has similar symptoms based on your workload!

I'd strongly suggest closing this bug as it is pure noise.

</rant>
Comment 67 Arns H 2004-05-24 11:54:55 EDT
bruce,

I have 6 RH9 stations dispatched around the world and I'm about to
dispatch between 150 and 200 Linux stations in the next months... so I
will follow your advice: "...the fix is simple...  Stop using RH 9..".

We're going to switch to *something other than Red Hat*.

No more noise.

Arns.
Comment 68 John Bass 2004-05-24 12:19:32 EDT
Bruce, the problem is better, but certainly not "FIXED" in FC1, which
I have been running since it came out. The system still aggressively
tends to purge active process memory as page-outs every time there is
a burst of filesystem I/O. This is fundamentally wrong from two views:

1) Large semi-idle processes like X, kdeinit, and other interactive
processes get their memory stolen, and have to page back in under
degraded I/O conditions to respond to key strokes with high latency.

2) It requires two I/O's to page, and a single I/O to recover filesystem
cache data ... paging out once there is filesystem I/O activity is a
pure mistake, as it creates additional disk I/O rather than saving it.
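
The arithmetic behind point 2 can be written down as a tiny Python sketch. The function name disk_ios is made up, and the model is deliberately crude: it assumes one I/O per page (no clustering or readahead) and that the file cache pages are clean.

```python
def disk_ios(pages, evicted):
    """Disk I/Os needed to evict and later bring back `pages` pages.

    'anon'  : process memory, written to swap on eviction and read back on
              the next access (2 I/Os per page).
    'cache' : clean file cache, simply dropped on eviction and re-read from
              the file on the next access (1 I/O per page).
    """
    per_page = {"anon": 2, "cache": 1}[evicted]
    return pages * per_page

pages = 1000
print("swap out process memory:", disk_ios(pages, "anon"), "I/Os")
print("drop clean file cache:  ", disk_ios(pages, "cache"), "I/Os")
```

Under these assumptions, evicting process memory always costs twice the disk traffic of dropping cache, and only pays off if the evicted pages are never touched again.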

While the fixes improve this to some degree, the problem is certainly
NOT fixed ... notice the burst of I/O invoking page outs below on
a relatively idle FC1 system with significant memory:
[jbass@fastbox jbass]$ vmstat 5
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
 0  0  58780  24764 124636  55048    0    0     4     6   13     8  3  2  0 29
 0  0  58780  24752 124636  55048    0    0     0     8  101   225  2  1  0 97
 0  0  58780  24744 124640  55048    0    0     0     2  129   296  2  1  0 97
 0  0  58780  24740 124644  55048    0    0     0     2  101   224  1  1  0 98
 0  0  58780  24736 124648  55048    0    0     0     9  108   241  2  1  0 98
 0  0  58780  24724 124656  55052    0    0     1    16  163   435  5  1  0 93
 0  0  58780  24728 124656  55052    0    0     0     2  108   328  3  1  0 96
 3  0  58780  24456 124672  55200    0    0    32     2  107   251  2  0  0 98
 1  0  58780  23972 125088  55200    0    0    79    14  377  1013 67 17  0 16
 2  0  58780  23452 125520  55200    0    0    79     9  311   581 51 22  0 27
 1  0  58780  22104 126556  55228    0    0   202    23  530  1015 55  6  0 40
 2  0  58780  21908 126712  55228    0    0    23     4  148   352 54  9  0 37
 2  0  58780  21840 126744  55228    6    0     9     4  150   353 47 43  0 10
 2  0  58780  21576 126764  55228    0    0     1     4  110   243 37 59  0  4
 2  0  58780  16612 126776  55228    0    0     0     4  118   269 54 35  0 10
 2  0  58780  13864 126796  55228    0    0     0    10  102   252 28 70  0  2
 2  0  58780  10216 126816  55228    0    0     2     8  107   232 31 68  0  2
 2  0  58780   4256 125248  55164    0    0     0     4  103   270 45 50  0  5
 2  0  58780   4232 124288  55164    0    0     0     2  203   502 13 85  0  2
 1  0  58780   4192 121352  55164    6    0     9    10  115   403 21 73  0  6
 0  1  58800   9504 119180  47524    0   10   214    31  139   446 49 11  0 40
 1  0  58800   4816 119196  46672    0    0   254   198  168   410 50  7  0 44
 1  0  58896   9344 119740  46632    0    0   142   146  201   469 58 16  0 26
 2  0  58896   7904 120160  46632    0    0    80   107  158   248 61  3  0 36
 1  1  58896   4636 119848  46620    2    0   301    70  216   553 56 10  0 35
 0  1  58932  10524 119444  46592    0   22   460   174  236   706 25 12  0 63
 1  1  58932   8288 121640  46592    0    0   436    95  212   720 18  6  0 76
 0  1  58932   6176 123692  46592    0    0   407    98  250   789 17  8  0 75
 0  1  58932   4664 125196  46592    0    0   297   309  220   578 16  5  0 79
 2  0  58932   3872 126344  46284    0   31   384   262  217   618 17  5  0 78
 2  1  59040  10376 127944  46404    0    0   438   199  259   759 21  7  0 72
 1  0  59040   8668 129652  46404    0    0   338   135  189   575 32  6  0 62
 1  0  59040   7364 129876  46768    0    0   110   370  253   716 64  7  0 29
 1  0  59144  22512 129152  46516    0    4   102   305  225   503 48 10  0 42
 1  0  59144  21952 129672  46516    0    0    91   594  162   376 52  5  0 43
 1  0  59144  21444 130116  46516    0    0    82   393  138   323 52  7  0 41

[jbass@fastbox jbass]$ free
             total       used       free     shared    buffers     cached
Mem:        384472     375608       8864          0     105608      46036
-/+ buffers/cache:     223964     160508
Swap:       923696      60096     863600
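
To pick the page-out intervals out of a long vmstat capture like the one above, something along these lines could be used. It is a rough Python sketch: swap_out_intervals is a made-up name, and the column positions assume the "r b swpd free buff cache si so bi bo ..." layout shown here (older vmstat versions with an extra "w" column would shift them by one). The sample rows are taken from the output above, re-joined onto single lines.

```python
def swap_out_intervals(vmstat_text):
    """Return (so, bi, bo) for each vmstat sample that swapped pages out."""
    hits = []
    for line in vmstat_text.strip().splitlines():
        fields = line.split()
        # Data rows start with a number; header rows do not.
        if not fields or not fields[0].isdigit():
            continue
        so, bi, bo = int(fields[7]), int(fields[8]), int(fields[9])
        if so > 0:
            hits.append((so, bi, bo))
    return hits

# Four rows taken from the output above, re-joined onto single lines.
SAMPLE = """
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
 0  0  58780  24740 124644  55048    0    0     0     2  101   224  1  1  0 98
 0  1  58800   9504 119180  47524    0   10   214    31  139   446 49 11  0 40
 0  1  58932  10524 119444  46592    0   22   460   174  236   706 25 12  0 63
 2  0  58932   3872 126344  46284    0   31   384   262  217   618 17  5  0 78
"""

for so, bi, bo in swap_out_intervals(SAMPLE):
    print(f"so={so:3d}  bi={bi:4d}  bo={bo:4d}")
```

In this sample every interval with a non-zero "so" also shows block reads in the hundreds, which is the pattern described above: page-outs coinciding with bursts of filesystem I/O.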

Comment 69 Myroslav Opyr 2004-06-30 09:00:29 EDT
There is "The Fedora Legacy Project" http://www.fedoralegacy.org/ with
bugzilla for issues like this. 

I've opened bugzilla issue there: 
http://bugzilla.fedora.us/show_bug.cgi?id=1797

Probably it'll be the place to continue bug discussion. I'd ask those
who had successful results (namely Jim Laverty, Christopher McCrory,
Chris Petersen, Erik Reuter, and Marc-Christian Petersen) with 2.4.24
and 2.6.6 kernels to put their comments in "Fedora Legacy" bug that
would help package proper kernels for RH9.
Comment 70 Jim Laverty 2004-07-23 09:18:52 EDT
I just switched jobs, so I will add my comments in the next week or
so.    I'm in the process of updating my e-mail address everywhere.

I have Fedora Core 2 w/2.6.7 running here, so I will post results
based on that also.
Comment 71 Marc-Christian Petersen 2004-07-28 04:27:18 EDT
Hi Jim,

2.6.* results aren't interesting and not related to this bug report.
Anyway, I've found out how to fix 2.4-rmap silly behaviour in swapping
everything out like hell. It's basically a 3 line change. I'll cook up
a patch for latest RHEL3 kernel with a /proc value to turn that
feature on/off.

ciao, Marc
Comment 72 Myroslav Opyr 2004-07-28 04:57:21 EDT
Hi Marc-Christian,

Would you be so kind as to explain things for me, as an RH9 (2.4.20-31.9)
prisoner? Would your patch be useful for that older kernel as well?

I've opened the bug at FL bugzilla:
https://bugzilla.fedora.us/show_bug.cgi?id=1797

Thanks,

m.
Comment 73 Jim Laverty 2004-07-28 15:51:09 EDT
Marc,

I cross posted between the RH 9 and FC1 instances of this bug (issue).

A patch sounds good and very useful for the masses, nice work.

Jim

Comment 74 Gary Mansell 2004-08-25 04:08:59 EDT
Am I correct in thinking that someone might have a patch for this problem?

If so, could it be posted ASAP as I have machines that I am going to
have to rebuild as SUSE machines unless they get fixed quickly.

Comment 75 Marc-Christian Petersen 2004-08-25 05:23:16 EDT
Created attachment 103053 [details]
Fix braindead swapping
Comment 76 Marc-Christian Petersen 2004-08-25 05:24:48 EDT
Created attachment 103054 [details]
Fix braindead swapping
Comment 77 Marc-Christian Petersen 2004-08-25 05:25:34 EDT
ARGS, I thought I had already done it, but I was wrong :-( Sorry.

Well, I don't care if you use Redhat or SuSE (I use Debian ;) but here
we go. I've attached some patches (01_vm-anon-lru.patch is the one
which fixes the braindead swapping), but there is a lot more: updated VM
documentation (every knob is documented in
Documentation/sysctl/vm.txt), VM tweaks in /proc/sys/vm, and a bonus for
desktop users to get non-sluggish desktop behaviour: an O(1) "desktop"
boot parameter which changes max-timeslice, min-timeslice and
child-penalty (also changeable at runtime via /proc/sys/kernel/sched*).
Also, vm.pagecache now defaults to 1 5 10 (1 15 100 is silly).
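
For anyone who wants to check which vm.pagecache values a machine is actually running with, here is a minimal Python sketch. It assumes the knob is exposed as /proc/sys/vm/pagecache and holds three whitespace-separated integers, as the "1 5 10" / "1 15 100" notation above suggests. The function names and the tuple field names (low, mid, high) are made up for illustration and are not taken from the patches.

```python
from typing import NamedTuple


class PagecacheSetting(NamedTuple):
    # Field names are hypothetical labels for the three values;
    # the comment above only gives the raw numbers.
    low: int
    mid: int
    high: int


def parse_pagecache(text: str) -> PagecacheSetting:
    """Parse a 'A B C' triple such as '1 5 10' into three integers."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected three values, got {len(parts)}: {text!r}")
    return PagecacheSetting(*(int(p) for p in parts))


def read_pagecache(path: str = "/proc/sys/vm/pagecache") -> PagecacheSetting:
    """Read the knob from /proc; the path is an assumption and the file
    only exists on kernels that provide this sysctl."""
    with open(path) as f:
        return parse_pagecache(f.read())


# The two settings mentioned in the comment above.
old_default = parse_pagecache("1 15 100")
new_default = parse_pagecache("1 5 10")
```

Calling read_pagecache() on a kernel without this sysctl will simply raise FileNotFoundError, so it is safe to try.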

These patches have to be applied in numerical order against a
2.4.21-15-0.3.EL kernel (maybe they'll apply to something different
also, dunno).
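
A rough Python sketch of "apply in numerical order", in case it helps: it sorts patch file names by their leading number and prints the patch commands to run. Only 01_vm-anon-lru.patch is a real name from this comment; the other file names, the helper names and the -p1 strip level are assumptions for illustration, so check them against the files you actually downloaded.

```python
import re


def patch_order(names):
    """Return patch file names sorted by their leading number (01, 02, ...)."""
    def key(name):
        m = re.match(r"(\d+)", name)
        if m is None:
            raise ValueError(f"no leading number in {name!r}")
        return int(m.group(1))
    return sorted(names, key=key)


def patch_commands(names, strip=1):
    """Build one 'patch' command line per file, in application order.
    The -p strip level is an assumption; adjust it to the patch headers."""
    return [f"patch -p{strip} < {name}" for name in patch_order(names)]


# Hypothetical file names, apart from 01_vm-anon-lru.patch.
example = ["10_sched-knobs.patch", "02_cache-scan-ratio.patch",
           "01_vm-anon-lru.patch"]
for cmd in patch_commands(example):
    print(cmd)
```

Sorting on the integer value rather than the raw string keeps 10 and 11 after 09 even if a file name lacks the leading zero.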

These patches fix all of the problems reported here for _ME_ and
_my_ customers. That's all I cared about. If they fix your problems
and others' as well, I'm glad :-)

P.S.: Yes, the vm knobs are taken from 2.4-AA.

ciao, Marc
Comment 78 Marc-Christian Petersen 2004-08-25 05:28:34 EDT
Created attachment 103055 [details]
02 - vm.vm_cache_scan_ratio
Comment 79 Marc-Christian Petersen 2004-08-25 05:30:41 EDT
Created attachment 103056 [details]
03 - vm.vm_passes
Comment 80 Marc-Christian Petersen 2004-08-25 05:32:53 EDT
Created attachment 103057 [details]
04 - vm.vm_gfp_debug
Comment 81 Marc-Christian Petersen 2004-08-25 05:34:35 EDT
Created attachment 103058 [details]
05 - vm.vm_vfs_scan_ratio
Comment 82 Marc-Christian Petersen 2004-08-25 05:36:38 EDT
Created attachment 103059 [details]
06 - Remove old and obsolete VM documentation
Comment 83 Marc-Christian Petersen 2004-08-25 05:38:30 EDT
Created attachment 103061 [details]
07 - Update VM docu to Documentation/sysctl/vm.txt
Comment 84 Marc-Christian Petersen 2004-08-25 05:41:05 EDT
Created attachment 103062 [details]
08 - just reorder 1 variable in mm/vmscan.c
Comment 85 Marc-Christian Petersen 2004-08-25 05:42:45 EDT
Created attachment 103063 [details]
09 - vm.pagecache - Change '1 15 100' to '1 5 10'
Comment 86 Marc-Christian Petersen 2004-08-25 05:44:53 EDT
Created attachment 103064 [details]
10 - O(1) scheduler: Introduce sysctl knobs for max-timeslice, min-timeslice and child-penalty (Part 1)
Comment 87 Marc-Christian Petersen 2004-08-25 05:47:32 EDT
Created attachment 103065 [details]
O(1) scheduler: Introduce 'desktop' boot parameter (lowered max-timeslice) (Part 2)
Comment 88 Marc-Christian Petersen 2004-08-25 05:51:36 EDT
Okay, 11 patches are up. That's all. Without vm.vm_anon_lru, this
machine, which is up now:

root@christian:[/] # w
 11:52:34 up 15 days, 23:43, 18 users,  load average: 0.26, 0.18, 0.17

used to go into swap after 1-2 days, using almost all of the available
swap. Now, take a look for yourself :p (NOTE: 2.4-WOLK /proc/meminfo
output, not 2.4-REDHAT, not 2.6*)

        total:    used:    free:  shared: buffers:  cached:
Mem:  527556608 516202496 11354112        0 118779904 183275520
Swap: 139788288    16384 139771904
MemTotal:        515192 kB
MemFree:          11088 kB
MemUsed:         504104 kB
Buffers:         115996 kB
Cached:          178964 kB
SwapCached:          16 kB
Active:          236772 kB
ActiveAnon:        1820 kB
ActiveCache:     234952 kB
Inactive:         58480 kB
Inact_dirty:      43660 kB
Inact_laundry:     7736 kB
Inact_clean:       7084 kB
Inact_target:     59048 kB
HighTotal:            0 kB
HighFree:             0 kB
LowTotal:        515192 kB
LowFree:          11088 kB
SwapTotal:       136512 kB
SwapFree:        136496 kB
SwapUsed:            16 kB
VmallocTotal:    516028 kB
VmallocUsed:      22192 kB
VmallocChunk:    493836 kB
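
To compare numbers like these across machines, a small Python sketch that parses the "Key: value kB" lines of /proc/meminfo output and computes how much swap is in use. The function names are made up for illustration; the sample values are copied from the dump above, and the old-style "Mem:" / "Swap:" byte lines are deliberately skipped.

```python
def parse_meminfo(text):
    """Collect 'Key:  12345 kB' lines into a dict of integers (in kB).
    Lines in any other shape (e.g. the old 'Mem:' byte table) are ignored."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_percent(values):
    """Percentage of swap in use, from SwapTotal and SwapFree."""
    total = values["SwapTotal"]
    if total == 0:
        return 0.0
    return 100.0 * (total - values["SwapFree"]) / total


# A few lines copied from the output above.
sample = """\
        total:    used:    free:  shared: buffers:  cached:
Swap: 139788288    16384 139771904
MemTotal:        515192 kB
Cached:          178964 kB
SwapTotal:       136512 kB
SwapFree:        136496 kB
"""
info = parse_meminfo(sample)
print(f"swap in use: {swap_used_percent(info):.3f}%")
```

On a live system the same parser can be fed open("/proc/meminfo").read(); which keys are present depends on the kernel.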

Have fun. We had it =)

ciao, Marc
Comment 89 Bugzilla owner 2004-09-30 11:40:49 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
