Bug 188141 - Kernel appears too conservative in memory use
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Larry Woodman
QA Contact: Brian Brock
Duplicates: 193696
Blocks: 181409
Reported: 2006-04-06 08:24 EDT by Steve Snyder
Modified: 2007-11-30 17:07 EST

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 19:05:18 EDT


Attachments
remove this patch (2.42 KB, patch)
2006-04-06 22:10 EDT, Jason Baron

Description Steve Snyder 2006-04-06 08:24:29 EDT
Description of problem:

The kernel appears to be too conservative in its use of system memory.  It will
write pages to swap even with plenty of unused RAM available.

Version-Release number of selected component (if applicable):

kernel-smp-2.6.9-34.EL

How reproducible:

Always

Steps to Reproduce:
1. Install kernel-smp-2.6.9-34.EL on fully-updated RHEL4/U3 system
2. Run system for several days
3. Note increasing use of swap even with what appears to be unused system RAM
  
Actual results:

Increasing use of swap even with lots of unused memory

Expected results:

Program pages should be kept in system RAM unless there is pressure from other
programs or the disk cache.

Additional info:

This probably seems on its face to be another case of a newbie who doesn't
understand that memory not used by running programs is used for disk caching.
It's not.  I have plenty of RAM on this system, and most of it remains free all
the time, not even used for disk caching.

For example:

# uptime
 08:05:19 up 11 days, 23:53,  1 user,  load average: 0.00, 0.00, 0.00
# free
             total       used       free     shared    buffers     cached
Mem:       1035468     239864     795604          0       8000      79308
-/+ buffers/cache:     152556     882912
Swap:      1767128     120620    1646508

The increase in swap use is not gradual.  The swap use tends to jump when the
nightly cron jobs are run, probably the result of slocate and virus checking
causing the use of more RAM for the disk cache.

Maybe there isn't a kernel problem, just a problem in the display of the
statistics.  But "free" usually shows the system with 1024MB of RAM as having
~800MB unused.
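The "-/+ buffers/cache" row of free is derived from the "Mem:" row rather than measured separately, so the two rows can be cross-checked against each other. A minimal Python sketch (the function name is illustrative, not part of procps) reproduces that row from the figures above:

```python
def buffers_cache_adjusted(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row printed by free (all values in kB)."""
    # RAM held by programs once reclaimable buffers and page cache are excluded
    app_used = used - buffers - cached
    # RAM programs could get if buffers and page cache were dropped
    app_free = free + buffers + cached
    return app_used, app_free

# Figures from the 'Mem:' row of the free output in the description
print(buffers_cache_adjusted(239864, 795604, 8000, 79308))  # (152556, 882912)
```

The result matches the row free printed, which supports the reporter's point that the display arithmetic itself is consistent.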

More info:

# cat /proc/meminfo
MemTotal:      1035468 kB
MemFree:        800832 kB
Buffers:          7112 kB
Cached:          74948 kB
SwapCached:      10420 kB
Active:         155468 kB
Inactive:        53996 kB
HighTotal:      131008 kB
HighFree:          672 kB
LowTotal:       904460 kB
LowFree:        800160 kB
SwapTotal:     1767128 kB
SwapFree:      1646424 kB
Dirty:              16 kB
Writeback:           0 kB
Mapped:         139724 kB
Slab:            16752 kB
Committed_AS:   363688 kB
PageTables:       1908 kB
VmallocTotal:   106488 kB
VmallocUsed:      2552 kB
VmallocChunk:   103584 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB

That "LowFree: 800160 kB" suggests that it is not "free" that is erroneously
displaying the state of the memory.  It appears that ~120MB worth of pages of
program data has been flushed to swap even while plenty of unused RAM remains.
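On 32-bit x86, /proc/meminfo reports a high-memory zone (HighTotal/HighFree) separately from the low-memory zone (LowTotal/LowFree). A quick Python sketch over the snapshot above, with the values typed in by hand, shows the two zones in very different states. This is only arithmetic on the reported figures; by itself it does not identify which kernel change is responsible.

```python
# Values copied by hand from the /proc/meminfo output above, in kB
meminfo = {
    "HighTotal": 131008, "HighFree": 672,
    "LowTotal": 904460, "LowFree": 800160,
    "SwapTotal": 1767128, "SwapFree": 1646424,
}

def zone_used_pct(total, free):
    """Percentage of a memory zone that is not free."""
    return 100 * (total - free) / total

high_used = zone_used_pct(meminfo["HighTotal"], meminfo["HighFree"])
low_used = zone_used_pct(meminfo["LowTotal"], meminfo["LowFree"])
swap_used_kb = meminfo["SwapTotal"] - meminfo["SwapFree"]

print(f"highmem used: {high_used:.1f}%")   # highmem used: 99.5%
print(f"lowmem used:  {low_used:.1f}%")    # lowmem used:  11.5%
print(f"swap used:    {swap_used_kb} kB")  # swap used:    120704 kB
```

Highmem is essentially exhausted while almost 90% of lowmem sits free, and roughly 118 MB is in swap.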

And finally, this:

# cat /proc/sys/vm/swappiness
60

Thank you.
Comment 1 Jason Baron 2006-04-06 13:34:10 EDT
what architecture are you running on?
Comment 2 Steve Snyder 2006-04-06 13:49:51 EDT
i686. Specifically, dual Pentium 3 processors, running the
kernel-smp-2.6.9-34.EL.i686 kernel.

Comment 3 Jason Baron 2006-04-06 13:58:29 EDT
ok. thanks. if possible can you try the test kernel at:
http://people.redhat.com/~jbaron/bz185110/ 

and report back?
Comment 4 Steve Snyder 2006-04-06 14:09:58 EDT
Err, isn't this the wrong architecture?  I don't think my 32-bit CPUs will be
happy with a 64-bit kernel.

Also, how much risk am I taking with this kernel?  The machine that I am seeing
the problem on is in production and stability is important.  Still, I do want to
get this problem resolved since it appears that the bulk of my system RAM is
being kept unused.  Can I get a relative risk estimate?  Thanks.

Comment 5 Jason Baron 2006-04-06 14:15:07 EDT
sorry about that. That is the wrong kernel. I'll build you the correct one. What
i'll do is take the kernel you're currently running -34 and change the 1 patch
that i believe is causing this problem. While i can't say that there is no risk,
i have been maintaining the rhel kernels for quite some time and would say it is
very low risk and will likely resolve this issue for you. will update soon. thanks.
Comment 6 Steve Snyder 2006-04-06 15:01:42 EDT
Is this problem SMP-specific?  I've got another Pentium3 machine running RHEL/U3
that is less mission critical.  If a uni-processor system will demonstrate the
problem and fix I'll use that instead of the SMP machine.

Also, I am comfortable patching the kernel.  If the tentative fix can be
provided in the form of a patch file I'm willing to do my own build.

Thanks.
Comment 7 Jason Baron 2006-04-06 22:10:01 EDT
Created attachment 127441 [details]
remove this patch

Not sure if this is SMP-specific... I wouldn't think so. What I wanted to try
was reverting a patch we already have in the kernel, i.e. apply it in reverse
with patch -p1 -R.
Comment 8 Steve Snyder 2006-04-08 16:39:33 EDT
Status: I've rebuilt the kernel after applying the above patch.  No errors or 
new warnings seen at boot time with the rebuilt kernel.  (I'm running the 
changed kernel on the SMP machine that I reported the problem on, just to 
minimize the variables for this test.)  I'll get back to you in about a week with
the results of the modification.
Comment 9 Jason Baron 2006-04-09 17:43:46 EDT
ok. thanks for keeping us posted.
Comment 10 Steve Snyder 2006-04-10 08:04:32 EDT
I guess I don't need a week to report back.  After 2 days of uptime, I am seeing
the kind of behavior I expect, certainly the sort of behavior I'm used to seeing
in the Fedora Core kernels.

Note below how there is only ~11MB of free system memory rather than the ~800MB
I've been seeing with the RHEL4 kernels.  (Again, this is with 1024MB of RAM
installed.)  It appears that the system memory is almost entirely used by
running programs and disk cache.

# uptime
 08:00:55 up 1 day, 16:09,  1 user,  load average: 0.00, 0.00, 0.00

# free
             total       used       free     shared    buffers     cached
Mem:       1035500    1023604      11896          0     216236     382436
-/+ buffers/cache:     424932     610568
Swap:      1767128          0    1767128

# cat /proc/meminfo
MemTotal:      1035500 kB
MemFree:         11704 kB
Buffers:        216372 kB
Cached:         382436 kB
SwapCached:          0 kB
Active:         199108 kB
Inactive:       490848 kB
HighTotal:      131008 kB
HighFree:          252 kB
LowTotal:       904492 kB
LowFree:         11452 kB
SwapTotal:     1767128 kB
SwapFree:      1767128 kB
Dirty:             236 kB
Writeback:           0 kB
Mapped:         107148 kB
Slab:           325608 kB
Committed_AS:   214284 kB
PageTables:       1732 kB
VmallocTotal:   106488 kB
VmallocUsed:      2544 kB
VmallocChunk:   103764 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB
Comment 11 Steve Snyder 2006-04-14 07:54:19 EDT
A further report: after 5 days it still looks good.  Backing out that patch 
seems to have fixed the problem I was seeing.  Thanks.

# uptime
 07:56:51 up 5 days, 16:05,  1 user,  load average: 0.00, 0.00, 0.00

# free
             total       used       free     shared    buffers     cached
Mem:       1035500    1011840      23660          0     196300     349672
-/+ buffers/cache:     465868     569632
Swap:      1767128         88    1767040

# cat /proc/meminfo
MemTotal:      1035500 kB
MemFree:         23660 kB
Buffers:        196308 kB
Cached:         349664 kB
SwapCached:          0 kB
Active:         254348 kB
Inactive:       421232 kB
HighTotal:      131008 kB
HighFree:          252 kB
LowTotal:       904492 kB
LowFree:         23408 kB
SwapTotal:     1767128 kB
SwapFree:      1767040 kB
Dirty:              28 kB
Writeback:           0 kB
Mapped:         145716 kB
Slab:           327944 kB
Committed_AS:   251708 kB
PageTables:       1792 kB
VmallocTotal:   106488 kB
VmallocUsed:      2568 kB
VmallocChunk:   103764 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     4096 kB
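The three /proc/meminfo snapshots in this report (the description on the stock -34 kernel, and Comments 10 and 11 with the patch reverted) can be compared side by side. A Python sketch, with the relevant fields typed in by hand and snapshot labels of my own choosing, reduces each to free RAM, buffers plus page cache, and swap in use:

```python
# (label, MemTotal, MemFree, Buffers, Cached, SwapTotal, SwapFree), all in kB,
# copied by hand from the /proc/meminfo outputs in this report
SNAPSHOTS = [
    ("Description, stock -34", 1035468, 800832, 7112, 74948, 1767128, 1646424),
    ("Comment 10, patch reverted", 1035500, 11704, 216372, 382436, 1767128, 1767128),
    ("Comment 11, patch reverted", 1035500, 23660, 196308, 349664, 1767128, 1767040),
]

def summarize(mem_total, mem_free, buffers, cached, swap_total, swap_free):
    """Return (% of RAM free, % of RAM in buffers + page cache, swap in use in kB)."""
    free_pct = 100 * mem_free / mem_total
    cache_pct = 100 * (buffers + cached) / mem_total
    swap_used = swap_total - swap_free
    return free_pct, cache_pct, swap_used

for label, *values in SNAPSHOTS:
    free_pct, cache_pct, swap_used = summarize(*values)
    print(f"{label:28} free {free_pct:5.1f}%  buffers+cache {cache_pct:5.1f}%  swap {swap_used:>6} kB")
```

The stock kernel comes out at about 77% free, 8% buffers+cache and 120704 kB of swap; with the patch reverted it is about 1 to 2% free, 53 to 58% buffers+cache and 0 to 88 kB of swap.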
Comment 12 Jason Baron 2006-04-20 10:57:19 EDT
committed in stream U4 build 34.20. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 13 Steve Snyder 2006-04-20 11:31:34 EDT
Will this fix be incorporated into future U3 releases, or will it not be part of
the standard RHEL4 kernel until U4?
Comment 14 Jason Baron 2006-04-20 11:34:50 EDT
wouldn't be included until U4
Comment 17 Jason Baron 2006-06-01 13:37:03 EDT
*** Bug 193696 has been marked as a duplicate of this bug. ***
Comment 18 Mike Gahagan 2006-07-17 16:12:37 EDT
Verified. The following reproduces the problem fairly easily on a 1GB system:

dd if=/dev/zero of=/dev/null conv=swab bs=500M count=1
/etc/cron.daily/slocate.cron

When running -34, the system will be in swap within minutes and only about
30-40% of the memory will be used and remain relatively unchanged.

When running -40, the system will hit swap, but use less of it, and the memory
as reported by free will show about 70% utilized, with utilization slowly
increasing to about 90% of memory used.
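To watch this behaviour while the dd and slocate commands run in another shell, a small sampler of /proc/meminfo is enough. This is a hypothetical helper, not something used in this bug. "RAM used" here is MemTotal - MemFree, which is how the "Mem: used" column of free is computed in the outputs above (it includes buffers and page cache).

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; values are kB where a unit is shown."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info

def memory_summary(info):
    """Return (% of RAM used as free's 'Mem: used' column counts it, swap in use in kB)."""
    used_pct = 100 * (info["MemTotal"] - info["MemFree"]) / info["MemTotal"]
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return used_pct, swap_used

def watch(interval_s=10, samples=30, path="/proc/meminfo"):
    """Print one line per sample; run it while the reproduction steps execute elsewhere."""
    for _ in range(samples):
        with open(path) as f:
            used_pct, swap_used = memory_summary(parse_meminfo(f.read()))
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp}  RAM used {used_pct:5.1f}%  swap used {swap_used} kB", flush=True)
        time.sleep(interval_s)
```

For example, watch(interval_s=5, samples=120) samples every 5 seconds for 10 minutes. On an affected kernel the "RAM used" figure would be expected to stay low while "swap used" climbs.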

Comment 20 claranet 2006-08-02 08:11:22 EDT
Is there any idea of when this fix is going to be included in a kernel
release?
We have just upgraded 6 boxes to fix an earlier vulnerability in the older
kernel, and we have just noticed that one of our customers with 1GB of RAM +
2.6.9-34.0.2 + SQL is getting very high load on his server since the upgrade.

We are getting the same symptoms as the guy above, except that because it's
getting so many SQL queries and they aren't being served from RAM, the disk is
getting thrashed, causing very high load, which sometimes causes the box
to hang completely.
Comment 21 Larry Woodman 2006-08-02 11:00:11 EDT
This problem was fixed in RHEL4-U4/kernel-2.6.9-42.  If this does not fix the
problem where Lowmem is getting prematurely swapped out, please let me know.

Larry Woodman
Comment 22 claranet 2006-08-03 03:56:45 EDT
Hi larry,

Red Hat ES4 lists the latest kernel as kernel-devel-2.6.9-34.0.2.EL on
up2date. I also cannot see that version when searching through the packages on
RHN, even doing an all-channel search (which includes all the beta ones).
Is the only way of upgrading it to download the RPM and do a manual
rpm -i? If so, where can I get it from, and is it a commercial release?

How come this is not on up2date yet, if the broken one is? :)


regards

anthony
Comment 23 Janak 2006-08-09 14:24:51 EDT
When is RHEL4-U4 getting released? I don't see it...
Comment 24 Red Hat Bugzilla 2006-08-10 19:05:21 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html
