Bug 160033

Summary:

Kernel swaps out Oracle instead of releasing cache

Product:

Red Hat Enterprise Linux 4

Reporter:

Dirk Gfroerer <dirk.gfroerer>

Component:

kernel

Assignee:

Larry Woodman <lwoodman>

Status:

CLOSED NOTABUG

QA Contact:

Brian Brock <bbrock>

Severity:

medium

Priority:

medium

Version:

4.0

CC:

cchan, jplans, jwest, lgranquist, rich, riel

Hardware:

i686

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2010-06-07 05:44:11 UTC

Attachments:

Description	Flags
memory stats	none

Description Dirk Gfroerer 2005-06-10 06:53:25 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050512 Red Hat/1.0.4-1.4.1 Firefox/1.0.4

Description of problem:
This is a dual Intel Xeon (HT enabled) machine with 2GB of RAM. The machine is serving four Oracle 9iR2 (i.e. 9.2.0.6) instances. The Oracle instances fit nicely into the available RAM and leave enough room for the OS and some buffers. The machine wasn't using any swap at all when running RHEL3 with "vm.pagecache = 1 10 20". After its upgrade to RHEL4 we're seeing a swap usage of around 550 - 600MB. When we shut down Oracle, swap used shrinks to several kB, while used RAM is still around 1.5 GB (including cache and buffers). So it looks like RHEL is paging out Oracle in order to get more space for cache / buffers. This is not really optimal, since the part of Oracle which was paged out might contain the buffers which Oracle uses to reduce the number of I/Os. You also see delays when you're accessing the database: operations which normally complete within just a few milliseconds take several seconds, and the machine is paging in during this time.
According to a discussion on LKML this can be cured by setting vm.swappiness to zero. So we've added this to /etc/sysctl.conf but this doesn't seem to change anything.
U1 seems to improve the situation somewhat. Swap is no longer around 550 - 600 MB but ~ 400MB (but the machine currently has an uptime of only around two hours, so this may change).

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-11.EL

How reproducible:
Always

Steps to Reproduce:
1. Startup the machine with its Oracle instances.
2. Start analyzing the schemas within the database.

Actual Results: All available memory is claimed, then Oracle is paged out to disk.

Expected Results: All available memory should be used but swap should remain untouched.

Additional info:

Comment 1 Larry Woodman 2005-06-15 18:08:51 UTC

RHEL4 works differently than RHEL3.  Please try lowering /proc/sys/vm/swappiness
from the default of 60 to 50 then 40 then 30 until your system reclaims
pagecache pages rather than swapping out Oracle.  Please let me know how this goes.

Thanks, Larry Woodman
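
To compare swappiness settings the way this comment suggests, it helps to
record swap in use next to the page cache size at each step. The following is
a minimal Python sketch, not a Red Hat tool: the function names are made up for
illustration, and it assumes the usual /proc/meminfo fields (SwapTotal,
SwapFree, Cached, Buffers) reported in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_and_cache(text):
    """Return (swap used, page cache + buffers), both in kB."""
    info = parse_meminfo(text)
    swap_used = info["SwapTotal"] - info["SwapFree"]
    cache = info.get("Cached", 0) + info.get("Buffers", 0)
    return swap_used, cache


def read_state(meminfo_path="/proc/meminfo",
               swappiness_path="/proc/sys/vm/swappiness"):
    """Read the current swappiness plus swap/cache usage from /proc."""
    with open(swappiness_path) as f:
        swappiness = int(f.read().strip())
    with open(meminfo_path) as f:
        swap_used, cache = swap_and_cache(f.read())
    return swappiness, swap_used, cache


if __name__ == "__main__":
    swappiness, swap_used, cache = read_state()
    print("swappiness=%d swap_used=%d kB cache+buffers=%d kB"
          % (swappiness, swap_used, cache))
```

Running it once per swappiness value under the same load gives a comparable
series; if swap in use keeps growing while cache plus buffers stays large, the
system is still preferring to swap rather than to reclaim pagecache.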

Comment 2 Dirk Gfroerer 2005-11-10 11:55:09 UTC

Sorry for the delay.

The machine has been running with /proc/sys/vm/swappiness 0 for some time now
(with the U2 kernel in the meantime) but was still using a large amount of swap.

In order to be sure my memory calculations are correct, I've now turned swap
completely off for the machine. I haven't seen any "out of memory" problems on
the machine, and we put quite some load on it during this time. However, I don't
consider this to be a workaround.

Do you think it will make a difference if we set swappiness to 30 or 40?

Comment 3 Dirk Gfroerer 2005-11-21 06:24:36 UTC

I've switched back to the default value of 60 and then used 50, 40, 30, ...
It didn't make a difference.

Comment 4 Dirk Gfroerer 2005-12-02 12:35:02 UTC

Today I installed Oracle 10g R2 on an AMD64/EM64T machine with RHEL 4U2. Same
behaviour. The machine has 4GB of RAM. Oracle is configured to use about 2GB of
RAM. The machine just started to use swap (currently 90 MB are in use).

Comment 5 Richard N. Fogle 2005-12-20 20:08:47 UTC

I've encountered this as well:

AMD Athlon XP 3000+
1GB of RAM
kernel-2.6.9-11.EL
Red Hat Enterprise Linux ES release 4 (Nahant Update 2)
vm.swappiness = 0

Box is a mail gateway that runs sendmail and spamassassin.  The system was
originally on the verge of collapse due to very high iowait%. There are two
servers that do the exact same thing as this one, running RHEL 2.1 and RHEL 3.0,
with none of these problems.  When I first logged on, the server was in 30% swap
with only 40% physical memory usage.  I turned swappiness to zero, rebooted, and
the server is running in 10% swap with only 25% physical memory used.  On a busy
mail server running IDE even 10% of swap is noticeable.  The end result is that
this will be yet another server we have to revert back to RHEL 3 because of
performance issues.

I'd rather not disable swap either.  Any suggestions?  Out of the box RHEL 4
isn't doing so well against RHEL 3 (or even RHEL 2.1) and we deploy thousands of
servers.

Comment 6 Richard N. Fogle 2006-01-11 16:29:54 UTC

I just verified this again on another RHEL4 server, 2.6.9-22:

with vm.swappiness = 0, 10-minute intervals:

memory used  swap used
62.89  6.20
49.72  6.79
42.68  6.91
47.42  7.24
48.62  7.83
48.67  8.11
52.29  9.35
49.86  11.10
47.71  11.73
49.64  12.28
50.68  12.45
55.14  13.16
53.79  13.79
48.72  13.81
52.02  15.44
53.37  16.20
48.14  18.67

We're having to roll back the kernel to 2.6.9-11 again.  Is this thread still
alive?  If I can offer any more information or assistance then please do let me
know.
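
The trend in the samples above can be summarised with a short Python sketch.
The figures are copied from this comment; treating them as percentages (as in
the previous comment) is an assumption, and the function name is hypothetical.

```python
# (memory used, swap used) pairs from the comment, taken at 10-minute intervals.
SAMPLES = [
    (62.89, 6.20), (49.72, 6.79), (42.68, 6.91), (47.42, 7.24),
    (48.62, 7.83), (48.67, 8.11), (52.29, 9.35), (49.86, 11.10),
    (47.71, 11.73), (49.64, 12.28), (50.68, 12.45), (55.14, 13.16),
    (53.79, 13.79), (48.72, 13.81), (52.02, 15.44), (53.37, 16.20),
    (48.14, 18.67),
]


def summarize(samples, interval_min=10):
    """Summarise how swap usage moves across evenly spaced samples."""
    mem = [m for m, _ in samples]
    swap = [s for _, s in samples]
    elapsed_min = (len(samples) - 1) * interval_min
    growth = swap[-1] - swap[0]
    return {
        "elapsed_min": elapsed_min,
        "swap_growth": growth,
        "swap_growth_per_hour": growth / elapsed_min * 60,
        # True when no sample shows less swap in use than the one before it.
        "swap_never_shrinks": all(b >= a for a, b in zip(swap, swap[1:])),
        "mem_min": min(mem),
        "mem_max": max(mem),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLES).items():
        print(key, value)
```

Over the 160 minutes covered, swap in use rises from 6.20 to 18.67 and never
shrinks between samples, while memory in use stays between 42.68 and 62.89.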

Comment 7 Dirk Gfroerer 2006-01-12 14:36:07 UTC

I've opened a support ticket in the meantime, and after some discussion Red
Hat told me to open a support ticket with Oracle. Oracle told me this is
the expected behaviour (!) and that I have to set LOCK_SGA=true in the
initXXX.ora file, which locks the SGA of Oracle into RAM. I'm not seeing this
issue any more on my machine(s). However, I'm not really satisfied, since I have
to be very careful when setting up a new database. If I assign too much RAM it
will blow up with out of memory.
Also, no one came up with an explanation of why application memory is being
paged out when vm.swappiness is set to 0.

Comment 8 Christopher Chan 2006-05-15 14:22:11 UTC

Created attachment 129064 [details]
memory stats

Comment 9 Christopher Chan 2006-05-15 14:24:39 UTC

Comment on attachment 129064 [details]
memory stats

RHEL4 vm behaviour leaves much to be desired. Converting boxes over to FC4 due
to the performance hit.

Comment 10 Larry Woodman 2006-12-12 16:21:26 UTC

What is the state of this bug now?  Is the system still swapping out the Oracle
SGA or have the issues been resolved with the tuning?  If the system is still
swapping out the SGA please get me several Alt Sysrq M outputs when the swapping
is occurring.

Thanks, Larry Woodman

Comment 11 David Kostal 2007-08-31 11:43:27 UTC

I had a similar problem, and in my case it helped to turn on hugepages (e.g. in
/etc/sysctl.conf set vm.nr_hugepages = 5120).
Before, I had (out of 16GB RAM) 5.5GB cached (containing 5.3GB swap cache) and
the system was swapping out over 2GB.
After the hugepages were enabled there is no swapping, cached is only 400MB and
I still have a few gigs of free memory.
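
For sizing vm.nr_hugepages, the arithmetic is just the memory to be covered
divided by the huge page size. The Python sketch below assumes a 2048 kB huge
page, which is not stated in this comment; the real value is the Hugepagesize
line in /proc/meminfo and should be checked on the machine in question. The
function names are hypothetical.

```python
def hugepages_needed(region_bytes, hugepage_kb):
    """Number of huge pages needed to cover region_bytes, rounded up."""
    page_bytes = hugepage_kb * 1024
    return -(-region_bytes // page_bytes)  # integer ceiling division


def hugepage_pool_gib(nr_hugepages, hugepage_kb):
    """Size in GiB of the pool reserved by a given vm.nr_hugepages value."""
    return nr_hugepages * hugepage_kb / (1024.0 * 1024.0)


if __name__ == "__main__":
    # Assumed huge page size of 2048 kB; check Hugepagesize in /proc/meminfo.
    print(hugepage_pool_gib(5120, 2048))
    print(hugepages_needed(10 * 1024 ** 3, 2048))
```

Under that assumption, vm.nr_hugepages = 5120 reserves 10 GiB of the 16GB
mentioned above. Memory reserved this way is not swapped out, so it has to be
sized against what the database actually needs.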

Comment 13 Lamont Granquist 2009-07-29 19:05:23 UTC

I'm seeing behavior similar to this on RHEL4 (2.6.9-89.ELsmp) and on RHEL5 kernels (2.6.18-128.2.1.el5 installed on a RHEL4 O/S), with swappiness set to 0 in a java application.

The servers are 16GB with 10GB heap and about 7-8GB of tenured, which are live objects but some of them are very infrequently accessed. The app mmap()s a lot of very large files (~50GB of mmap()'d files in the VMA space) and the VM is clearly scavenging less-used pages in the tenured generation and almost seems to be pinning the mmap()'d pages (however, after a lot of investigation I don't see any calls to mmap() or mlock() which actually would pin the pages) and evicting the tenured generation to swap. After a couple of hours a FullGC kicks in and walks through all the objects in the tenured generation, and I'm seeing 15-30 minute FullGC stops as opposed to 30 second FullGC stops.

I'm not positive what the actual VM pressure is on the mmap()'d files, but it's plausible that the evicted tenured heap pages are only accessed every few hours by the FullGC. The disks tend to be ~20% utilized on a 2-disk RAID1 SAS 10k array (Dell 1950).

Using 'swapoff -a' might be a workaround and I'm testing that now... I'm suspecting that swappiness=0 is still too aggressive in swapping out anonymous pages for server apps. An anonymous page that hasn't been hit in days is still potentially more valuable to me than a buffer cache page that got hit a minute ago (some of my apps have stable enough GC behavior that they will FullGC *very* infrequently, but when they do a walk through the tenured gen it is a disaster to have to pull those pages in from swap).

I do need to do more work on CMS collection and Garbage First collectors in java, and more work on addressing having lots (GBs) of very infrequently accessed objects in the tenured gen, but I can't eliminate this pattern of memory access from these servers completely.

So far I don't have a good synthetic test case.

Comment 14 Rik van Riel 2009-07-29 19:29:10 UTC

The basic design of the VM in RHEL 4 and RHEL 5 will not allow a complete fix for this issue, only tweaks to make it behave better most of the time.

In the upstream kernel (and for RHEL 6), this problem has been addressed with the split LRU VM, which was merged in 2.6.28 and continues to get small fixes and tweaks.  In RHEL 6 this issue should be resolved.

Comment 15 Larry Woodman 2009-07-29 21:09:16 UTC

Lowering swappiness is likely to make the situation described here worse.  The swappiness tunable controls how aggressively the system deactivates active pages that are mapped.  Since mmap()'d file pages and anonymous pages are both mapped into virtual address spaces, lowering swappiness tells the system not to deactivate either mmap()'d file pages or anonymous pages until the system is under significantly more memory pressure.

Larry Woodman
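
The behaviour described here can be illustrated with a small Python model of
the heuristic as it appears, to the best of my recollection, in the mainline
2.6 series (refill_inactive_zone() in mm/vmscan.c). The RHEL 4 kernel carries
its own patches and may differ, so treat this as a sketch of the idea and not
as the RHEL implementation. The Python names are hypothetical.

```python
DEF_PRIORITY = 12  # starting scan priority in mainline 2.6; lower means more pressure


def distress(prev_priority):
    """0 when reclaim is easy, 100 when the previous scan reached priority 0."""
    return 100 >> prev_priority


def reclaim_mapped(mapped_pct, prev_priority, swappiness):
    """Decide whether mapped pages (anonymous or mmap()'d) may be deactivated.

    mapped_pct is the percentage of memory mapped into address spaces.
    """
    swap_tendency = mapped_pct // 2 + distress(prev_priority) + swappiness
    return swap_tendency >= 100


if __name__ == "__main__":
    # swappiness = 0 protects mapped pages only while reclaim is easy.
    print(reclaim_mapped(80, DEF_PRIORITY, 0))   # no distress
    print(reclaim_mapped(80, 0, 0))              # maximum distress
    print(reclaim_mapped(80, DEF_PRIORITY, 60))  # default swappiness
```

In this model swappiness = 0 does not mean "never swap": once the scan
priority drops to 0 the distress term alone reaches 100. The model also makes
no distinction between anonymous pages and mmap()'d file pages, which matches
the point made in this comment.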

Comment 16 Lamont Granquist 2009-07-29 21:17:24 UTC

Thanks for the comments.  I had assumed that "swappiness" would treat anonymous and file-backed pages differently; I didn't realize the balance was between VMA mapped and pure page cache pages.

Are there any important /proc tweakables for the 2.6.30.1 kernel for the LRU VM to control how aggressively anon pages are swapped out?

Comment 17 Rik van Riel 2009-07-29 21:53:25 UTC

In 2.6.30, swappiness does what you expect :)