Description of problem:
x445 (4 CPUs with hyperthreading enabled) with 16 GB RAM and Oracle9i installed. After a while the machine starts swapping heavily, with the result that basically all running processes are blocked (status D) and the I/O rate is miserable.

Additional info:
We use shmfs (size 8 GB) for Oracle. As I have learned from other bug reports (e.g. bug 118152), this might be correlated with /proc/sys/vm/pagecache; here are the settings:

cat /proc/sys/vm/pagecache:
1 15 50

Here are some lines from "vmstat 1" (columns: r b swpd free buff cache si so bi bo in cs us sy id wa):

1 14 1688028 22656 160476 15020856 2124 1892 10116 2324 1991 12856 7 9 58 27
1 12 1687008 22948 160548 15028736 2268 3324 10608 7556 2319 13963 13 11 53 23
2 12 1688740 24892 160548 15022428 2880 5936 14776 6856 2440 16584 9 15 56 20
0 14 1689704 22584 160552 15020416 2684 3608 12880 4664 2028 16619 9 12 60 19
0 13 1689848 23396 160556 15019552 1720 2168 6556 2652 1269 10214 4 8 65 23
0 14 1690008 22804 160608 15016684 1988 3988 9308 4520 1602 12717 6 7 64 23
0 18 1690548 22480 160612 15020744 2508 3116 10436 4500 1927 17548 6 10 63 21
1 14 1690980 22532 160620 15022992 1920 4568 7464 8568 2020 14360 4 9 63 24
4 18 1697076 27240 160624 15000760 364 11908 1292 12804 1394 7404 7 10 27 55
0 16 1697584 22324 160700 14999368 968 3528 3236 5636 1355 10762 6 9 38 47

Version-Release number of selected component (if applicable):
2.4.21-9.ELsmp, Red Hat Enterprise Linux AS release 3 (Taroon Update 1)

I will attach some kernel memory dumps (produced with: echo m > /proc/sysrq-trigger).

This is really urgent: we migrated to this system yesterday (the old system was RHEL 2.1 on an x440) due to performance problems, and we need a stable and well-performing db/os.
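(For reference, a minimal sketch of how to inspect the tunables discussed in this report, using the standard /proc interfaces mentioned above:)

cat /proc/sys/vm/pagecache               # three values: min, borrow, max (percent of RAM)
cat /proc/sys/vm/inactive_clean_percent  # target percentage of clean inactive pages
grep -i swap /proc/meminfo               # SwapTotal / SwapFree at a glance
echo m > /proc/sysrq-trigger             # dump kernel memory statistics to the syslog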
Created attachment 98675 [details]
memory dumps from some "echo m > /proc/sysrq-trigger" runs
Could you please try "echo 30 > /proc/sys/vm/inactive_clean_percent" ? There's a tuning bug in RHEL3 U1, which should be fixed in the current U2 tree already. It would be really helpful if we could determine whether or not your problem has already been fixed, or whether we need additional tweaks for the upcoming U2...
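(For reference, a minimal sketch of applying this both at runtime and across reboots, using the standard /proc and sysctl interfaces:)

echo 30 > /proc/sys/vm/inactive_clean_percent    # takes effect immediately
# to persist across reboots, add to /etc/sysctl.conf:
#   vm.inactive_clean_percent = 30
# and load it with:
sysctl -p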
This parameter is already set to 30! We had problems with kswapd (status DW, causing a constant load of 1) when we started testing this server, and I found a bug report suggesting changing inactive_clean_percent to 30, which did help.
OK, this is the default value in RHEL3-U2. Do you consider the problem solved with inactive_clean_percent set to 30?

Larry
No. Let me make myself clear:

* we started testing Oracle9i / RHEL3 on the x445
* we experienced the "kswapd blocked" problem => solved by setting inactive_clean_percent to 30
* we had a lot of swapping => found the advice to set pagecache to "1 15 50", which we did (it helped a bit)
* we went live with the system => very heavy swapping (see my description of this bug plus the attachment)

So the problem still exists. Do you need further info (dumps, parameter settings, ...)? I have read something about oprofile; would that help? (However, I do not really know how to use it.)

Werner
Question: what would be reasonable output from "sar -B 1 10" for the different columns? Here are a few lines:

08:47:13 PM  pgpgin/s  pgpgout/s  activepg  inadtypg  inaclnpg  inatarpg
08:47:14 PM  15911.11     844.44   3190923    616030     92682    798390
08:47:15 PM  11280.00    2800.00   3190509    615379     92680    798120
08:47:16 PM   6983.33    9133.33   3192395    615613     92678    798554
08:47:17 PM   7975.00     200.00   3192736    615551     92674    798617
08:47:18 PM   6133.33   15009.52   3192760    615852     92670    798691
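(A rough gloss of those RHEL3-specific sar -B columns, based on my reading of the RHEL3 VM; treat this as an assumption and verify against the sysstat documentation:)

sar -B 1 10
# pgpgin/s, pgpgout/s : KB/s paged in from / written out to disk
# activepg            : pages on the active list
# inadtypg            : inactive dirty pages (must be laundered before reclaim)
# inaclnpg            : inactive clean pages (immediately reclaimable)
# inatarpg            : the size the VM is targeting for the inactive list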
Werner, please try setting pagecache to "1 10 10" and see if this helps, by having the system only reclaim pagecache pages when the pagecache uses more than 10% of memory.

Larry
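(A minimal sketch of applying that setting; the three values are the min/borrow/max percentages of RAM allowed for the pagecache:)

echo "1 10 10" > /proc/sys/vm/pagecache
# or equivalently:
sysctl -w vm.pagecache="1 10 10"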
I have changed the parameters to "1 10 10". How long will it take until I see any changes, and what can I expect?

I have just found a comment in an Oracle note that shmfs is swappable while ramfs (new in RHEL3?!) is not:

<quote>
Previously, shmfs was mounted to /dev/shm. This still works in RHEL 3. But because RHEL 3 does not have the bigpages functionality, shmfs would be swappable. RHEL 3 adds ramfs. This is similar to shmfs except that it is not swappable. So mounting ramfs to /dev/shm provides an unswappable memory filesystem similar to what was possible in AS 2.1 with shmfs and bigpages.
</quote>

Q1) Would the system perform better if I used ramfs instead of shmfs, and why?
Q2) Would you recommend using the hugemem kernel (the system has 16 GB RAM)?
Q3) Would you recommend using hugetlb? What's the advantage?

Thanks a lot, Werner
A1) I would only use ramfs if the shared memory segment is small enough to easily fit in memory, say 8GB ...

A2) The hugemem kernel may or may not be helpful to you: the kernel can cache more metadata, but context switching and system calls are more CPU intensive. It's a low-risk thing to try, and while it won't solve the current problem, it may be something to look into in the long run.

A3) Hugetlb uses 2MB large pages that cannot be swapped out. It can only be used for the permanently mapped part of the shared memory segment, not for the indirect buffer cache. It is faster than 4kB pages, though...
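(For reference, a sketch of the two alternatives described above. The ramfs mount follows the Oracle note quoted earlier; the huge page pool sysctl is my recollection of the RHEL3 2.4 interface (vm.hugetlb_pool, sized in MB), so verify it on your kernel, and the 8192 figure is only an example matching the 8 GB SGA in this report:)

# ramfs instead of shmfs on /dev/shm (not swappable -- make sure the SGA fits in RAM)
umount /dev/shm
mount -t ramfs ramfs /dev/shm

# hugetlb: reserve unswappable 2MB pages for the SGA
echo 8192 > /proc/sys/vm/hugetlb_pool   # pool size in MB (example: 8 GB)
grep -i huge /proc/meminfo              # verify the pages were actually reserved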
Rik, thank you for the additional information. Is there documentation from Red Hat concerning the optimal configuration of RHEL3 for an Oracle installation? That would be really helpful.

Thanks, Werner
After monitoring the server for a couple of days: the performance is better now, and swapping has completely stopped, as can be seen using sar -W. Still, some documentation for the most important kernel parameters would be helpful to understand what's going on.

Thanks, Werner
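(For reference, the check mentioned above; once swapping has stopped, both columns should stay at or near zero:)

sar -W 1 5
# pswpin/s, pswpout/s : pages swapped in / swapped out per second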
A link to the latest copy of the RHEL3 vm tuning whitepaper can be found at: http://people.redhat.com/nhorman/papers.html Hope that helps.
sorry that url should be: http://people.redhat.com/nhorman/papers/papers.html
Werner, does the combination of setting inactive_clean_percent to 30 and setting pagecache to "1 10 10" fix this problem for you?

Thanks, Larry Woodman
Yes, the problem is solved for me. Thanks for your help, Werner
We have now updated the system (RHEL3-U2) and have serious performance problems when starting up Oracle. The load reaches sheer unbelievable levels (50 and above) and the system is completely unresponsive (sometimes for up to 15 minutes!!). During this time swap usage increases (top value: 3.7 GB); vmstat shows all processes blocked, none running (obviously the I/O is miserable due to the heavy swapping). Then the swap usage continually decreases (over a period of some hours) and the system performs OK.

The machine is still running with the kernel parameters you told me:

vm.inactive_clean_percent = 30
vm.pagecache = 1 10 10

Are these parameters OK with the newer kernel (2.4.21-15.ELsmp), or do they cause this terrible behaviour? If so, what is a better setting?

Here are some listings:

sar -q (runq-sz, plist-sz, ldavg-1, ldavg-5):
=============================================
09:00:01 PM   2  110   2.67   2.84
09:10:00 PM   2  106   0.02   1.06
09:20:00 PM   4  123   4.58   3.79
09:30:00 PM   5  111   0.11   1.31
09:40:00 PM   5  129   1.75   1.38
09:50:00 PM   5  123   0.32   0.82
10:00:00 PM   3  112   0.04   0.18
10:10:00 PM   4  145   0.26   0.42
10:20:00 PM   4  141   0.01   0.14
10:46:25 PM  21  218  53.63  49.60
10:51:37 PM   7  221  35.01  37.06
11:00:00 PM   6  203  17.66  23.84
11:10:01 PM   9  226  17.67  20.82
11:20:01 PM   6  225  15.65  16.49
11:30:01 PM   8  198   6.97  11.90
11:39:59 PM   6  195  23.28  21.30
11:50:00 PM   8  199   6.83   9.21

sar -r (kbmemfree, kbmemused, %memused, kbmemshrd, kbbuffers, kbcached, kbswpfree, kbswpused, %swpused):
========================================================================================================
10:20:00 PM 53536 16464164 99.68 0 148024 15926716 16386292 0 0.00
10:46:25 PM 17756 16499944 99.89 0 171156 14823168 14767768 1618524 9.88
10:51:37 PM 17904 16499796 99.89 0 170284 13265520 12920348 3465944 21.15
11:00:00 PM 68384 16449316 99.59 0 173108 13817620 13326268 3060024 18.67
11:10:01 PM 24408 16493292 99.85 0 161696 13688444 12793100 3593192 21.93
11:20:01 PM 19168 16498532 99.88 0 145260 14368920 13734080 2652212 16.19
11:30:01 PM 20512 16497188 99.88 0 144608 15410196 14374292 2012000 12.28
11:39:59 PM 25108 16492592 99.85 0 121816 15286328 13595744 2790548 17.03
11:50:00 PM 27976 16489724 99.83 0 127260 15363508 13765152 2621140 16.00
12:10:44 AM 22588 16495112 99.86 0 138448 15152928 13650848 2735444 16.69
12:20:00 AM 22896 16494804 99.86 0 143420 15244832 13654748 2731544 16.67
12:30:01 AM 20908 16496792 99.87 0 175884 15261584 13907088 2479204 15.13
12:40:00 AM 20868 16496832 99.87 0 184512 15224232 14101836 2284456 13.94
12:50:00 AM 19316 16498384 99.88 0 190864 15241304 14279004 2107288 12.86
01:00:01 AM 27744 16489956 99.83 0 199524 15218172 14431788 1954504 11.93
01:10:01 AM 23396 16494304 99.86 0 202664 15239728 14589456 1796836 10.97
01:20:00 AM 26552 16491148 99.84 0 215364 15158296 14782912 1603380 9.78
01:30:00 AM 23828 16493872 99.86 0 217252 15240252 15385096 1001196 6.11
01:40:01 AM 22472 16495228 99.86 0 216636 15279876 15670236 716056 4.37
01:50:00 AM 21252 16496448 99.87 0 219036 15270044 15806328 579964 3.54
02:00:00 AM 30248 16487452 99.82 0 221228 15331456 15923068 463224 2.83
02:10:00 AM 35856 16481844 99.78 0 220996 15276340 16127324 258968 1.58
02:20:01 AM 32644 16485056 99.80 0 223304 15316840 16292164 94128 0.57
02:30:00 AM 27872 16489828 99.83 0 222404 15306336 16325672 60620 0.37
02:40:00 AM 20180 16497520 99.88 0 224280 15295372 16339460 46832 0.29
02:50:01 AM 20428 16497272 99.88 0 226588 15301208 16343160 43132 0.26
03:00:00 AM 23828 16493872 99.86 0 222128 15264296 16345468 40824 0.25
03:10:00 AM 18340 16499360 99.89 0 229628 15246684 16378576 7716 0.05

Thanks, Werner
I would like to add a "me too" to this. Our Oracle instance is using well under our maximum physical memory, yet we are constantly swapping. The kernel version and most of the stats are exactly the same as Werner's.
A customer of mine also has the same issue. Here is some information:

pagecache = 1 10 10
inactive_clean_percent = 30

Customer-provided information: After speaking with others who have a similar problem (one fellow actually resorted to turning off swap completely to gain the kind of performance you would expect without swapping), I applied the following settings overnight:

vm.bdflush = 10 1000 500 5000 0 6000 100 0 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 75
vm.pagecache = 1 10 10

The short-term effects were a release of free memory and the use of less swap:

09:50:00 AM  29408 6033680 99.51 0 39956 5684100 8034596 351252 4.19
10:00:00 AM 427620 5635468 92.95 0 40120 5173012 8038340 347508 4.14

but our overnight processes (which should be running in real memory) are still using swap (though to a lesser degree than in previous runs):

(before settings)
04:20:00 PM  19648 6043440 99.68 0 69672 5446560 7210228 1175620 14.02
(after settings)
04:20:00 AM  17592 6045496 99.71 0  3636 5807160 8253684  132164  1.58

Why is it using swap at all?
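(For reference, a sketch of what those nine vm.bdflush fields mean, going by the 2.4 kernel's Documentation/sysctl/vm.txt; the field names are my reading of that document, so verify against your kernel source:)

# vm.bdflush = nfract ndirty dummy dummy interval age_buffer nfract_sync nfract_stop_bdflush dummy
echo "10 1000 500 5000 0 6000 100 0 0" > /proc/sys/vm/bdflush
# nfract=10       : bdflush wakes when 10% of the buffer cache is dirty
# ndirty=1000     : max buffers written per bdflush wakeup
# age_buffer=6000 : jiffies a buffer may stay dirty before being flushed
# nfract_sync=100 : dirty-buffer percentage at which writers block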
I am having problems with this too. I have been mistakenly posting to the RH9 version of this bug when my problem is with RHEL 3. I have already added the following to /etc/sysctl.conf, but it does not cure the problem:

#
# Fix for too aggressive file caching
#
vm.pagecache = 1 10 10
vm.inactive_clean_percent = 30

Can someone please tell me if there is a workaround or a patch to fix this problem.
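(One thing to double-check: settings in /etc/sysctl.conf are only applied at boot, so after editing the file, load them by hand and confirm the running values:)

sysctl -p
sysctl vm.pagecache vm.inactive_clean_percent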
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.11.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html