Description of problem:
x445 (4 CPUs with hyperthreading enabled) with 16 GB RAM and Oracle9i installed. After a while the machine starts swapping heavily, with the result that basically all running processes are blocked (status D) and the I/O rate is miserable.

Additional info:
We use shmfs (size 8 GB) for Oracle. As I have learned from other bug reports (e.g. bug 118152), this might be correlated with /proc/sys/vm/pagecache; here are the settings:

cat /proc/sys/vm/pagecache:
1 15 50

Here are some lines from "vmstat 1" (columns: r b swpd free buff cache si so bi bo in cs us sy id wa):

1 14 1688028 22656 160476 15020856 2124 1892 10116 2324 1991 12856 7 9 58 27
1 12 1687008 22948 160548 15028736 2268 3324 10608 7556 2319 13963 13 11 53 23
2 12 1688740 24892 160548 15022428 2880 5936 14776 6856 2440 16584 9 15 56 20
0 14 1689704 22584 160552 15020416 2684 3608 12880 4664 2028 16619 9 12 60 19
0 13 1689848 23396 160556 15019552 1720 2168 6556 2652 1269 10214 4 8 65 23
0 14 1690008 22804 160608 15016684 1988 3988 9308 4520 1602 12717 6 7 64 23
0 18 1690548 22480 160612 15020744 2508 3116 10436 4500 1927 17548 6 10 63 21
1 14 1690980 22532 160620 15022992 1920 4568 7464 8568 2020 14360 4 9 63 24
4 18 1697076 27240 160624 15000760 364 11908 1292 12804 1394 7404 7 10 27 55
0 16 1697584 22324 160700 14999368 968 3528 3236 5636 1355 10762 6 9 38 47

Version-Release number of selected component (if applicable):
2.4.21-9.ELsmp, Red Hat Enterprise Linux AS release 3 (Taroon Update 1)

I will attach some kernel memory dumps (produced with: echo m > /proc/sysrq-trigger).

This is really urgent: we migrated to this system yesterday (the old system was RHEL 2.1 on an x440) due to performance problems, and we need a stable and well-performing db/os.
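(For reference, a minimal sketch of how to inspect the tunables discussed in this report, using the standard /proc interfaces mentioned above:)

cat /proc/sys/vm/pagecache               # three values: min, borrow, max (percent of RAM)
cat /proc/sys/vm/inactive_clean_percent  # target percentage of clean inactive pages
grep -i swap /proc/meminfo               # SwapTotal / SwapFree at a glance
echo m > /proc/sysrq-trigger             # dump kernel memory statistics to the syslog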
Created attachment 98675 [details]
memory dumps from some "echo m > /proc/sysrq-trigger" runs
Could you please try "echo 30 > /proc/sys/vm/inactive_clean_percent" ? There's a tuning bug in RHEL3 U1, which should be fixed in the current U2 tree already. It would be really helpful if we could determine whether or not your problem has already been fixed, or whether we need additional tweaks for the upcoming U2...
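(For reference, a minimal sketch of applying this both at runtime and across reboots, using the standard /proc and sysctl interfaces:)

echo 30 > /proc/sys/vm/inactive_clean_percent    # takes effect immediately
# to persist across reboots, add to /etc/sysctl.conf:
#   vm.inactive_clean_percent = 30
# and load it with:
sysctl -p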
This parameter is already set to 30! We had problems with kswapd (status DW, causing a constant load of 1) when we started testing this server, and I found a bug report suggesting changing inactive_clean_percent to 30, which did help.
OK, this is the default value in RHEL3-U2. Do you consider the problem solved with inactive_clean_percent set to 30?

Larry
No. Let me make myself clear:

* we started testing Oracle9i / RHEL3 on the x445
* we experienced the "kswapd blocked" problem => solved by setting inactive_clean_percent to 30
* we had a lot of swapping => found the advice to set pagecache to "1 15 50", which we did (it helped a bit)
* we went live with the system => very heavy swapping (see my description of this bug plus the attachment)

So the problem still exists. Do you need further info (dumps, parameter settings, ...)? I have read something about oprofile; would that help? (However, I do not really know how to use it.)

Werner
Question: what would be reasonable output from "sar -B 1 10" for the different columns? Here are a few lines:

08:47:13 PM  pgpgin/s  pgpgout/s  activepg  inadtypg  inaclnpg  inatarpg
08:47:14 PM  15911.11     844.44   3190923    616030     92682    798390
08:47:15 PM  11280.00    2800.00   3190509    615379     92680    798120
08:47:16 PM   6983.33    9133.33   3192395    615613     92678    798554
08:47:17 PM   7975.00     200.00   3192736    615551     92674    798617
08:47:18 PM   6133.33   15009.52   3192760    615852     92670    798691
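(A rough gloss of those RHEL3-specific sar -B columns, based on my reading of the RHEL3 VM; treat this as an assumption and verify against the sysstat documentation:)

sar -B 1 10
# pgpgin/s, pgpgout/s : KB/s paged in from / written out to disk
# activepg            : pages on the active list
# inadtypg            : inactive dirty pages (must be laundered before reclaim)
# inaclnpg            : inactive clean pages (immediately reclaimable)
# inatarpg            : the size the VM is targeting for the inactive list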
Werner, please try setting pagecache to "1 10 10" and see if this helps, by having the system only reclaim pagecache pages when the pagecache uses more than 10% of memory.

Larry
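(A minimal sketch of applying that setting; the three values are the min/borrow/max percentages of RAM allowed for the pagecache:)

echo "1 10 10" > /proc/sys/vm/pagecache
# or equivalently:
sysctl -w vm.pagecache="1 10 10"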
I have changed the parameters to "1 10 10". How long will it take until I see any changes, and what can I expect?

I have just found a comment in an Oracle note that shmfs is swappable while ramfs (new in RHEL3?!) is not:

<quote>
Previously, shmfs was mounted to /dev/shm. This still works in RHEL 3. But because RHEL 3 does not have the bigpages functionality, shmfs would be swappable. RHEL 3 adds ramfs. This is similar to shmfs except that it is not swappable. So mounting ramfs to /dev/shm provides an unswappable memory filesystem similar to what was possible in AS 2.1 with shmfs and bigpages.
</quote>

Q1) Would the system perform better if I used ramfs instead of shmfs, and why?
Q2) Would you recommend using the hugemem kernel (the system has 16 GB RAM)?
Q3) Would you recommend using hugetlb? What's the advantage?

Thanks a lot, Werner
A1) I would only use ramfs if the shared memory segment is small enough to easily fit in memory, say 8GB ...

A2) The hugemem kernel may or may not be helpful to you: the kernel can cache more metadata, but context switching and system calls are more CPU intensive. It's a low-risk thing to try, and while it won't solve the current problem, it may be something to look into in the long run.

A3) Hugetlb uses 2MB large pages that cannot be swapped out. It can only be used for the permanently mapped part of the shared memory segment, not for the indirect buffer cache. It is faster than 4kB pages, though...
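(For reference, a sketch of the two alternatives described above. The ramfs mount follows the Oracle note quoted earlier; the huge page pool sysctl is my recollection of the RHEL3 2.4 interface (vm.hugetlb_pool, sized in MB), so verify it on your kernel, and the 8192 figure is only an example matching the 8 GB SGA in this report:)

# ramfs instead of shmfs on /dev/shm (not swappable -- make sure the SGA fits in RAM)
umount /dev/shm
mount -t ramfs ramfs /dev/shm

# hugetlb: reserve unswappable 2MB pages for the SGA
echo 8192 > /proc/sys/vm/hugetlb_pool   # pool size in MB (example: 8 GB)
grep -i huge /proc/meminfo              # verify the pages were actually reserved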
Rik, thank you for the additional information. Is there documentation from Red Hat concerning the optimal configuration of RHEL3 for an Oracle installation? That would be really helpful.

Thanks, Werner
After monitoring the server for a couple of days: the performance is better now, and swapping has completely stopped, as can be seen using sar -W. Still, some documentation for the most important kernel parameters would be helpful to understand what's going on.

Thanks, Werner
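(For reference, the check mentioned above; once swapping has stopped, both columns should stay at or near zero:)

sar -W 1 5
# pswpin/s, pswpout/s : pages swapped in / swapped out per second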
A link to the latest copy of the RHEL3 vm tuning whitepaper can be found at: http://people.redhat.com/nhorman/papers.html Hope that helps.
sorry that url should be: http://people.redhat.com/nhorman/papers/papers.html
Werner, does the combination of setting inactive_clean_percent to 30 and setting pagecache to "1 10 10" fix this problem for you?

Thanks, Larry Woodman
Yes, the problem is solved for me. Thanks for your help, Werner
We have now updated the system (RHEL3-U2) and have serious performance problems when starting up Oracle. The load reaches sheer unbelievable levels (50 and above) and the system is completely unresponsive (sometimes for up to 15 minutes!!). During this time swap usage increases (top value: 3.7 GB); vmstat shows all processes blocked, none running (obviously the I/O is miserable due to the heavy swapping). Then the swap usage continually decreases (over a period of some hours) and the system performs OK.

The machine is still running with the kernel parameters you told me:

vm.inactive_clean_percent = 30
vm.pagecache = 1 10 10

Are these parameters OK with the newer kernel (2.4.21-15.ELsmp), or do they cause this terrible behaviour? If so, what is a better setting?

Here are some listings:

sar -q (runq-sz, plist-sz, ldavg-1, ldavg-5):
=============================================
09:00:01 PM   2  110   2.67   2.84
09:10:00 PM   2  106   0.02   1.06
09:20:00 PM   4  123   4.58   3.79
09:30:00 PM   5  111   0.11   1.31
09:40:00 PM   5  129   1.75   1.38
09:50:00 PM   5  123   0.32   0.82
10:00:00 PM   3  112   0.04   0.18
10:10:00 PM   4  145   0.26   0.42
10:20:00 PM   4  141   0.01   0.14
10:46:25 PM  21  218  53.63  49.60
10:51:37 PM   7  221  35.01  37.06
11:00:00 PM   6  203  17.66  23.84
11:10:01 PM   9  226  17.67  20.82
11:20:01 PM   6  225  15.65  16.49
11:30:01 PM   8  198   6.97  11.90
11:39:59 PM   6  195  23.28  21.30
11:50:00 PM   8  199   6.83   9.21

sar -r (kbmemfree, kbmemused, %memused, kbmemshrd, kbbuffers, kbcached, kbswpfree, kbswpused, %swpused):
========================================================================================================
10:20:00 PM 53536 16464164 99.68 0 148024 15926716 16386292 0 0.00
10:46:25 PM 17756 16499944 99.89 0 171156 14823168 14767768 1618524 9.88
10:51:37 PM 17904 16499796 99.89 0 170284 13265520 12920348 3465944 21.15
11:00:00 PM 68384 16449316 99.59 0 173108 13817620 13326268 3060024 18.67
11:10:01 PM 24408 16493292 99.85 0 161696 13688444 12793100 3593192 21.93
11:20:01 PM 19168 16498532 99.88 0 145260 14368920 13734080 2652212 16.19
11:30:01 PM 20512 16497188 99.88 0 144608 15410196 14374292 2012000 12.28
11:39:59 PM 25108 16492592 99.85 0 121816 15286328 13595744 2790548 17.03
11:50:00 PM 27976 16489724 99.83 0 127260 15363508 13765152 2621140 16.00
12:10:44 AM 22588 16495112 99.86 0 138448 15152928 13650848 2735444 16.69
12:20:00 AM 22896 16494804 99.86 0 143420 15244832 13654748 2731544 16.67
12:30:01 AM 20908 16496792 99.87 0 175884 15261584 13907088 2479204 15.13
12:40:00 AM 20868 16496832 99.87 0 184512 15224232 14101836 2284456 13.94
12:50:00 AM 19316 16498384 99.88 0 190864 15241304 14279004 2107288 12.86
01:00:01 AM 27744 16489956 99.83 0 199524 15218172 14431788 1954504 11.93
01:10:01 AM 23396 16494304 99.86 0 202664 15239728 14589456 1796836 10.97
01:20:00 AM 26552 16491148 99.84 0 215364 15158296 14782912 1603380 9.78
01:30:00 AM 23828 16493872 99.86 0 217252 15240252 15385096 1001196 6.11
01:40:01 AM 22472 16495228 99.86 0 216636 15279876 15670236 716056 4.37
01:50:00 AM 21252 16496448 99.87 0 219036 15270044 15806328 579964 3.54
02:00:00 AM 30248 16487452 99.82 0 221228 15331456 15923068 463224 2.83
02:10:00 AM 35856 16481844 99.78 0 220996 15276340 16127324 258968 1.58
02:20:01 AM 32644 16485056 99.80 0 223304 15316840 16292164 94128 0.57
02:30:00 AM 27872 16489828 99.83 0 222404 15306336 16325672 60620 0.37
02:40:00 AM 20180 16497520 99.88 0 224280 15295372 16339460 46832 0.29
02:50:01 AM 20428 16497272 99.88 0 226588 15301208 16343160 43132 0.26
03:00:00 AM 23828 16493872 99.86 0 222128 15264296 16345468 40824 0.25
03:10:00 AM 18340 16499360 99.89 0 229628 15246684 16378576 7716 0.05

Thanks, Werner
I would like to add a "me too" to this. Our Oracle instance is using well under our maximum physical memory, yet we are constantly swapping. The kernel version and most of the stats are exactly the same as Werner's.
A customer of mine also has the same issue. Here is some information:

pagecache = 1 10 10
inactive_clean_percent = 30

Customer-provided information: After speaking with others who have a similar problem (one fellow actually resorted to turning off swap completely to gain the kind of performance you would expect without swapping), I applied the following settings overnight:

vm.bdflush = 10 1000 500 5000 0 6000 100 0 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 75
vm.pagecache = 1 10 10

The short-term effects were a release of free memory and the use of less swap:

09:50:00 AM  29408 6033680 99.51 0 39956 5684100 8034596 351252 4.19
10:00:00 AM 427620 5635468 92.95 0 40120 5173012 8038340 347508 4.14

but our overnight processes (which should be running in real memory) are still using swap (though to a lesser degree than in previous runs):

(before settings)
04:20:00 PM  19648 6043440 99.68 0 69672 5446560 7210228 1175620 14.02
(after settings)
04:20:00 AM  17592 6045496 99.71 0  3636 5807160 8253684  132164  1.58

Why is it using swap at all?
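(For reference, a sketch of what those nine vm.bdflush fields mean, going by the 2.4 kernel's Documentation/sysctl/vm.txt; the field names are my reading of that document, so verify against your kernel source:)

# vm.bdflush = nfract ndirty dummy dummy interval age_buffer nfract_sync nfract_stop_bdflush dummy
echo "10 1000 500 5000 0 6000 100 0 0" > /proc/sys/vm/bdflush
# nfract=10       : bdflush wakes when 10% of the buffer cache is dirty
# ndirty=1000     : max buffers written per bdflush wakeup
# age_buffer=6000 : jiffies a buffer may stay dirty before being flushed
# nfract_sync=100 : dirty-buffer percentage at which writers block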
I am having problems with this too. I have been mistakenly posting to the RH9 version of this bug when my problem is with RHEL 3. I have already added the following to /etc/sysctl.conf, but it does not cure the problem:

#
# Fix for too aggressive file caching
#
vm.pagecache = 1 10 10
vm.inactive_clean_percent = 30

Can someone please tell me if there is a workaround or a patch to fix this problem.
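(One thing to double-check: settings in /etc/sysctl.conf are only applied at boot, so after editing the file, load them by hand and confirm the running values:)

sysctl -p
sysctl vm.pagecache vm.inactive_clean_percent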
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.11.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html