Description of problem: A single 64bit process can not malloc and touch more than 4GB of memory with out causing the system to use swap, even though there is still free memory present. System: AMD Quartet Memory: 16GB of Physical memory (4GB per-processor) Processors: 4 - AMD 844 Disk: 18GB SCSI Number of Test processes: 1, 2, 3, 4 Max Memory per-process before swapping occured: 3.9GB, 3.7GB, 3.7GB, 3.25GB Total memory used by all Processes: 3.9GB, 7.4GB, 11.1GB, 13GB Free Memory: ~11GB, ~6GB, ~3GB, ~1.5GB This same behavior was seen on Opteron system with 64GB Main Memory. Version-Release number of selected component (if applicable): Linux 2.4.21-17.ELsmp #1 SMP Thu Jul 8 19:30:27 EDT 2004 x86_64 GNU/Linux How reproducible: Every Time Steps to Reproduce: 1. Create a process that mallocs greater than 4GB of Ram. 2. Have the process memset the malloced array, then either read or with array in a loop. 3. What the swap file usage increase. Additional info: Test program avalible by request (kevin) Run as root or other user
This is likely normal behavior. The linux kernel splits memory into zones for 1.) DMAable memory 2.) Normal or kernel mapped memory and 3.) Highmem or memory that is physically larger than the system can map with in a virtual address space. This means that it is quite possible to exhaust one zone long before the other zones are exhausted, resulting in page reclamation and therefore paging/swap usage. Also, the system can allocate pages of swap space for a process to use long before it actually swaps pages out and this still shows up as swap usage. In addition, it is also perfectly normal for the system to run out of memory and start page reclamation even though you might not expect it. We use pages of memory to cache every file system data and meta-data item that ever used until the memory is exhausted then we start reclaiming those pages of memory is least recently used order. If pages that need to be swapped out are the oldest we swap them out. Having said that, is this causing a problem on your system or is this just behavior that you didnt expect to see? Anyway, please reproduce this problem you describe and get me several "AltSysrq M" outputs so I can see the exact memory state of your system and determine whether this is a normal situation or there is some kernel/memory problem that is unusual. Thanks, Larry Woodman
System information: Processor Info: total Processors : 4 processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 5 model name : AMD Opteron(tm) Processor 844 physical id : 0 siblings : 1 stepping : 1 cpu MHz : 1792.860 cache size : 1024 KB fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow bogomips : 3578.26 TLB size : 1088 4K pages clflush size : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts ttp Memory Info total: used: free: shared: buffers: cached: Mem: 16034541568 217554944 15816986624 0 2199552 34111488 Swap: 12897230848 13819904 12883410944 MemTotal: 15658732 kB MemFree: 15446276 kB MemShared: 0 kB Buffers: 2148 kB Cached: 24728 kB SwapCached: 8584 kB Active: 31080 kB ActiveAnon: 5740 kB ActiveCache: 25340 kB Inact_dirty: 292 kB Inact_laundry: 1460 kB Inact_clean: 2900 kB Inact_target: 7144 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 15658732 kB LowFree: 15446276 kB SwapTotal: 12594952 kB SwapFree: 12581456 kB HugePages_Total: 0 HugePages_Free: 0 Hugepagesize: 2048 kB Kernel Stock RHAS3 Update3 beta: [root@newport root]# uname -a Linux newport 2.4.21-17.ELsmp #1 SMP Thu Jul 8 19:30:27 EDT 2004 x86_64 x86_64 x86_64 GNU/Linux <<< Test Info >>> ################ One process grabbing 12GB of memory ######################################### Causes Heavey swapping for 10min. Number of Processes 1 Memory Size 12000MB RM 100% WM 0% Run Time 600s Starting Load Generation Please Wait ... --- Results (MB/s)--- PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s) 0,83.9161,0,83.9161 --- Total --- 1,83.9161,0,83.9161 ---------------------- altsysrq M ------------------------------------ Aug 20 10:50:33 newport kernel: Mem-info: Aug 20 10:50:33 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 10:50:33 newport kernel: Zone:Normal freepages: 10539 min: 1279 low: 17406 high: 25597 Aug 20 10:50:33 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 10:50:33 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 10:50:33 newport kernel: Zone:Normal freepages: 17178 min: 1279 low: 17406 high: 25597 Aug 20 10:50:33 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 10:50:33 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 10:50:33 newport kernel: Zone:Normal freepages: 14445 min: 1279 low: 17406 high: 25597 Aug 20 10:50:33 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 10:50:33 newport kernel: Zone:DMA freepages: 2617 min: 1056 low: 1088 high: 1120 Aug 20 10:50:41 newport kernel: Zone:Normal freepages:861136 min: 1279 low: 17342 high: 25501 Aug 20 10:50:41 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 10:50:41 newport kernel: Free pages: 905915 ( 0 HighMem) Aug 20 10:50:41 newport kernel: ( Active: 2645625/263031, inactive_laundry: 50208, inactive_clean: 29300, free: 905915 ) Aug 20 10:50:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 10:50:41 newport kernel: aa:901913 ac:1558 id:61553 il:11534 ic:6896 fr:10539 Aug 20 10:50:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 10:50:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 10:50:41 newport kernel: aa:867678 ac:879 id:100296 il:19880 ic:10183 fr:17178 Aug 20 10:50:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 10:50:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 10:50:41 newport kernel: aa:869084 ac:1449 id:100926 il:18792 ic:11396 fr:14445 Aug 20 10:50:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 10:50:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2617 Aug 20 10:50:41 newport kernel: aa:769 ac:2295 id:256 il:2 ic:825 fr:861136 Aug 20 10:50:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 10:50:41 newport kernel: 1*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 10*4096kB = 42156kB) Aug 20 10:50:41 newport kernel: Swap cache: add 2633120, delete 2553093, find 1375365/1891330, race 0+0 Aug 20 10:50:41 newport kernel: 4380 pages of slabcache Aug 20 10:50:41 newport kernel: 94 pages of kernel stacks Aug 20 10:50:41 newport kernel: 96 lowmem pagetables, 5987 highmem pagetables Aug 20 10:50:41 newport kernel: Free swap: 12273992kB Aug 20 10:50:41 newport kernel: 4194300 pages of RAM Aug 20 10:50:41 newport kernel: 913080 free pages Aug 20 10:50:41 newport kernel: 279617 reserved pages Aug 20 10:50:41 newport kernel: 9345 pages shared Aug 20 10:50:41 newport kernel: 83734 pages swap cached Aug 20 10:50:41 newport kernel: Buffer memory: 2132kB Aug 20 10:50:41 newport kernel: Cache memory: 369544kB Aug 20 10:50:41 newport kernel: CLEAN: 355 buffers, 1411 kbyte, 25 used (last=294), 0 locked, 0 dirty 0 delay Aug 20 10:50:41 newport kernel: LOCKED: 1 buffers, 4 kbyte, 1 used (last=1), 1 locked, 0 dirty 0 delay ################ One process grabbing 3.9GB of memory ######################################### No swapping occured Number of Processes 1 Memory Size 3900MB RM 100% WM 0% Run Time 600s Starting Load Generation Please Wait ... --- Results (MB/s)--- PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s) 0,2343.94,0,2343.94 --- Total --- 1,2343.94,0,2343.94 ---------------------- altsysrq M ------------------------------------ Aug 20 11:07:02 newport kernel: Mem-info: Aug 20 11:07:02 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:07:02 newport kernel: Zone:Normal freepages:977713 min: 1279 low: 17406 high: 25597 Aug 20 11:07:02 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:07:02 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:07:02 newport kernel: Zone:Normal freepages:990802 min: 1279 low: 17406 high: 25597 Aug 20 11:07:02 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:07:02 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:07:02 newport kernel: Zone:Normal freepages:776616 min: 1279 low: 17406 high: 25597 Aug 20 11:07:02 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:07:02 newport kernel: Zone:DMA freepages: 2617 min: 1056 low: 1088 high: 1120 Aug 20 11:07:02 newport kernel: Zone:Normal freepages: 77148 min: 1279 low: 17342 high: 25501 Aug 20 11:07:02 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:07:02 newport kernel: Free pages: 2824896 ( 0 HighMem) Aug 20 11:07:02 newport kernel: ( Active: 1004695/1513, inactive_laundry: 58, inactive_clean: 811, free: 2824896 ) Aug 20 11:07:02 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:07:02 newport kernel: aa:108 ac:1560 id:2 il:8 ic:0 fr:977713 Aug 20 11:07:02 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:07:02 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:07:02 newport kernel: aa:187 ac:876 id:0 il:50 ic:0 fr:990802 Aug 20 11:07:02 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:07:02 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:07:02 newport kernel: aa:237938 ac:1473 id:1 il:0 ic:0 fr:776616 Aug 20 11:07:02 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:07:02 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2617 Aug 20 11:07:02 newport kernel: aa:760251 ac:2302 id:1510 il:0 ic:811 fr:77148 Aug 20 11:07:02 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:07:02 newport kernel: 4637*4kB 4416*8kB 4331*16kB 3709*32kB 1998*64kB 705*128kB 98*256kB 21*512kB 25*1024kB 15*2048kB 820*4096kB = 3910852kB) Aug 20 11:07:02 newport kernel: Swap cache: add 6198833, delete 6197032, find 2441240/4135898, race 0+0 Aug 20 11:07:02 newport kernel: 70911 pages of slabcache Aug 20 11:07:02 newport kernel: 98 pages of kernel stacks Aug 20 11:07:02 newport kernel: 1583 lowmem pagetables, 660 highmem pagetables Aug 20 11:07:02 newport kernel: Free swap: 12582772kB Aug 20 11:07:03 newport kernel: 4194300 pages of RAM Aug 20 11:07:03 newport kernel: 2832614 free pages Aug 20 11:07:03 newport kernel: 279617 reserved pages Aug 20 11:07:03 newport kernel: 10135 pages shared Aug 20 11:07:03 newport kernel: 1801 pages swap cached Aug 20 11:07:03 newport kernel: Buffer memory: 2140kB Aug 20 11:07:03 newport kernel: Cache memory: 31328kB Aug 20 11:07:03 newport kernel: CLEAN: 358 buffers, 1423 kbyte, 25 used (last=354), 0 locked, 0 dirty 0 delay Aug 20 11:07:03 newport kernel: LOCKED: 4 buffers, 16 kbyte, 0 used (last=0), 0 locked, 0 dirty 0 delay ################ Two process grabbing 5GB of memory each ######################################### [root@newport root]# vmstat --mb 1 2 procs memory swap io system cpu r b swpd free buff cache si so bi bo in cs us sy wa id 2 0 6149 5070 2 24 0 0 670 2370 218 366 10 2 9 79 2 0 6151 5070 2 24 0 0 0 1288 114 18 50 2 0 48 Causes lite swapping for 10min. Number of Processes 2 Memory Size 10000MB RM 100% WM 0% Run Time 600s Starting Load Generation Please Wait ... --- Results (MB/s)--- PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s) 0,1782.09,0,1782.09 1,1825.13,0,1825.13 --- Total --- 2,3607.22,0,3607.22 ---------------------- altsysrq M ------------------------------------ Aug 20 11:20:23 newport kernel: Mem-info: Aug 20 11:20:23 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:20:23 newport kernel: Zone:Normal freepages:716141 min: 1279 low: 17406 high: 25597 Aug 20 11:20:23 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:20:23 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:20:23 newport kernel: Zone:Normal freepages: 4432 min: 1279 low: 17406 high: 25597 Aug 20 11:20:23 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:20:23 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:20:23 newport kernel: Zone:Normal freepages: 33446 min: 1279 low: 17406 high: 25597 Aug 20 11:20:23 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:20:23 newport kernel: Zone:DMA freepages: 2617 min: 1056 low: 1088 high: 1120 Aug 20 11:20:23 newport kernel: Zone:Normal freepages:564232 min: 1279 low: 17342 high: 25501 Aug 20 11:20:23 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:20:23 newport kernel: Free pages: 1320868 ( 0 HighMem) Aug 20 11:20:23 newport kernel: ( Active: 2397135/144404, inactive_laundry: 19748, inactive_clean: 7583, free: 1320868 ) Aug 20 11:20:23 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:20:23 newport kernel: aa:277637 ac:1561 id:5 il:0 ic:7 fr:716141 Aug 20 11:20:23 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:20:23 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:20:23 newport kernel: aa:879752 ac:878 id:97543 il:18893 ic:6626 fr:4432 Aug 20 11:20:23 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:20:23 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:20:23 newport kernel: aa:935376 ac:1473 id:46856 il:638 ic:139 fr:33446 Aug 20 11:20:23 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:20:23 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2617 Aug 20 11:20:23 newport kernel: aa:298157 ac:2301 id:0 il:217 ic:811 fr:564232 Aug 20 11:20:23 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:20:23 newport kernel: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 699*4096kB = 2864564kB) Aug 20 11:20:23 newport kernel: Swap cache: add 6431175, delete 6211897, find 2646240/4340898, race 0+0 Aug 20 11:20:23 newport kernel: 9871 pages of slabcache Aug 20 11:20:23 newport kernel: 100 pages of kernel stacks Aug 20 11:20:23 newport kernel: 678 lowmem pagetables, 4620 highmem pagetables Aug 20 11:20:23 newport kernel: Free swap: 11712864kB Aug 20 11:20:23 newport kernel: 4194300 pages of RAM Aug 20 11:20:23 newport kernel: 1328796 free pages Aug 20 11:20:23 newport kernel: 279617 reserved pages Aug 20 11:20:23 newport kernel: 10220 pages shared Aug 20 11:20:23 newport kernel: 219278 pages swap cached Aug 20 11:20:23 newport kernel: Buffer memory: 2140kB Aug 20 11:20:23 newport kernel: Cache memory: 901688kB Aug 20 11:20:23 newport kernel: CLEAN: 346 buffers, 1375 kbyte, 25 used (last=343), 0 locked, 0 dirty 0 delay Aug 20 11:20:23 newport kernel: LOCKED: 4 buffers, 16 kbyte, 1 used (last=4), 1 locked, 0 dirty 0 delay Aug 20 11:20:23 newport kernel: DIRTY: 13 buffers, 52 kbyte, 0 used (last=0), 0 locked, 13 dirty 0 delay ################ Two process grabbing 3.7GB of memory each ######################################### [root@newport root]# vmstat --mb 1 2 procs memory swap io system cpu r b swpd free buff cache si so bi bo in cs us sy wa id 2 0 13 7677 2 24 0 0 530 2090 181 292 14 2 7 76 2 0 13 7677 2 24 0 0 0 0 114 14 50 0 0 50 No swapping Number of Processes 2 Memory Size 7400MB RM 100% WM 0% Run Time 600s Starting Load Generation Please Wait ... --- Results (MB/s)--- PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s) 0,3189.23,0,3189.23 1,2286.03,0,2286.03 --- Total --- 2,5475.25,0,5475.25 ---------------------- altsysrq M ------------------------------------ Aug 20 11:35:08 newport kernel: Mem-info: Aug 20 11:35:08 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:35:08 newport kernel: Zone:Normal freepages: 41891 min: 1279 low: 17406 high: 25597 Aug 20 11:35:08 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:35:08 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:35:08 newport kernel: Zone:Normal freepages:1008041 min: 1279 low: 17406 high: 25597 Aug 20 11:35:08 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:35:08 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:35:08 newport kernel: Zone:Normal freepages: 52064 min: 1279 low: 17406 high: 25597 Aug 20 11:35:08 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:35:08 newport kernel: Zone:DMA freepages: 2617 min: 1056 low: 1088 high: 1120 Aug 20 11:35:08 newport kernel: Zone:Normal freepages:860900 min: 1279 low: 17342 high: 25501 Aug 20 11:35:08 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:35:08 newport kernel: Free pages: 1965513 ( 0 HighMem) Aug 20 11:35:08 newport kernel: ( Active: 1900852/921, inactive_laundry: 433, inactive_clean: 934, free: 1965513 ) Aug 20 11:35:08 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:35:08 newport kernel: aa:946907 ac:1560 id:296 il:120 ic:7 fr:41891 Aug 20 11:35:08 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:35:08 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:35:08 newport kernel: aa:54 ac:932 id:1 il:0 ic:34 fr:1008041 Aug 20 11:35:08 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:35:08 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:35:08 newport kernel: aa:946615 ac:1473 id:622 il:97 ic:82 fr:52064 Aug 20 11:35:08 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:35:08 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2617 Aug 20 11:35:08 newport kernel: aa:1008 ac:2303 id:2 il:216 ic:811 fr:860900 Aug 20 11:35:08 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:35:08 newport kernel: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 40*4096kB = 167564kB) Aug 20 11:35:08 newport kernel: Swap cache: add 8585237, delete 8580781, find 9284528/10986368, race 0+0 Aug 20 11:35:08 newport kernel: 31766 pages of slabcache Aug 20 11:35:08 newport kernel: 100 pages of kernel stacks Aug 20 11:35:08 newport kernel: 97 lowmem pagetables, 3901 highmem pagetables Aug 20 11:35:08 newport kernel: Free swap: 12571592kB Aug 20 11:35:09 newport kernel: 4194300 pages of RAM Aug 20 11:35:09 newport kernel: 1973933 free pages Aug 20 11:35:09 newport kernel: 279617 reserved pages Aug 20 11:35:09 newport kernel: 10217 pages shared Aug 20 11:35:09 newport kernel: 4456 pages swap cached Aug 20 11:35:09 newport kernel: Buffer memory: 2140kB Aug 20 11:35:09 newport kernel: Cache memory: 42436kB Aug 20 11:35:09 newport kernel: CLEAN: 357 buffers, 1419 kbyte, 25 used (last=340), 0 locked, 0 dirty 0 delay Aug 20 11:35:09 newport kernel: LOCKED: 4 buffers, 16 kbyte, 0 used (last=0), 0 locked, 0 dirty 0 delay ################ Four process grabbing 3.2GB of memory each ######################################### [root@newport root]# vmstat --mb 1 2 procs memory swap io system cpu r b swpd free buff cache si so bi bo in cs us sy wa id 5 0 22 2267 2 24 0 0 419 1663 149 231 19 2 6 73 4 0 22 2267 2 24 0 0 0 28 121 15 100 0 0 0 Causes lite swapping for 10min. Number of Processes 4 Memory Size 12800MB RM 100% WM 0% Run Time 600s Starting Load Generation Please Wait ... --- Results (MB/s)--- PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s) 0,1543.53,0,1543.53 1,1882.35,0,1882.35 2,2565.38,0,2565.38 3,1624.2,0,1624.2 --- Total --- 4,7615.46,0,7615.46 ---------------------- altsysrq M ------------------------------------ Aug 20 11:48:36 newport kernel: Mem-info: Aug 20 11:48:36 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:48:36 newport kernel: Zone:Normal freepages:170015 min: 1279 low: 17406 high: 25597 Aug 20 11:48:36 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:48:36 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:48:36 newport kernel: Zone:Normal freepages:187126 min: 1279 low: 17406 high: 25597 Aug 20 11:48:36 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:48:36 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:48:36 newport kernel: Zone:Normal freepages:180372 min: 1279 low: 17406 high: 25597 Aug 20 11:48:36 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:48:36 newport kernel: Zone:DMA freepages: 2617 min: 1056 low: 1088 high: 1120 Aug 20 11:48:36 newport kernel: Zone:Normal freepages: 40368 min: 1279 low: 17342 high: 25501 Aug 20 11:48:36 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 11:48:36 newport kernel: Free pages: 580498 ( 0 HighMem) Aug 20 11:48:36 newport kernel: ( Active: 3282381/2074, inactive_laundry: 556, inactive_clean: 728, free: 580498 ) Aug 20 11:48:36 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:48:36 newport kernel: aa:819115 ac:1572 id:236 il:58 ic:7 fr:170015 Aug 20 11:48:36 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:48:36 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:48:36 newport kernel: aa:818976 ac:939 id:246 il:122 ic:26 fr:187126 Aug 20 11:48:36 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:48:36 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:48:36 newport kernel: aa:818734 ac:1472 id:439 il:160 ic:23 fr:180372 Aug 20 11:48:36 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:48:36 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2617 Aug 20 11:48:36 newport kernel: aa:819262 ac:2311 id:1153 il:216 ic:672 fr:40368 Aug 20 11:48:36 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 11:48:36 newport kernel: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 166*4096kB = 680060kB) Aug 20 11:48:36 newport kernel: Swap cache: add 8622137, delete 8618116, find 9321464/11023459, race 0+0 Aug 20 11:48:36 newport kernel: 31817 pages of slabcache Aug 20 11:48:36 newport kernel: 104 pages of kernel stacks Aug 20 11:48:36 newport kernel: 1702 lowmem pagetables, 5006 highmem pagetables Aug 20 11:48:36 newport kernel: Free swap: 12573952kB Aug 20 11:48:36 newport kernel: 4194300 pages of RAM Aug 20 11:48:36 newport kernel: 588562 free pages Aug 20 11:48:36 newport kernel: 279617 reserved pages Aug 20 11:48:36 newport kernel: 10393 pages shared Aug 20 11:48:36 newport kernel: 4021 pages swap cached Aug 20 11:48:36 newport kernel: Buffer memory: 2148kB Aug 20 11:48:36 newport kernel: Cache memory: 40776kB Aug 20 11:48:36 newport kernel: CLEAN: 359 buffers, 1427 kbyte, 27 used (last=344), 0 locked, 0 dirty 0 delay Aug 20 11:48:36 newport kernel: LOCKED: 4 buffers, 16 kbyte, 0 used (last=0), 0 locked, 0 dirty 0 delay Aug 20 11:48:36 newport kernel: DIRTY: 1 buffers, 4 kbyte, 0 used (last=0), 0 locked, 1 dirty 0 delay ################ Four process grabbing 3GB of memory each ######################################### [root@newport root]# vmstat --mb 1 2 procs memory swap io system cpu r b swpd free buff cache si so bi bo in cs us sy wa id 4 0 13 3066 2 24 0 0 266 1069 105 148 40 1 4 55 4 0 13 3066 2 24 0 0 0 0 114 13 100 0 0 0 Causes no swapping and has same memory as 1 process @ 12GB Number of Processes 4 Memory Size 12000MB RM 100% WM 0% Run Time 600s Starting Load Generation Please Wait ... --- Results (MB/s)--- PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s) 0,2908.47,0,2908.47 1,2918.64,0,2918.64 2,2898.31,0,2898.31 3,2877.97,0,2877.97 --- Total --- 4,11603.4,0,11603.4 ---------------------- altsysrq M ------------------------------------ Aug 20 12:27:41 newport kernel: Mem-info: Aug 20 12:27:41 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 12:27:41 newport kernel: Zone:Normal freepages:221292 min: 1279 low: 17406 high: 25597 Aug 20 12:27:41 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 12:27:41 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 12:27:41 newport kernel: Zone:Normal freepages:238425 min: 1279 low: 17406 high: 25597 Aug 20 12:27:41 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 12:27:41 newport kernel: Zone:DMA freepages: 0 min: 0 low: 0 high: 0 Aug 20 12:27:41 newport kernel: Zone:Normal freepages:231158 min: 1279 low: 17406 high: 25597 Aug 20 12:27:41 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 12:27:41 newport kernel: Zone:DMA freepages: 2617 min: 1056 low: 1088 high: 1120 Aug 20 12:27:41 newport kernel: Zone:Normal freepages: 91635 min: 1279 low: 17342 high: 25501 Aug 20 12:27:41 newport kernel: Zone:HighMem freepages: 0 min: 0 low: 0 high: 0 Aug 20 12:27:41 newport kernel: Free pages: 785127 ( 0 HighMem) Aug 20 12:27:41 newport kernel: ( Active: 3079106/763, inactive_laundry: 351, inactive_clean: 727, free: 785127 ) Aug 20 12:27:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 12:27:41 newport kernel: aa:768197 ac:1573 id:2 il:12 ic:7 fr:221292 Aug 20 12:27:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 12:27:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 12:27:41 newport kernel: aa:768109 ac:945 id:2 il:29 ic:26 fr:238425 Aug 20 12:27:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 12:27:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 12:27:41 newport kernel: aa:768132 ac:1472 id:0 il:1 ic:23 fr:231158 Aug 20 12:27:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 12:27:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2617 Aug 20 12:27:41 newport kernel: aa:768364 ac:2314 id:759 il:309 ic:671 fr:91635 Aug 20 12:27:41 newport kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:0 Aug 20 12:27:41 newport kernel: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 216*4096kB = 885168kB) Aug 20 12:27:41 newport kernel: Swap cache: add 8705452, delete 8703307, find 9405054/11107049, race 0+0 Aug 20 12:27:41 newport kernel: 31868 pages of slabcache Aug 20 12:27:41 newport kernel: 104 pages of kernel stacks Aug 20 12:27:41 newport kernel: 1602 lowmem pagetables, 4706 highmem pagetables Aug 20 12:27:41 newport kernel: Free swap: 12581456kB Aug 20 12:27:41 newport kernel: 4194300 pages of RAM Aug 20 12:27:41 newport kernel: 793704 free pages Aug 20 12:27:41 newport kernel: 279617 reserved pages Aug 20 12:27:41 newport kernel: 10421 pages shared Aug 20 12:27:41 newport kernel: 2145 pages swap cached Aug 20 12:27:41 newport kernel: Buffer memory: 2148kB Aug 20 12:27:41 newport kernel: Cache memory: 33304kB Aug 20 12:27:41 newport kernel: CLEAN: 347 buffers, 1379 kbyte, 24 used (last=344), 0 locked, 0 dirty 0 delay Aug 20 12:27:41 newport kernel: LOCKED: 4 buffers, 16 kbyte, 1 used (last=4), 1 locked, 0 dirty 0 delay Aug 20 12:27:41 newport kernel: DIRTY: 16 buffers, 64 kbyte, 3 used (last=16), 0 locked, 16 dirty 0 delay
The main purpose of the test program is to validate that hardware and OS meet performance goals for application which require large amounts of memory for few processes. This is typical for Data Center /DB workloads.
OK, we see what the problem is here. There are 4 pgdata on this machine because it has 4 NUMA memory building blocks with 4MB on each. When a memory allocation happens, the _alloc_pages() routine in numa.c starts allocating on the pgdat corresponding to the CPU the process is running on. If the process does not get rescheduled on another CPU, the first 4GB will exhaust the memory in the first pgdat before moving on to the next pgdat. This basically means that for a 12GB memory allocation from a single process pgdat's 0, 1 and 2 will be exhausted(resulting in swapping) and pgdat 3 will be untouched. ******************************************************************** used >>>aa:901913 ac:1558 id:61553 il:11534 ic:6896 fr:10539 used >>>aa:867678 ac:879 id:100296 il:19880 ic:10183 fr:17178 used >>>aa:869084 ac:1449 id:100926 il:18792 ic:11396 fr:14445 free >>>aa:769 ac:2295 id:256 il:2 ic:825 fr:861136 ******************************************************************** I will work on a patch to alter the swaping algorithm to work with this uneven page allocation. Larry
Larry, Its, IMHO a good idea if we took out "K8 NUMA support" from the default kernel configuration. With the Opteron processor, there is not much of a performance benefit via the NUMA route. On the other hand, there have been problems. Apart from this swapping problem, we know that RHAS3 update2 assumes a 4-node configuration and wouldn't work with anything more than that. This got fixed in the update3-beta version to support upto 8 processors. Also, going forward, AMD has announced 8-way and 16-way configurations, so things will have to change again if that was to be supported. One other option (if the above is not viable) could be, for the allocator algorithm to not allocate beyond the pages_low watermark of a zone, if there are other zones (on other nodes) that can satisfy the request. This will keep kswapd happy. Please let us know if we can help in any way. Manpreet.
Passing "numa=off" as a commandline removes the swapping symptoms and a little above 14GB (out of 16GB in all) can be allocated to a process without swapping.
Created attachment 103970 [details] Patch against RHAS3 update3 - avoid premature swapping
Attached is a patch against RHAS3 update3 that allows allocation from non-local NUMA nodes if the local node is low on memory, while respecting the watermarks. This stops the undue swapping. It also avoids using the DMA zones if it was not the preferred zone (first in the zone list). It seems to be pretty stable for me. Please let me know what you think or if you have any questions.
A very reasonable work-around for this issue it to include "numa=off" on the boot command line. There is little or no measurable degradation when numa is turned off in this architecture and it will prevent swapping in one zone while other zones have plenty of free memory. Is this acceptable ? Larry Woodman
Apologies for a late response. We are currently using this workaround commandline. This is how we ascertained originally that the NUMA code had some issues. But a fix would be useful eventually since that would allow one to use both the NUMA functionality and to not have swapping problems. Users seem to want NUMA as it does make a performance difference.
It has been decided that x86_64 RHEL3 kernels should continue to enable NUMA by default. However, if an OOM kill occurs on a NUMA system, an extra message will be printed by the kernel suggesting that using the "numa=off" boot option might be a good way to work around the issue. The exact message is: OOM kill occurred on an x86_64 NUMA system! The numa=off boot option might help avoid this. This change was committed to the RHEL3 U5 patch pool on 9-Feb-2005 (in kernel version 2.4.21-27.12.EL).
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.12.EL). To enable an improved NUMA-friendly page allocation policy, please set /proc/sys/vm/numa_memory_allocator via the "sysctl" command (or put "vm.numa_memory_allocator = 1" in /etc/sysctl.conf).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0144.html