Bug 130387 - Processes with Large memory requirment causes swap usage with free memory is present.
Processes with Large memory requirment causes swap usage with free memory is ...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel (Show other bugs)
3.0
x86_64 Linux
medium Severity high
: ---
: ---
Assigned To: Larry Woodman
Brian Brock
:
Depends On:
Blocks: 168424
  Show dependency treegraph
 
Reported: 2004-08-19 16:37 EDT by Kevin Babst
Modified: 2016-09-06 12:42 EDT (History)
6 users (show)

See Also:
Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-03-15 10:37:40 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Patch against RHAS3 update3 - avoid premature swapping (7.02 KB, patch)
2004-09-17 21:24 EDT, Manpreet Singh
no flags Details | Diff

  None (edit)
Description Kevin Babst 2004-08-19 16:37:19 EDT
Description of problem:
A single 64bit process can not malloc and touch more than 4GB of
memory with out causing the system to use swap, even though there is
still free memory present.

System: AMD Quartet
Memory: 16GB of Physical memory (4GB per-processor)
Processors: 4 - AMD 844
Disk:  18GB SCSI

Number of Test processes:   1, 2, 3, 4
Max Memory per-process 
before swapping occured:    3.9GB,  3.7GB, 3.7GB, 3.25GB
Total memory used by all
Processes:                  3.9GB, 7.4GB, 11.1GB, 13GB
Free Memory:               ~11GB,  ~6GB,  ~3GB,  ~1.5GB

This same behavior was seen on Opteron system with 64GB Main Memory.

Version-Release number of selected component (if applicable):
Linux 2.4.21-17.ELsmp #1 SMP Thu Jul 8 19:30:27 EDT 2004 x86_64 GNU/Linux

How reproducible:
Every Time

Steps to Reproduce:
1. Create a process that mallocs greater than 4GB of Ram.
2. Have the process memset the malloced array, then either read or
with array in a loop.
3. What the swap file usage increase.
  
Additional info:
Test program avalible by request (kevin@fabric7.com)

Run as root or other user
Comment 1 Larry Woodman 2004-08-20 09:33:52 EDT
This is likely normal behavior.  The linux kernel splits memory into
zones for 1.) DMAable memory 2.) Normal or kernel mapped memory and
3.) Highmem or memory that is physically larger than the system can
map with in a virtual address space.  This means that it is quite
possible to exhaust one zone long before the other zones are
exhausted, resulting in page reclamation and therefore paging/swap
usage.  Also, the system can allocate pages of swap space for a
process to use long before it actually swaps pages out and this still
shows up as swap usage.  In addition, it is also perfectly normal for
the system to run out of memory and start page reclamation even though
you might not expect it.  We use pages of memory to cache every file
system data and meta-data item that ever used until the memory is
exhausted then we start reclaiming those pages of memory is least
recently used order.  If pages that need to be swapped out are the
oldest we swap them out.

Having said that, is this causing a problem on your system or is this
just behavior that you didnt expect to see?

Anyway, please reproduce this problem you describe and get me several
"AltSysrq M" outputs so I can see the exact memory state of your
system and determine whether this is a normal situation or there is
some kernel/memory problem that is unusual.

Thanks, Larry Woodman
Comment 2 Kevin Babst 2004-08-20 15:38:23 EDT
System information:

Processor Info:

total Processors : 4
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 5
model name      : AMD Opteron(tm) Processor 844
physical id     : 0
siblings        : 1
stepping        : 1
cpu MHz         : 1792.860
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse
sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 3578.26
TLB size        : 1088 4K pages
clflush size    : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp

Memory Info
        total:    used:    free:  shared: buffers:  cached:
Mem:  16034541568 217554944 15816986624        0  2199552 34111488
Swap: 12897230848 13819904 12883410944
MemTotal:     15658732 kB
MemFree:      15446276 kB
MemShared:           0 kB
Buffers:          2148 kB
Cached:          24728 kB
SwapCached:       8584 kB
Active:          31080 kB
ActiveAnon:       5740 kB
ActiveCache:     25340 kB
Inact_dirty:       292 kB
Inact_laundry:    1460 kB
Inact_clean:      2900 kB
Inact_target:     7144 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     15658732 kB
LowFree:      15446276 kB
SwapTotal:    12594952 kB
SwapFree:     12581456 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

Kernel Stock RHAS3 Update3 beta:
[root@newport root]# uname -a
Linux newport 2.4.21-17.ELsmp #1 SMP Thu Jul 8 19:30:27 EDT 2004
x86_64 x86_64 x86_64 GNU/Linux

<<< Test Info >>>

################ One process grabbing 12GB of memory
#########################################
Causes Heavey swapping for 10min.

Number of Processes 1
Memory Size 12000MB
RM 100%
WM 0%
Run Time 600s
Starting Load Generation
Please Wait ...
--- Results (MB/s)---
PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s)
0,83.9161,0,83.9161
--- Total ---
1,83.9161,0,83.9161

---------------------- altsysrq M ------------------------------------
Aug 20 10:50:33 newport kernel: Mem-info:
Aug 20 10:50:33 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 10:50:33 newport kernel: Zone:Normal freepages: 10539 min: 
1279 low: 17406 high: 25597
Aug 20 10:50:33 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 10:50:33 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 10:50:33 newport kernel: Zone:Normal freepages: 17178 min: 
1279 low: 17406 high: 25597
Aug 20 10:50:33 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 10:50:33 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 10:50:33 newport kernel: Zone:Normal freepages: 14445 min: 
1279 low: 17406 high: 25597
Aug 20 10:50:33 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 10:50:33 newport kernel: Zone:DMA freepages:  2617 min:  1056
low:  1088 high:  1120
Aug 20 10:50:41 newport kernel: Zone:Normal freepages:861136 min: 
1279 low: 17342 high: 25501
Aug 20 10:50:41 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 10:50:41 newport kernel: Free pages:      905915 (     0 HighMem)
Aug 20 10:50:41 newport kernel: ( Active: 2645625/263031,
inactive_laundry: 50208, inactive_clean: 29300, free: 905915 )
Aug 20 10:50:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 10:50:41 newport kernel:   aa:901913 ac:1558 id:61553 il:11534
ic:6896 fr:10539
Aug 20 10:50:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 10:50:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 10:50:41 newport kernel:   aa:867678 ac:879 id:100296 il:19880
ic:10183 fr:17178
Aug 20 10:50:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 10:50:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 10:50:41 newport kernel:   aa:869084 ac:1449 id:100926 il:18792
ic:11396 fr:14445
Aug 20 10:50:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 10:50:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2617
Aug 20 10:50:41 newport kernel:   aa:769 ac:2295 id:256 il:2 ic:825
fr:861136
Aug 20 10:50:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 10:50:41 newport kernel: 1*4kB 1*8kB 0*16kB 1*32kB 0*64kB
1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 10*4096kB = 42156kB)
Aug 20 10:50:41 newport kernel: Swap cache: add 2633120, delete
2553093, find 1375365/1891330, race 0+0
Aug 20 10:50:41 newport kernel: 4380 pages of slabcache
Aug 20 10:50:41 newport kernel: 94 pages of kernel stacks
Aug 20 10:50:41 newport kernel: 96 lowmem pagetables, 5987 highmem
pagetables
Aug 20 10:50:41 newport kernel: Free swap:       12273992kB
Aug 20 10:50:41 newport kernel: 4194300 pages of RAM
Aug 20 10:50:41 newport kernel: 913080 free pages
Aug 20 10:50:41 newport kernel: 279617 reserved pages
Aug 20 10:50:41 newport kernel: 9345 pages shared
Aug 20 10:50:41 newport kernel: 83734 pages swap cached
Aug 20 10:50:41 newport kernel: Buffer memory:     2132kB
Aug 20 10:50:41 newport kernel: Cache memory:   369544kB
Aug 20 10:50:41 newport kernel:   CLEAN: 355 buffers, 1411 kbyte, 25
used (last=294), 0 locked, 0 dirty 0 delay
Aug 20 10:50:41 newport kernel:  LOCKED: 1 buffers, 4 kbyte, 1 used
(last=1), 1 locked, 0 dirty 0 delay


################ One process grabbing 3.9GB of memory
#########################################
No swapping occured

Number of Processes 1
Memory Size 3900MB
RM 100%
WM 0%
Run Time 600s
Starting Load Generation
Please Wait ...
--- Results (MB/s)---
PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s)
0,2343.94,0,2343.94
--- Total ---
1,2343.94,0,2343.94

---------------------- altsysrq M ------------------------------------
Aug 20 11:07:02 newport kernel: Mem-info:
Aug 20 11:07:02 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:07:02 newport kernel: Zone:Normal freepages:977713 min: 
1279 low: 17406 high: 25597
Aug 20 11:07:02 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:07:02 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:07:02 newport kernel: Zone:Normal freepages:990802 min: 
1279 low: 17406 high: 25597
Aug 20 11:07:02 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:07:02 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:07:02 newport kernel: Zone:Normal freepages:776616 min: 
1279 low: 17406 high: 25597
Aug 20 11:07:02 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:07:02 newport kernel: Zone:DMA freepages:  2617 min:  1056
low:  1088 high:  1120
Aug 20 11:07:02 newport kernel: Zone:Normal freepages: 77148 min: 
1279 low: 17342 high: 25501
Aug 20 11:07:02 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:07:02 newport kernel: Free pages:      2824896 (     0 HighMem)
Aug 20 11:07:02 newport kernel: ( Active: 1004695/1513,
inactive_laundry: 58, inactive_clean: 811, free: 2824896 )
Aug 20 11:07:02 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:07:02 newport kernel:   aa:108 ac:1560 id:2 il:8 ic:0 fr:977713
Aug 20 11:07:02 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:07:02 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:07:02 newport kernel:   aa:187 ac:876 id:0 il:50 ic:0 fr:990802
Aug 20 11:07:02 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:07:02 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:07:02 newport kernel:   aa:237938 ac:1473 id:1 il:0 ic:0
fr:776616
Aug 20 11:07:02 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:07:02 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2617
Aug 20 11:07:02 newport kernel:   aa:760251 ac:2302 id:1510 il:0
ic:811 fr:77148
Aug 20 11:07:02 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:07:02 newport kernel: 4637*4kB 4416*8kB 4331*16kB 3709*32kB
1998*64kB 705*128kB 98*256kB 21*512kB 25*1024kB 15*2048kB 820*4096kB =
3910852kB)
Aug 20 11:07:02 newport kernel: Swap cache: add 6198833, delete
6197032, find 2441240/4135898, race 0+0
Aug 20 11:07:02 newport kernel: 70911 pages of slabcache
Aug 20 11:07:02 newport kernel: 98 pages of kernel stacks
Aug 20 11:07:02 newport kernel: 1583 lowmem pagetables, 660 highmem
pagetables
Aug 20 11:07:02 newport kernel: Free swap:       12582772kB
Aug 20 11:07:03 newport kernel: 4194300 pages of RAM
Aug 20 11:07:03 newport kernel: 2832614 free pages
Aug 20 11:07:03 newport kernel: 279617 reserved pages
Aug 20 11:07:03 newport kernel: 10135 pages shared
Aug 20 11:07:03 newport kernel: 1801 pages swap cached
Aug 20 11:07:03 newport kernel: Buffer memory:     2140kB
Aug 20 11:07:03 newport kernel: Cache memory:    31328kB
Aug 20 11:07:03 newport kernel:   CLEAN: 358 buffers, 1423 kbyte, 25
used (last=354), 0 locked, 0 dirty 0 delay
Aug 20 11:07:03 newport kernel:  LOCKED: 4 buffers, 16 kbyte, 0 used
(last=0), 0 locked, 0 dirty 0 delay



################ Two process grabbing 5GB of memory each
#########################################

[root@newport root]# vmstat --mb 1 2
procs                      memory      swap          io     system   
     cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy wa id
 2  0   6149   5070      2     24    0    0   670  2370  218   366 10
 2  9 79
 2  0   6151   5070      2     24    0    0     0  1288  114    18 50
 2  0 48

Causes lite swapping for 10min.

Number of Processes 2
Memory Size 10000MB
RM 100%
WM 0%
Run Time 600s
Starting Load Generation
Please Wait ...
--- Results (MB/s)---
PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s)
0,1782.09,0,1782.09
1,1825.13,0,1825.13
--- Total ---
2,3607.22,0,3607.22


---------------------- altsysrq M ------------------------------------

Aug 20 11:20:23 newport kernel: Mem-info:
Aug 20 11:20:23 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:20:23 newport kernel: Zone:Normal freepages:716141 min: 
1279 low: 17406 high: 25597
Aug 20 11:20:23 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:20:23 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:20:23 newport kernel: Zone:Normal freepages:  4432 min: 
1279 low: 17406 high: 25597
Aug 20 11:20:23 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:20:23 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:20:23 newport kernel: Zone:Normal freepages: 33446 min: 
1279 low: 17406 high: 25597
Aug 20 11:20:23 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:20:23 newport kernel: Zone:DMA freepages:  2617 min:  1056
low:  1088 high:  1120
Aug 20 11:20:23 newport kernel: Zone:Normal freepages:564232 min: 
1279 low: 17342 high: 25501
Aug 20 11:20:23 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:20:23 newport kernel: Free pages:      1320868 (     0 HighMem)
Aug 20 11:20:23 newport kernel: ( Active: 2397135/144404,
inactive_laundry: 19748, inactive_clean: 7583, free: 1320868 )
Aug 20 11:20:23 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:20:23 newport kernel:   aa:277637 ac:1561 id:5 il:0 ic:7
fr:716141
Aug 20 11:20:23 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:20:23 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:20:23 newport kernel:   aa:879752 ac:878 id:97543 il:18893
ic:6626 fr:4432
Aug 20 11:20:23 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:20:23 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:20:23 newport kernel:   aa:935376 ac:1473 id:46856 il:638
ic:139 fr:33446
Aug 20 11:20:23 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:20:23 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2617
Aug 20 11:20:23 newport kernel:   aa:298157 ac:2301 id:0 il:217 ic:811
fr:564232
Aug 20 11:20:23 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:20:23 newport kernel: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB
1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 699*4096kB = 2864564kB)
Aug 20 11:20:23 newport kernel: Swap cache: add 6431175, delete
6211897, find 2646240/4340898, race 0+0
Aug 20 11:20:23 newport kernel: 9871 pages of slabcache
Aug 20 11:20:23 newport kernel: 100 pages of kernel stacks
Aug 20 11:20:23 newport kernel: 678 lowmem pagetables, 4620 highmem
pagetables
Aug 20 11:20:23 newport kernel: Free swap:       11712864kB
Aug 20 11:20:23 newport kernel: 4194300 pages of RAM
Aug 20 11:20:23 newport kernel: 1328796 free pages
Aug 20 11:20:23 newport kernel: 279617 reserved pages
Aug 20 11:20:23 newport kernel: 10220 pages shared
Aug 20 11:20:23 newport kernel: 219278 pages swap cached
Aug 20 11:20:23 newport kernel: Buffer memory:     2140kB
Aug 20 11:20:23 newport kernel: Cache memory:   901688kB
Aug 20 11:20:23 newport kernel:   CLEAN: 346 buffers, 1375 kbyte, 25
used (last=343), 0 locked, 0 dirty 0 delay
Aug 20 11:20:23 newport kernel:  LOCKED: 4 buffers, 16 kbyte, 1 used
(last=4), 1 locked, 0 dirty 0 delay
Aug 20 11:20:23 newport kernel:   DIRTY: 13 buffers, 52 kbyte, 0 used
(last=0), 0 locked, 13 dirty 0 delay


################ Two process grabbing 3.7GB of memory each
#########################################
[root@newport root]# vmstat --mb 1 2
procs                      memory      swap          io     system   
     cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy wa id
 2  0     13   7677      2     24    0    0   530  2090  181   292 14
 2  7 76
 2  0     13   7677      2     24    0    0     0     0  114    14 50
 0  0 50
 
No swapping

Number of Processes 2
Memory Size 7400MB
RM 100%
WM 0%
Run Time 600s
Starting Load Generation
Please Wait ...
--- Results (MB/s)---
PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s)
0,3189.23,0,3189.23
1,2286.03,0,2286.03
--- Total ---
2,5475.25,0,5475.25


---------------------- altsysrq M ------------------------------------
Aug 20 11:35:08 newport kernel: Mem-info:
Aug 20 11:35:08 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:35:08 newport kernel: Zone:Normal freepages: 41891 min: 
1279 low: 17406 high: 25597
Aug 20 11:35:08 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:35:08 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:35:08 newport kernel: Zone:Normal freepages:1008041 min: 
1279 low: 17406 high: 25597
Aug 20 11:35:08 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:35:08 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:35:08 newport kernel: Zone:Normal freepages: 52064 min: 
1279 low: 17406 high: 25597
Aug 20 11:35:08 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:35:08 newport kernel: Zone:DMA freepages:  2617 min:  1056
low:  1088 high:  1120
Aug 20 11:35:08 newport kernel: Zone:Normal freepages:860900 min: 
1279 low: 17342 high: 25501
Aug 20 11:35:08 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:35:08 newport kernel: Free pages:      1965513 (     0 HighMem)
Aug 20 11:35:08 newport kernel: ( Active: 1900852/921,
inactive_laundry: 433, inactive_clean: 934, free: 1965513 )
Aug 20 11:35:08 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:35:08 newport kernel:   aa:946907 ac:1560 id:296 il:120 ic:7
fr:41891
Aug 20 11:35:08 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:35:08 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:35:08 newport kernel:   aa:54 ac:932 id:1 il:0 ic:34 fr:1008041
Aug 20 11:35:08 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:35:08 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:35:08 newport kernel:   aa:946615 ac:1473 id:622 il:97 ic:82
fr:52064
Aug 20 11:35:08 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:35:08 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2617
Aug 20 11:35:08 newport kernel:   aa:1008 ac:2303 id:2 il:216 ic:811
fr:860900
Aug 20 11:35:08 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:35:08 newport kernel: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB
1*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 40*4096kB = 167564kB)
Aug 20 11:35:08 newport kernel: Swap cache: add 8585237, delete
8580781, find 9284528/10986368, race 0+0
Aug 20 11:35:08 newport kernel: 31766 pages of slabcache
Aug 20 11:35:08 newport kernel: 100 pages of kernel stacks
Aug 20 11:35:08 newport kernel: 97 lowmem pagetables, 3901 highmem
pagetables
Aug 20 11:35:08 newport kernel: Free swap:       12571592kB
Aug 20 11:35:09 newport kernel: 4194300 pages of RAM
Aug 20 11:35:09 newport kernel: 1973933 free pages
Aug 20 11:35:09 newport kernel: 279617 reserved pages
Aug 20 11:35:09 newport kernel: 10217 pages shared
Aug 20 11:35:09 newport kernel: 4456 pages swap cached
Aug 20 11:35:09 newport kernel: Buffer memory:     2140kB
Aug 20 11:35:09 newport kernel: Cache memory:    42436kB
Aug 20 11:35:09 newport kernel:   CLEAN: 357 buffers, 1419 kbyte, 25
used (last=340), 0 locked, 0 dirty 0 delay
Aug 20 11:35:09 newport kernel:  LOCKED: 4 buffers, 16 kbyte, 0 used
(last=0), 0 locked, 0 dirty 0 delay



################ Four process grabbing 3.2GB of memory each
#########################################
[root@newport root]# vmstat --mb 1 2
procs                      memory      swap          io     system   
     cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy wa id
 5  0     22   2267      2     24    0    0   419  1663  149   231 19
 2  6 73
 4  0     22   2267      2     24    0    0     0    28  121    15 100
 0  0  0

Causes lite swapping for 10min.

Number of Processes 4
Memory Size 12800MB
RM 100%
WM 0%
Run Time 600s
Starting Load Generation
Please Wait ...
--- Results (MB/s)---
PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s)
0,1543.53,0,1543.53
1,1882.35,0,1882.35
2,2565.38,0,2565.38
3,1624.2,0,1624.2
--- Total ---
4,7615.46,0,7615.46

---------------------- altsysrq M ------------------------------------
Aug 20 11:48:36 newport kernel: Mem-info:
Aug 20 11:48:36 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:48:36 newport kernel: Zone:Normal freepages:170015 min: 
1279 low: 17406 high: 25597
Aug 20 11:48:36 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:48:36 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:48:36 newport kernel: Zone:Normal freepages:187126 min: 
1279 low: 17406 high: 25597
Aug 20 11:48:36 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:48:36 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 11:48:36 newport kernel: Zone:Normal freepages:180372 min: 
1279 low: 17406 high: 25597
Aug 20 11:48:36 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:48:36 newport kernel: Zone:DMA freepages:  2617 min:  1056
low:  1088 high:  1120
Aug 20 11:48:36 newport kernel: Zone:Normal freepages: 40368 min: 
1279 low: 17342 high: 25501
Aug 20 11:48:36 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 11:48:36 newport kernel: Free pages:      580498 (     0 HighMem)
Aug 20 11:48:36 newport kernel: ( Active: 3282381/2074,
inactive_laundry: 556, inactive_clean: 728, free: 580498 )
Aug 20 11:48:36 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:48:36 newport kernel:   aa:819115 ac:1572 id:236 il:58 ic:7
fr:170015
Aug 20 11:48:36 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:48:36 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:48:36 newport kernel:   aa:818976 ac:939 id:246 il:122 ic:26
fr:187126
Aug 20 11:48:36 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:48:36 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:48:36 newport kernel:   aa:818734 ac:1472 id:439 il:160
ic:23 fr:180372
Aug 20 11:48:36 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:48:36 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2617
Aug 20 11:48:36 newport kernel:   aa:819262 ac:2311 id:1153 il:216
ic:672 fr:40368
Aug 20 11:48:36 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 11:48:36 newport kernel: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 166*4096kB = 680060kB)
Aug 20 11:48:36 newport kernel: Swap cache: add 8622137, delete
8618116, find 9321464/11023459, race 0+0
Aug 20 11:48:36 newport kernel: 31817 pages of slabcache
Aug 20 11:48:36 newport kernel: 104 pages of kernel stacks
Aug 20 11:48:36 newport kernel: 1702 lowmem pagetables, 5006 highmem
pagetables
Aug 20 11:48:36 newport kernel: Free swap:       12573952kB
Aug 20 11:48:36 newport kernel: 4194300 pages of RAM
Aug 20 11:48:36 newport kernel: 588562 free pages
Aug 20 11:48:36 newport kernel: 279617 reserved pages
Aug 20 11:48:36 newport kernel: 10393 pages shared
Aug 20 11:48:36 newport kernel: 4021 pages swap cached
Aug 20 11:48:36 newport kernel: Buffer memory:     2148kB
Aug 20 11:48:36 newport kernel: Cache memory:    40776kB
Aug 20 11:48:36 newport kernel:   CLEAN: 359 buffers, 1427 kbyte, 27
used (last=344), 0 locked, 0 dirty 0 delay
Aug 20 11:48:36 newport kernel:  LOCKED: 4 buffers, 16 kbyte, 0 used
(last=0), 0 locked, 0 dirty 0 delay
Aug 20 11:48:36 newport kernel:   DIRTY: 1 buffers, 4 kbyte, 0 used
(last=0), 0 locked, 1 dirty 0 delay



################ Four process grabbing 3GB of memory each
#########################################
[root@newport root]# vmstat --mb 1 2

procs                      memory      swap          io     system   
     cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy wa id
 4  0     13   3066      2     24    0    0   266  1069  105   148 40
 1  4 55
 4  0     13   3066      2     24    0    0     0     0  114    13 100
 0  0  0
 
 Causes no swapping and has same memory as 1 process @ 12GB
 
 Number of Processes 4
 Memory Size 12000MB
 RM 100%
 WM 0%
 Run Time 600s
 Starting Load Generation
 Please Wait ...
 --- Results (MB/s)---
 PROCESS #,Read Miss(MB/s),Write Miss(MB/s),Combined (MB/s)
 0,2908.47,0,2908.47
 1,2918.64,0,2918.64
 2,2898.31,0,2898.31
 3,2877.97,0,2877.97
 --- Total ---
4,11603.4,0,11603.4
 ---------------------- altsysrq M ------------------------------------
Aug 20 12:27:41 newport kernel: Mem-info:
Aug 20 12:27:41 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 12:27:41 newport kernel: Zone:Normal freepages:221292 min: 
1279 low: 17406 high: 25597
Aug 20 12:27:41 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 12:27:41 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 12:27:41 newport kernel: Zone:Normal freepages:238425 min: 
1279 low: 17406 high: 25597
Aug 20 12:27:41 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 12:27:41 newport kernel: Zone:DMA freepages:     0 min:     0
low:     0 high:     0
Aug 20 12:27:41 newport kernel: Zone:Normal freepages:231158 min: 
1279 low: 17406 high: 25597
Aug 20 12:27:41 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 12:27:41 newport kernel: Zone:DMA freepages:  2617 min:  1056
low:  1088 high:  1120
Aug 20 12:27:41 newport kernel: Zone:Normal freepages: 91635 min: 
1279 low: 17342 high: 25501
Aug 20 12:27:41 newport kernel: Zone:HighMem freepages:     0 min:   
 0 low:     0 high:     0
Aug 20 12:27:41 newport kernel: Free pages:      785127 (     0 HighMem)
Aug 20 12:27:41 newport kernel: ( Active: 3079106/763,
inactive_laundry: 351, inactive_clean: 727, free: 785127 )
Aug 20 12:27:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 12:27:41 newport kernel:   aa:768197 ac:1573 id:2 il:12 ic:7
fr:221292
Aug 20 12:27:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 12:27:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 12:27:41 newport kernel:   aa:768109 ac:945 id:2 il:29 ic:26
fr:238425
Aug 20 12:27:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 12:27:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 12:27:41 newport kernel:   aa:768132 ac:1472 id:0 il:1 ic:23
fr:231158
Aug 20 12:27:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 12:27:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2617
Aug 20 12:27:41 newport kernel:   aa:768364 ac:2314 id:759 il:309
ic:671 fr:91635
Aug 20 12:27:41 newport kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:0
Aug 20 12:27:41 newport kernel: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB
1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 216*4096kB = 885168kB)
Aug 20 12:27:41 newport kernel: Swap cache: add 8705452, delete
8703307, find 9405054/11107049, race 0+0
Aug 20 12:27:41 newport kernel: 31868 pages of slabcache
Aug 20 12:27:41 newport kernel: 104 pages of kernel stacks
Aug 20 12:27:41 newport kernel: 1602 lowmem pagetables, 4706 highmem
pagetables
Aug 20 12:27:41 newport kernel: Free swap:       12581456kB
Aug 20 12:27:41 newport kernel: 4194300 pages of RAM
Aug 20 12:27:41 newport kernel: 793704 free pages
Aug 20 12:27:41 newport kernel: 279617 reserved pages
Aug 20 12:27:41 newport kernel: 10421 pages shared
Aug 20 12:27:41 newport kernel: 2145 pages swap cached
Aug 20 12:27:41 newport kernel: Buffer memory:     2148kB
Aug 20 12:27:41 newport kernel: Cache memory:    33304kB
Aug 20 12:27:41 newport kernel:   CLEAN: 347 buffers, 1379 kbyte, 24
used (last=344), 0 locked, 0 dirty 0 delay
Aug 20 12:27:41 newport kernel:  LOCKED: 4 buffers, 16 kbyte, 1 used
(last=4), 1 locked, 0 dirty 0 delay
Aug 20 12:27:41 newport kernel:   DIRTY: 16 buffers, 64 kbyte, 3 used
(last=16), 0 locked, 16 dirty 0 delay
Comment 3 Kevin Babst 2004-08-20 16:43:47 EDT
The main purpose of the test program is to validate that hardware and
OS meet performance goals for application which require large amounts
of memory for few processes. This is typical for Data Center /DB
workloads.
Comment 4 Larry Woodman 2004-08-20 17:26:11 EDT
OK, we see what the problem is here.  There are 4 pgdata on this
machine because it has 4 NUMA memory building blocks with 4MB on each.
 When a memory allocation happens, the _alloc_pages() routine in
numa.c starts allocating on the pgdat corresponding to the CPU the
process is running on.  If the process does not get rescheduled on
another CPU, the first 4GB will exhaust the memory in the first pgdat
before moving on to the next pgdat.  This basically means that for a
12GB memory allocation from a single process pgdat's 0, 1 and 2 will
be exhausted(resulting in swapping) and pgdat 3 will be untouched.

********************************************************************
used >>>aa:901913 ac:1558 id:61553 il:11534 ic:6896 fr:10539
used >>>aa:867678 ac:879 id:100296 il:19880 ic:10183 fr:17178
used >>>aa:869084 ac:1449 id:100926 il:18792 ic:11396 fr:14445
free >>>aa:769 ac:2295 id:256 il:2 ic:825 fr:861136
********************************************************************

I will work on a patch to alter the swaping algorithm to work with
this uneven page allocation.

Larry

Comment 5 Manpreet Singh 2004-08-20 22:03:42 EDT
Larry,

Its, IMHO a good idea if we took out "K8 NUMA support" from the
default kernel configuration.

With the Opteron processor, there is not much of a performance benefit
 via the NUMA route. On the other hand, there have been problems.
Apart from this swapping problem, we know that RHAS3 update2 assumes a
4-node configuration and wouldn't work with anything more than that.
This got fixed in the update3-beta version to support upto 8
processors. Also, going forward, AMD has announced 8-way and 16-way
configurations, so things will have to change again if that was to be
supported.

One other option (if the above is not viable) could be, for the
allocator algorithm to not allocate beyond the pages_low watermark of
a zone, if there are other zones (on other nodes) that can satisfy the
request. This will keep kswapd happy.

Please let us know if we can help in any way.

Manpreet.

Comment 6 Manpreet Singh 2004-08-20 22:07:03 EDT
Passing "numa=off" as a commandline removes the swapping symptoms and
a little above 14GB (out of 16GB in all) can be allocated to a process
without swapping.
Comment 7 Manpreet Singh 2004-09-17 21:24:59 EDT
Created attachment 103970 [details]
Patch against RHAS3 update3 - avoid premature swapping
Comment 8 Manpreet Singh 2004-09-17 21:27:00 EDT
Attached is a patch against RHAS3 update3 that allows allocation from
non-local NUMA nodes if the local node is low on memory, while 
respecting the watermarks. This stops the undue swapping.

It also avoids using the DMA zones if it was not the preferred zone
(first in the zone list).

It seems to be pretty stable for me. Please let me know what you think
or if you have any questions.
Comment 10 Larry Woodman 2004-11-11 19:42:17 EST
A very reasonable work-around for this issue it to include "numa=off"
on the boot command line.  There is little or no measurable
degradation when numa is turned off in this architecture and it will
prevent swapping in one zone while other zones have plenty of free memory.

Is this acceptable ?


Larry Woodman
Comment 11 Manpreet Singh 2004-12-01 17:38:08 EST
Apologies for a late response. We are currently using this workaround
commandline. This is how we ascertained originally that the NUMA code
had some issues. But a fix would be useful eventually since that would
allow one to use both the NUMA functionality and to not have swapping
problems. Users seem to want NUMA as it does make a performance
difference.
Comment 14 Ernie Petrides 2005-02-15 00:14:35 EST
It has been decided that x86_64 RHEL3 kernels should continue to enable
NUMA by default.  However, if an OOM kill occurs on a NUMA system, an
extra message will be printed by the kernel suggesting that using the
"numa=off" boot option might be a good way to work around the issue.

The exact message is:

    OOM kill occurred on an x86_64 NUMA system!
    The numa=off boot option might help avoid this.

This change was committed to the RHEL3 U5 patch pool on 9-Feb-2005 (in
kernel version 2.4.21-27.12.EL).
Comment 15 Ernie Petrides 2005-11-30 02:44:29 EST
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.12.EL).

To enable an improved NUMA-friendly page allocation policy, please
set /proc/sys/vm/numa_memory_allocator via the "sysctl" command
(or put "vm.numa_memory_allocator = 1" in /etc/sysctl.conf).
Comment 19 Red Hat Bugzilla 2006-03-15 10:37:40 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html

Note You need to log in before you can comment on or make changes to this bug.