From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
I have an application that allocates about 30 shared memory segments totaling about 450MB of memory. This application runs on a box with 4GB of memory and no swap. The application has hundreds of forked children that attach to these shared memory segments and use them.

When this configuration is running and I then run mke2fs on a partition, the mke2fs slows to a stop and the kernel starts killing my application processes. While it's killing these processes, I have top running, and it shows anywhere from 1-2GB of free memory and 1-2GB of memory listed as 'cached'. The 'buffers' sit around 600K.

The only other thing I've changed is vm.vfs_cache_pressure, which is set to 1000. Raising or lowering this value doesn't change the behavior under this scenario.

The one difference I do note is that under top, each process lists its 'VIRT' around 450M (the size of the shm attached). Normally the 'SHR' field will also be this value, since all of the memory is shared, and the 'RES' field will be smaller. Under the new kernel, the VIRT field remains 450M, but the SHR field is a few MB at most and is close to the RES field. It almost appears as if the shared memory isn't actually being shared, or the meaning of the 'SHR' column has changed.

mke2fs runs fine without the shared memory allocated. The application runs fine without mke2fs running. This exact same configuration works fine under 2.6.9-1.6FC2 and all previous kernels back to RH9.

I'm filing this under 'kernel' since it really looks like the kernel has some serious memory management issues, especially when a huge amount of disk writing happens very quickly. I'll try to post a dumbed-down test program that can be used to recreate the behavior (see the sketch below).

Version-Release number of selected component (if applicable):
kernel-2.6.10-1.9_FC2

How reproducible:
Always

Steps to Reproduce:
1. Create a large amount of shared memory.
2. Attach to it with a large number of processes.
3. Run mke2fs on a decent-sized partition.

Actual Results:
Box comes to a halt. The OOM killer begins killing processes attached to the shared memory, even though free memory exists on the system.

Expected Results:
System hums along, not caring.

Additional info:
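A rough sketch of that test program follows (untested; the segment count, segment size, and child count are illustrative guesses based on the numbers above, not the exact production values). Build with gcc -o shmtest shmtest.c, run it, then kick off mke2fs while the children sit attached.

/* shmtest.c - rough sketch of the reproduction case described above.
 * Segment count/size and child count are illustrative, not the exact
 * values from the production application (~30 segments, ~450MB total,
 * hundreds of children).
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

#define NSEGS     30                    /* ~30 segments             */
#define SEGSIZE   (15UL * 1024 * 1024)  /* ~15MB each, ~450MB total */
#define NCHILDREN 200                   /* "hundreds" of children   */

int main(void)
{
    int shmids[NSEGS];
    int i, c;

    /* Step 1: create the shared memory segments. */
    for (i = 0; i < NSEGS; i++) {
        shmids[i] = shmget(IPC_PRIVATE, SEGSIZE, IPC_CREAT | 0600);
        if (shmids[i] < 0) {
            perror("shmget");
            exit(1);
        }
    }

    /* Step 2: fork children that attach to every segment and touch
     * the pages so they are actually faulted in. */
    for (c = 0; c < NCHILDREN; c++) {
        if (fork() == 0) {
            for (i = 0; i < NSEGS; i++) {
                char *p = shmat(shmids[i], NULL, 0);
                if (p == (char *)-1) {
                    perror("shmat");
                    exit(1);
                }
                memset(p, 1, SEGSIZE);
            }
            pause();    /* stay attached until killed */
            exit(0);
        }
    }

    /* Step 3 (done separately): run mke2fs on a decent-sized
     * partition while the children sit attached. The parent blocks
     * here until the children die (e.g. to the OOM killer), then
     * removes the segments. */
    for (c = 0; c < NCHILDREN; c++)
        wait(NULL);
    for (i = 0; i < NSEGS; i++)
        shmctl(shmids[i], IPC_RMID, NULL);
    return 0;
}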
I forgot one key part of this. The box is under heavy I/O load constantly (think news server, but heavier), and that seems to be another component; see the load sketch below. It's still puzzling why processes are being killed when free memory is available.
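To approximate that constant heavy I/O component, a crude writer loop like the following could be run in the background (the filename and sizes are arbitrary placeholders, not from the real workload):

/* ioload.c - crude sustained-write load generator to approximate the
 * constant heavy I/O described above. Filename and sizes are
 * arbitrary placeholders.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    static char buf[1 << 20];           /* 1MB write buffer */
    int fd = open("ioload.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    long i;

    if (fd < 0) {
        perror("open");
        exit(1);
    }
    memset(buf, 0xaa, sizeof(buf));

    for (;;) {
        /* Write ~1GB, then rewind, so the file doesn't grow forever
         * but the page cache keeps taking on dirty data. */
        for (i = 0; i < 1024; i++) {
            if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
                perror("write");
                exit(1);
            }
        }
        lseek(fd, 0, SEEK_SET);
    }
    return 0;
}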
Here's the output from the OOM manager:

oom-killer: gfp_mask=0xd0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16

Free pages: 2709004kB (2626176kB HighMem)
Active:139314 inactive:159921 dirty:158039 writeback:105 unstable:0 free:677251 slab:26349 mapped:127907 pagetables:18240
DMA free:1004kB min:68kB low:84kB high:100kB active:0kB inactive:10272kB present:16384kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Normal free:81648kB min:3756kB low:4692kB high:5632kB active:1908kB inactive:610696kB present:901120kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:2626176kB min:512kB low:640kB high:768kB active:555356kB inactive:19060kB present:3407872kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 183*4kB 24*8kB 3*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1004kB
Normal: 14450*4kB 2009*8kB 303*16kB 82*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 81472kB
HighMem: 7676*4kB 4952*8kB 3683*16kB 4271*32kB 3724*64kB 2393*128kB 1091*256kB 344*512kB 86*1024kB 41*2048kB 290*4096kB = 2625856kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap: 0kB
1081344 pages of RAM
819184 pages of HIGHMEM
42660 reserved pages
2375875 pages shared
0 pages swap cached
Out of Memory: Killed process 3130 (xxx).
Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora Legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.