From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
I created a filesystem using ext2 (or created ext2 and added a journal using tune2fs -j) with a blocksize of 1024. I then ran mysql (self-compiled latest 4.0 stable release) with its datafiles (about 150GB) on the filesystem. When I start extensive queries, I see the system block cache (as reported via vmstat) fill slowly and available memory decrease. Once I reach a threshold (I have 8GB; at approximately 7.5GB cache use), the system decides it is starved for memory and begins to kill processes. Once I unmount the FS, the system cache is reclaimed. I do not have this problem with 2048 or 4096 blocksizes.

Version-Release number of selected component (if applicable):
2.4.21-27.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Create a 1024-blocksize filesystem.
2. Start mysql with datafiles (all MyISAM) on the filesystem.
3. Extensively query the DB.

Actual Results:
Mysql is forcibly restarted. The kernel begins to kill processes because it is memory starved.
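Step 1 above corresponds roughly to the following commands. This is a sketch using a file-backed image so it can be tried without a spare disk (the actual report used a whole device; the /tmp path is a placeholder):

```shell
# Create a 64MB backing file (stand-in for the real block device).
dd if=/dev/zero of=/tmp/ext2-1k.img bs=1M count=64
# ext2 with a 1024-byte blocksize, then add a journal, as in the report.
mkfs.ext2 -F -b 1024 /tmp/ext2-1k.img
tune2fs -j /tmp/ext2-1k.img
# Confirm the blocksize recorded in the superblock.
tune2fs -l /tmp/ext2-1k.img | grep 'Block size'
```

On a real device you would then mount the filesystem and point mysql's datadir at it.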
/var/log/messages snippet:
Jan 28 14:32:06 cache01 kernel: Mem-info:
Jan 28 14:32:06 cache01 kernel: Zone:DMA freepages: 2465 min: 0 low: 0 high: 0
Jan 28 14:32:06 cache01 kernel: Zone:Normal freepages: 1224 min: 1279 low: 4544 high: 6304
Jan 28 14:32:06 cache01 kernel: Zone:HighMem freepages:107399 min: 255 low: 32256 high: 48384
Jan 28 14:32:06 cache01 kernel: Free pages: 111088 (107399 HighMem)
Jan 28 14:32:06 cache01 kernel: ( Active: 1336048/328136, inactive_laundry: 49272, inactive_clean: 49152, free: 111089 )
Jan 28 14:32:06 cache01 kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:2465
Jan 28 14:32:06 cache01 kernel: aa:0 ac:2836 id:421 il:202 ic:0 fr:1225
Jan 28 14:32:06 cache01 kernel: aa:2867 ac:1330345 id:327633 il:49152 ic:49152 fr:107399
Jan 28 14:32:06 cache01 kernel: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 2*4096kB = 9860kB)
Jan 28 14:32:06 cache01 kernel: 202*4kB 69*8kB 11*16kB 3*32kB 1*64kB 5*128kB 2*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 4896kB)
Jan 28 14:32:06 cache01 kernel: 61483*4kB 2468*8kB 33*16kB 0*32kB 895*64kB 813*128kB 6*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 429596kB)
Jan 28 14:32:06 cache01 kernel: Swap cache: add 45453, delete 45453, find 79238/99294, race 0+0
Jan 28 14:32:06 cache01 kernel: 171985 pages of slabcache
Jan 28 14:32:06 cache01 kernel: 170 pages of kernel stacks
Jan 28 14:32:06 cache01 kernel: 0 lowmem pagetables, 152 highmem pagetables
Jan 28 14:32:06 cache01 kernel: Free swap: 2096440kB
Jan 28 14:32:06 cache01 kernel: 2293760 pages of RAM
Jan 28 14:32:06 cache01 kernel: 1867632 pages of HIGHMEM
Jan 28 14:32:06 cache01 kernel: 243951 reserved pages
Jan 28 14:32:06 cache01 kernel: 1487998 pages shared
Jan 28 14:32:06 cache01 kernel: 0 pages swap cached
Jan 28 14:32:06 cache01 kernel: Out of Memory: Killed process 12756 (mysqld).
(last message repeated for each process)
Jan 28 14:32:06 cache01 kernel: Fixed up OOM kill of mm-less task

Expected Results:
The system should behave normally; no processes should be killed.

Additional info:
The filesystem has only been tested across an Emulex FC HBA backend into a Hitachi 9585.
I confirmed this behavior also occurs on local SCSI disks using a MegaRAID controller.
John, can you grab me a quick /proc/slabinfo output when this happens? You have ~172K pages in the slabcache and I need to see exactly where they are. Thanks, Larry Woodman
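For reference, one quick way to see where those slabcache pages are going on a 2.4 kernel is to total pages per cache from /proc/slabinfo. The column layout below (name, active objs, total objs, object size, active slabs, total slabs, pages per slab) is my assumption based on the slabinfo v1.1 format used by 2.4 kernels:

```shell
# Pages consumed per slab cache = total-slabs ($6) * pages-per-slab ($7);
# skip the "slabinfo - version" header line, largest consumers first.
awk 'NR > 1 && NF >= 7 { printf "%s %d\n", $1, $6 * $7 }' /proc/slabinfo \
  | sort -k2,2rn | head -20
```

With SMP statistics enabled, extra per-CPU columns appear in parentheses after these fields, but the first seven columns keep the same meaning.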
Also, please run the latest pre-RHEL3-U5 kernel located here: http://people.redhat.com/~lwoodman/RHEL3/ It includes a fix for a bug that was preventing buffer heads from being reclaimed from highmem when only lowmem was exhausted, and that could very well be the problem you are hitting here. Larry
I have not had the time nor the resources to pursue this matter. Using a 4096 blocksize seems to work. I abandoned my use of RHEL here because it would lock up within the first 10 minutes of running CTCS for burn-in; Mandrake 10.1 was used instead. On my next iteration of this project, I will try the freshly released RHEL4. Please close/archive this bug as appropriate.
John, we believe this problem was already fixed in the first RHEL3 U5 build, from 15-Nov-2004 (kernel version 2.4.21-25.1.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html