From Bugzilla Helper:
User-Agent: Opera/8.5 (X11; Linux i686; U; en)

Description of problem:
The CFQ scheduler is the default I/O scheduler in RHEL4. I noticed that heavy write activity brings the machine to a halt in a matter of minutes. I have observed this on two different systems now:
1. dual Opteron writing to an IBM DS4100 via a QLogic 2300
2. dual Xeon (ia32) writing to Coraid AoE storage via gigE
Both exhibit the same behaviour. Switching to elevator=deadline fixes the problem.

Version-Release number of selected component (if applicable):
kernel 2.6.9-22.0.2.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Have a TB or so of storage attached to the machine (I don't know if the size really matters, but anyway).
2. Run something like:
   for i in `seq 1 1000`; do dd if=/dev/zero of=/somewhere/somefile.$i bs=1M count=1000; done
   ... where /somewhere is the mountpoint of that storage.
3. Wait.

Actual Results:
In 5-15 minutes the machine becomes totally unresponsive. Whatever remains in memory still runs; everything else is just dead. It is impossible even to log in, as the attempt simply times out.

Expected Results:
Normal writing.

Additional info:
As I mentioned, switching the elevator to anything other than cfq makes the problem go away. I'm not sure that means CFQ is at fault; it might also be some strange interaction between it and the default VM settings.
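For reference, the elevator=deadline workaround described above can be applied at boot, or at runtime on kernels that expose the per-queue sysfs knob. A sketch — device names and paths here are illustrative, not from the systems in this report:

```shell
# Boot-time: append elevator=deadline to the kernel line in GRUB's
# grub.conf (RHEL4 uses GRUB legacy); this path is an example only:
#   kernel /vmlinuz-2.6.9-22.0.2.ELsmp ro root=LABEL=/ elevator=deadline
#
# Runtime, on kernels that support per-queue switching (assumption:
# this sysfs file exists on your kernel; "sda" is an example device):
cat /sys/block/sda/queue/scheduler     # the bracketed entry is active
echo deadline > /sys/block/sda/queue/scheduler
```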
*** Bug 180040 has been marked as a duplicate of this bug. ***
I have not been able to reproduce this problem locally. Can you get your system into this state and capture AltSysrq-M, AltSysrq-T, and AltSysrq-W output so I can see who is waiting on what, who is running and where, and where all the memory is? Thanks, Larry Woodman
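If reaching the console keyboard is awkward, the same dumps can usually be triggered from a shell via /proc — a sketch, assuming the magic SysRq interface is available on this kernel; all of these require root:

```shell
# Make sure magic SysRq is enabled.
echo 1 > /proc/sys/kernel/sysrq
# Emit the requested dumps to the kernel log / serial console:
echo m > /proc/sysrq-trigger    # AltSysrq-M: memory usage
echo t > /proc/sysrq-trigger    # AltSysrq-T: task list with stack traces
echo w > /proc/sysrq-trigger    # AltSysrq-W: as requested above
```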
I've set up a serial console on that dual Xeon box and am waiting for the deadlock to occur again. Somehow it does not want to appear immediately this time. However, I've got plenty of these:

kswapd0: page allocation failure. order:0, mode:0x50
 [<c0143451>] __alloc_pages+0x2e1/0x2f7
 [<c014347f>] __get_free_pages+0x18/0x24
 [<c0145de8>] kmem_getpages+0x1c/0xbb
 [<c0146936>] cache_grow+0xab/0x138
 [<c0146b28>] cache_alloc_refill+0x165/0x19d
 [<c0146d23>] kmem_cache_alloc+0x51/0x57
 [<f895a951>] journal_alloc_journal_head+0x10/0x5d [jbd]
 [<f895a9c4>] journal_add_journal_head+0x1a/0xe6 [jbd]
 [<f8955019>] journal_dirty_data+0x31/0x1b2 [jbd]
 [<f8915e3e>] ext3_journal_dirty_data+0xc/0x2a [ext3]
 [<f8915cd6>] walk_page_buffers+0x62/0x87 [ext3]
 [<f8916276>] ext3_ordered_writepage+0xee/0x13a [ext3]
 [<f8916176>] journal_dirty_data_fn+0x0/0x12 [ext3]
 [<c01489d9>] pageout+0x8d/0xcc
 [<c0148c1f>] shrink_list+0x207/0x3ed
 [<c01298c8>] del_timer+0x5d/0x65
 [<c0129974>] del_singleshot_timer_sync+0x8/0x21
 [<c02cfb57>] schedule_timeout+0xda/0xee
 [<c0148fe2>] shrink_cache+0x1dd/0x34d
 [<c01496a0>] shrink_zone+0xa7/0xb6
 [<c0149a9b>] balance_pgdat+0x1c5/0x30e
 [<c011fe0c>] prepare_to_wait+0x12/0x4c
 [<c0149cae>] kswapd+0xca/0xcc
 [<c011fee1>] autoremove_wake_function+0x0/0x2d
 [<c02d129e>] ret_from_fork+0x6/0x14
 [<c011fee1>] autoremove_wake_function+0x0/0x2d
 [<c0149be4>] kswapd+0x0/0xcc
 [<c01041f1>] kernel_thread_helper+0x5/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
Free pages: 1091228kB (1076736kB HighMem)
Active:19739 inactive:682279 dirty:284643 writeback:21354 unstable:0 free:272935 slab:29055 mapped:19607 pagetables:727
DMA free:12572kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:20774 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:1856kB min:928kB low:1856kB high:2784kB active:260kB inactive:616020kB present:901120kB pages_scanned:95898 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:1076736kB min:512kB low:1024kB high:1536kB active:78472kB inactive:2106924kB present:3801088kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 1*4kB 3*8kB 4*16kB 2*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12572kB
Normal: 0*4kB 0*8kB 40*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 640kB
HighMem: 3992*4kB 2248*8kB 1710*16kB 3428*32kB 1076*64kB 622*128kB 558*256kB 46*512kB 367*1024kB 67*2048kB 19*4096kB = 1076736kB
Swap cache: add 237, delete 231, find 30/91, race 0+0
26308 bounce buffer pages
Free swap: 7290848kB
1179648 pages of RAM
819168 pages of HIGHMEM
141787 reserved pages
696537 pages shared
6 pages swap cached
ENOMEM in journal_alloc_journal_head, retrying.

... and 8 more, probably similar.
I see this when writing on machines with 4 GB of RAM, but not on machines with 1 GB. I have a test program that uses the fopen, fwrite, fflush, fclose sequence, and I can crash this program, terminal windows, X servers, etc. as I increase the size of each fwrite, but not by increasing the number of repeated fwrites. The final size of the file has no effect; I create 4 GB files with LARGEFILE support. The fix mentioned above does resolve this. Has this bug been resolved yet?
Hmmm, this might very well be a duplicate of bug 184535. Can you please try the U4 kernel (-42) and see if that resolves the issue?
Created attachment 135599 [details] source code for test program.
My attachment above is the test program I use to trigger the crashes. I installed the latest Update 4 kernel, resulting in 2.6.9-42.0.2 as stated in RHSA-2006-0575-22. I also read bugs 184535 and 182577. I'm still able to crash my program, terminal windows, X servers, etc. The final file size doesn't seem to matter. I usually try to create 5-6 files and it crashes on 3 or 4. The size of the fwrite seems to be what crashes the programs, and it differs across machines. I can't get it to crash on a 1 GB RAM machine running RHEL 4.0 with no updates. The machine on which I tested 2.6.9-42.0.2 has dual Intel Xeon 3.4 GHz CPUs with 4 GB of RAM.
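The experiment described above — holding the total file size fixed while varying the write chunk size — can be approximated from the shell as well. A rough analogue of the attached C test program (which uses fopen/fwrite/fflush/fclose); the directory and sizes below are illustrative defaults, not values from the report:

```shell
# Write several files of identical total length, each with a different
# block size, to see whether the chunk size alone affects behaviour.
target=${TARGET_DIR:-/tmp/writetest}
total=${TOTAL_BYTES:-$((4 * 1024 * 1024))}   # keep small by default
mkdir -p "$target"
for bs in 4096 65536 1048576; do
    # one file per chunk size, same total length each time
    dd if=/dev/zero of="$target/file.$bs" bs="$bs" count=$((total / bs)) 2>/dev/null
done
ls -l "$target"
```

To approach the sizes discussed in this bug, raise TOTAL_BYTES (and point TARGET_DIR at the affected storage), but note that is exactly the load that hung these machines.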
Any further resolution on this item? Any estimate on a time frame for a fix?
Can you try setting /proc/sys/vm/min_free_kbytes to 4 times the default value and see if this prevents the page allocation failures you are seeing? This should kick in page reclamation before the system gets so low on memory, and prevent this from happening. I am not sure that this is the cause of the "heavy write with the CFQ scheduler kills a machine" problem, though. Larry Woodman
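A sketch of the suggested tuning — the default value varies with installed RAM, so read it rather than hard-coding a number; applying the new value requires root:

```shell
# Read the current watermark and compute four times that value.
default=$(cat /proc/sys/vm/min_free_kbytes)
new=$((default * 4))
echo "min_free_kbytes: $default -> $new"
# To apply, as root:
#   echo "$new" > /proc/sys/vm/min_free_kbytes
# To make it persistent, the equivalent /etc/sysctl.conf line would be:
#   vm.min_free_kbytes = <value>
```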
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review it against is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to reconsider this feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.