Bug 180039 - heavy write with cfq schedulers kills a machine
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
4.0
x86_64 Linux
Priority: medium  Severity: high
Assigned To: Larry Woodman
QA Contact: Brian Brock
Duplicates: 180040
Reported: 2006-02-04 18:09 EST by Jure Pečar
Modified: 2012-06-20 11:59 EDT

Doc Type: Bug Fix
Last Closed: 2012-06-20 11:59:05 EDT


Attachments (Terms of Use)
source code for test program. (3.28 KB, text/plain)
2006-09-05 15:33 EDT, Jeffrey Parker

Description Jure Pečar 2006-02-04 18:09:11 EST
From Bugzilla Helper:
User-Agent: Opera/8.5 (X11; Linux i686; U; en)

Description of problem:
The cfq scheduler is the default scheduler in RHEL 4. I noticed that heavy write 
activity would bring the machine to a halt in a matter of minutes.

I observed this on two different systems now:
1. dual opteron writing to an IBM DS4100 via qlogic 2300
2. dual xeon (ia32) writing to Coraid AoE storage via gigE

Both exhibit the same behaviour. Switching to elevator=deadline fixes the 
problem.



Version-Release number of selected component (if applicable):
kernel 2.6.9-22.0.2.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. have a TB or so of storage attached to the machine (don't know if it really 
matters, but anyway)
2. run something like
       for i in `seq 1 1000`; do dd if=/dev/zero of=/somewhere/somefile.$i bs=1M count=1000; done
   where /somewhere is the mountpoint of that storage
3. wait
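The loop in step 2 can be run as a script. Below is a scaled-down sketch; TARGET, FILES, BS and COUNT are placeholders here (the original run used bs=1M count=1000 per file against roughly a terabyte of attached storage, which is what actually triggers the stall):

```shell
# Scaled-down version of the reproduction loop above.
# Defaults are tiny so the script is safe to run anywhere;
# raise FILES/BS/COUNT on a real test box to reproduce the load.
TARGET="${TARGET:-/tmp/cfq-repro}"
FILES="${FILES:-3}"
BS="${BS:-1M}"
COUNT="${COUNT:-1}"
mkdir -p "$TARGET"
for i in $(seq 1 "$FILES"); do
    dd if=/dev/zero of="$TARGET/somefile.$i" bs="$BS" count="$COUNT" 2>/dev/null
done
echo "wrote $FILES files under $TARGET"
```

On the affected systems the hang appeared only under sustained multi-gigabyte writes, so the tiny defaults above only demonstrate the command shape, not the failure.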
  

Actual Results:  In 5-15 minutes, the machine becomes totally unresponsive. What remains in memory 
still runs; everything else is just dead. It's impossible even to log in, as it 
just times out.

Expected Results:  Normal writing.

Additional info:

As I mentioned, switching the elevator to something other than cfq makes the 
problem go away. I'm not sure that means cfq is at fault ... it might also be 
some strange interaction between it and the default VM settings.
Comment 1 Jason Baron 2006-02-07 11:29:02 EST
*** Bug 180040 has been marked as a duplicate of this bug. ***
Comment 2 Larry Woodman 2006-02-08 13:21:55 EST
I have not been able to reproduce this problem locally.  Can you get your system
into this state and send me AltSysrq-M, AltSysrq-T and AltSysrq-W output so I
can see who is waiting on what, who is running where, and where all the
memory is?

Thanks, Larry Woodman
Comment 3 Jure Pečar 2006-02-23 10:21:45 EST
I've set up serial console to that dual xeon box and am waiting for the 
deadlock to occur again. Somehow it does not want to appear immediately this 
time.

However, I've got plenty of these:

kswapd0: page allocation failure. order:0, mode:0x50
 [<c0143451>] __alloc_pages+0x2e1/0x2f7
 [<c014347f>] __get_free_pages+0x18/0x24
 [<c0145de8>] kmem_getpages+0x1c/0xbb
 [<c0146936>] cache_grow+0xab/0x138
 [<c0146b28>] cache_alloc_refill+0x165/0x19d
 [<c0146d23>] kmem_cache_alloc+0x51/0x57
 [<f895a951>] journal_alloc_journal_head+0x10/0x5d [jbd]
 [<f895a9c4>] journal_add_journal_head+0x1a/0xe6 [jbd]
 [<f8955019>] journal_dirty_data+0x31/0x1b2 [jbd]
 [<f8915e3e>] ext3_journal_dirty_data+0xc/0x2a [ext3]
 [<f8915cd6>] walk_page_buffers+0x62/0x87 [ext3]
 [<f8916276>] ext3_ordered_writepage+0xee/0x13a [ext3]
 [<f8916176>] journal_dirty_data_fn+0x0/0x12 [ext3]
 [<c01489d9>] pageout+0x8d/0xcc
 [<c0148c1f>] shrink_list+0x207/0x3ed
 [<c01298c8>] del_timer+0x5d/0x65
 [<c0129974>] del_singleshot_timer_sync+0x8/0x21
 [<c02cfb57>] schedule_timeout+0xda/0xee
 [<c0148fe2>] shrink_cache+0x1dd/0x34d
 [<c01496a0>] shrink_zone+0xa7/0xb6
 [<c0149a9b>] balance_pgdat+0x1c5/0x30e
 [<c011fe0c>] prepare_to_wait+0x12/0x4c
 [<c0149cae>] kswapd+0xca/0xcc
 [<c011fee1>] autoremove_wake_function+0x0/0x2d
 [<c02d129e>] ret_from_fork+0x6/0x14
 [<c011fee1>] autoremove_wake_function+0x0/0x2d
 [<c0149be4>] kswapd+0x0/0xcc
 [<c01041f1>] kernel_thread_helper+0x5/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16

Free pages:     1091228kB (1076736kB HighMem)
Active:19739 inactive:682279 dirty:284643 writeback:21354 unstable:0 free:272935
 slab:29055 mapped:19607 pagetables:727
DMA free:12572kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:20774 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:1856kB min:928kB low:1856kB high:2784kB active:260kB inactive:616020kB present:901120kB pages_scanned:95898 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:1076736kB min:512kB low:1024kB high:1536kB active:78472kB inactive:2106924kB present:3801088kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 1*4kB 3*8kB 4*16kB 2*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12572kB
Normal: 0*4kB 0*8kB 40*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 640kB
HighMem: 3992*4kB 2248*8kB 1710*16kB 3428*32kB 1076*64kB 622*128kB 558*256kB 46*512kB 367*1024kB 67*2048kB 19*4096kB = 1076736kB
Swap cache: add 237, delete 231, find 30/91, race 0+0
26308 bounce buffer pages
Free swap:       7290848kB
1179648 pages of RAM
819168 pages of HIGHMEM
141787 reserved pages
696537 pages shared
6 pages swap cached
ENOMEM in journal_alloc_journal_head, retrying.

... and 8 more, probably similar.
Comment 4 Jeffrey Parker 2006-09-05 10:21:01 EDT
I see this when writing on machines with 4G RAM but not 1G RAM.  I have a test
program using the fopen, fwrite, fflush, fclose sequence, and I can crash this
program, terminal windows, X servers, etc. as I increase the size of each fwrite
but not the number of repeated fwrites.  The final size of the file has no
effect.  I create 4G files with LARGEFILE support.  The fix mentioned above
(elevator=deadline) does resolve this.

Has this bug been resolved yet?
Comment 5 Jason Baron 2006-09-05 13:53:40 EDT
hmmm, this might very well be a duplicate of bug 184535. Can you please try the
U4 kernel (-42) and see if that resolves the issue?
Comment 6 Jeffrey Parker 2006-09-05 15:33:04 EDT
Created attachment 135599 [details]
source code for test program.
Comment 7 Jeffrey Parker 2006-09-05 15:41:21 EDT
My above attachment is the test program I use to trigger the crashes.  I installed
the latest kernel update 4, resulting in 2.6.9-42.0.2 as stated in
RHSA-2006-0575-22.  I also read 184535 and 182577.  I'm still able to crash my
program, terminal windows, X servers, etc.  The final file size doesn't seem to
matter.  I usually try to create 5-6 files and it crashes on 3 or 4.  The size
of the fwrite seems to be what crashes the programs, and it's different on various
machines.  I can't get it to crash on a 1G RAM machine running RHEL 4.0 with no
updates.  The machine I tested 2.6.9-42.0.2 on is a dual Intel Xeon 3.4 GHz
with 4 GB RAM.
Comment 8 Jeffrey Parker 2006-10-30 11:03:35 EST
Any further resolution on this item?  Any estimate on a time frame for a fix?
Comment 9 Larry Woodman 2006-12-11 14:31:27 EST
Can you try setting /proc/sys/vm/min_free_kbytes to 4 times the default value
and see if this prevents the page allocation failures you are seeing?  This
should kick in the page reclamation before the system gets so low on memory and
prevent this from happening.
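A sketch of the suggested tuning, assuming the stock procfs interface; this only computes and prints the new value, since actually writing it requires root on the affected box:

```shell
# Read the current vm.min_free_kbytes and compute 4x the value,
# as suggested above for earlier page reclamation.
default_kb=$(cat /proc/sys/vm/min_free_kbytes)
new_kb=$((default_kb * 4))
echo "would raise min_free_kbytes from $default_kb to $new_kb"
# On the affected machine, as root:
#   echo "$new_kb" > /proc/sys/vm/min_free_kbytes
```

Note the value read here is whatever the running kernel computed at boot, not necessarily the RHEL 4 default on the reporter's hardware.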

I am not sure that this is the cause of the "heavy write with cfq schedulers
kills a machine" problem though.

Larry Woodman
Comment 11 RHEL Product and Program Management 2007-05-09 06:53:05 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 RHEL Product and Program Management 2007-09-07 15:45:55 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 13 Jiri Pallich 2012-06-20 11:59:05 EDT
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested we review has now reached End of Life. 
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
