180040 – heavy write with cfq scheduler kills a machine

Keywords:
Status:	CLOSED DUPLICATE of bug 180039
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Kernel Maintainer List
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-02-04 23:09 UTC by Jure Pečar
Modified:	2007-11-30 22:07 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-02-07 16:28:48 UTC
Target Upstream Version:
Embargoed:


Description Jure Pečar 2006-02-04 23:09:29 UTC

From Bugzilla Helper:
User-Agent: Opera/8.5 (X11; Linux i686; U; en)

Description of problem:
The cfq scheduler is the default I/O scheduler in RHEL 4. I noticed that heavy
write activity brings the machine to a halt in a matter of minutes.

I observed this on two different systems now:
1. dual opteron writing to an IBM DS4100 via qlogic 2300
2. dual xeon (ia32) writing to Coraid AoE storage via gigE

Both exhibit the same behaviour. Switching to elevator=deadline fixes the 
problem.
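
A small sketch for checking which elevator a machine was booted with, assuming
the choice is made through the elevator= kernel boot parameter mentioned above.
The function name elevator_from_cmdline is hypothetical, and "default" only
means that no elevator= was given (cfq on RHEL 4, per this report).

```shell
#!/bin/sh
# Print the value of elevator= from a kernel command line string,
# or "default" if the parameter is absent.
# elevator_from_cmdline is a hypothetical helper name, not a system tool.
elevator_from_cmdline() {
    for word in $1; do
        case "$word" in
            elevator=*)
                echo "${word#elevator=}"
                return 0
                ;;
        esac
    done
    echo "default"
}

# On a running Linux system the boot command line is in /proc/cmdline.
if [ -r /proc/cmdline ]; then
    elevator_from_cmdline "$(cat /proc/cmdline)"
fi
```

Running this before and after adding elevator=deadline to the boot line is a
quick way to confirm that the workaround is actually in effect.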



Version-Release number of selected component (if applicable):
kernel 2.6.9-22.0.2-ELsmp

How reproducible:
Always

Steps to Reproduce:
1. have a TB or so of storage attached to the machine (I don't know if the size
really matters, but anyway)
2. run something like the following, where /somewhere is the mountpoint of
that storage:
   for i in `seq 1 1000`; do dd if=/dev/zero of=/somewhere/somefile.$i bs=1M count=1000; done
3. wait
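
The steps above can be sketched as a small script. This is only an
illustration of the same dd loop: the function name write_load and its three
arguments are hypothetical, and bs=1M assumes GNU dd. The full-size run from
the report is left commented out because it writes about 1 TB.

```shell
#!/bin/sh
# write_load TARGET FILES MB_PER_FILE
# Writes FILES files of MB_PER_FILE megabytes of zeros into TARGET,
# named somefile.1 ... somefile.FILES, as in the report's dd loop.
write_load() {
    target=$1
    files=$2
    mb=$3
    i=1
    while [ "$i" -le "$files" ]; do
        dd if=/dev/zero of="$target/somefile.$i" bs=1M count="$mb" 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Values from the report: 1000 files of 1000 MB each under /somewhere.
# write_load /somewhere 1000 1000
```

Starting with a much smaller file count makes it easy to check that the
mountpoint is writable before kicking off the long run.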
  

Actual Results:  In 5-15 minutes, the machine becomes totally unresponsive.
Whatever is already in memory still runs; everything else is just dead. It is
impossible even to log in, as the login just times out.

Expected Results:  Normal writing.

Additional info:

As I mentioned, switching the elevator to something other than cfq makes the
problem go away. I'm not sure that means cfq is at fault ... it might also be
some strange interaction between it and the default vm settings.

Comment 1 Jason Baron 2006-02-07 16:28:48 UTC


*** This bug has been marked as a duplicate of 180039 ***
