Bug 89579 - RAW devices extremely slow on CPQ Array controller
Summary: RAW devices extremely slow on CPQ Array controller
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel   
Version: 2.1
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Jim Paradis
QA Contact: Brian Brock
Depends On:
Reported: 2003-04-24 15:19 UTC by Nick (Gunnar) Bluth
Modified: 2013-08-06 01:01 UTC
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2006-06-08 21:36:11 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Nick (Gunnar) Bluth 2003-04-24 15:19:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.1)

Description of problem:
Access to raw devices located on mirrored 10k SCSI disks attached to a Compaq Smart Array controller is _extremely_ slow.
E.g. "dd" is about 1000(!) times slower than on files.

We reproduced this on a DL580 G1 (4x Xeon 700, /dev/ida/...) and a DL380 G2 (2x PIII 1.4 GHz, /dev/cciss/...).

Another box using a RAID 0 with an ICP Vortex GDT controller is only about 1.3 times slower on the raw devices for the same tests.

I guess the RAID level might justify a factor of 2, but not 700.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create raw devices on a RAID 1 device of a Smart Array Controller
2. time dd if=/dev/zero of=/dev/raw/<your raw device here> bs=8192 count=2000
3. time dd if=/dev/zero of=/tmp/testfile bs=8192 count=2000

Actual Results:  raw devices on the CPQ Array are about 1000 times slower than files 

Expected Results:  raw devices should not be significantly slower than files

Additional info:

We noticed this when the DBA complained about slow Sybase ASE startup; deleting a 100 MB temp-db takes about 5 minutes.
After that, we tried the "dd" test on a production DB server and found it behaves just the same.
Imagine how we could blow all these 420Rs away if only the Compaqs were a bit faster ;-)

Comment 1 Arjan van de Ven 2003-04-24 15:23:52 UTC

It's no surprise that this is slower.
With raw devices you tell the kernel to do ZERO optimisations whatsoever,
so what happens is that your disk gets a worst-case IO pattern, and that's very,
very slow.
Please use a MUCH bigger blocksize for testing stuff like this.
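A minimal sketch of that suggestion, using a scratch file so it runs anywhere. The path is illustrative; on the affected boxes the target would be a /dev/raw/* device, where the block-size difference is dramatic (through the page cache both runs finish quickly):

```shell
# Roughly the same payload (~16 MB) written two ways. Point TARGET at
# /dev/raw/rawN on the affected machines; /tmp is just a stand-in file.
TARGET=/tmp/rawtest.img

# Original test: 2000 requests of 8 KiB each.
time dd if=/dev/zero of="$TARGET" bs=8192 count=2000 2>/dev/null

# Suggested test: 16 requests of 1 MiB each, far fewer round trips.
time dd if=/dev/zero of="$TARGET" bs=1M count=16 2>/dev/null

rm -f "$TARGET"
```

On a raw device each 8 KiB request is a separate synchronous round trip to the controller, so cutting the request count by a factor of ~128 cuts the per-request overhead by the same factor.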

Comment 2 Nick (Gunnar) Bluth 2003-04-24 16:32:01 UTC
So how can it be that the ICP with the same amount of RAM takes only ~1.3
times as long as dd'ing to a file on a -o sync mount...?

We took this blocksize 'cause the DB will also use it.

BTW: why is sequentially writing to an HDD a worst-case IO pattern?
Or does the Compaq driver somehow know better how to match a Linux fs on the
virtual disks? Just wondering...

Comment 3 Arjan van de Ven 2003-04-24 16:37:56 UTC
First of all, the worst-case raw IO pattern arises as follows:
dd sends the 8 KB request to the kernel. The kernel IO subsystem sends it right
to the IO controller. When it gets to the disk, the disk has to wait until the
head is over the right position on the track and then writes the 8 KB (this
delay is sometimes called rotational latency). Then the disk notifies the
controller about being done, and the controller notifies the kernel, which
then returns to dd. dd then submits the next 8 KB; by the time it gets to the
disk, the disk has rotated just far enough that it has to wait almost an entire
rotation for the right place to come around again, due to the delay between the
IO submits. A full rotation can, depending on the disk, easily take 5 ms.
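The arithmetic behind that worst case can be sketched with the numbers from this report. Assumptions: a 10k RPM spindle (as on these boxes), 2000 writes as in the dd test, and one full rotation lost per write:

```shell
# Assumptions: 10,000 RPM spindle, 2000 synchronous writes (the dd test),
# one full rotation lost per write in the worst case.
RPM=10000
WRITES=2000

# One rotation: 60,000 ms / RPM = 6 ms.
awk -v rpm=$RPM 'BEGIN { printf "rotation: %.1f ms\n", 60000 / rpm }'

# Worst case for the whole test: 2000 * 6 ms = 12 s for only ~16 MB,
# i.e. roughly 1.3 MB/s, versus tens of MB/s through the page cache.
awk -v rpm=$RPM -v n=$WRITES \
    'BEGIN { printf "worst-case total: %.0f s\n", n * 60000 / rpm / 1000 }'
```

So even before any controller or driver overhead, the pure rotational stall already caps the 8 KiB pattern at single-digit MB/s on these disks.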

It could well be that the ICP controller buffers the IO in its RAM and gives
the kernel the "complete" signal right away, while the CPQARRAY might be
configured to go straight to the disk for writes. (I'm guessing about settings
here; most RAID cards have a BIOS where you can configure write-back caching
vs. write-through caching.)

Databases suffer less from this problem because they generally have several
dozen IOs in flight, instead of the strictly linear "dd" behavior which causes
the worst-case delay.

Comment 4 Nick (Gunnar) Bluth 2003-04-24 17:31:09 UTC
<long stretched>o k</long stretched>
I see the dd issue, and I guess we'll do some more tests with bigger block sizes.

The controller's "BIOS" (that Windows stuff on the Smartstart CD) is set to 
cache 50% read/50% write.

But besides that, do you agree that the cleanup of a 100 MB raw device should
not take 5 minutes? I mean, that should be a whole batch of I/O operations
issued at once, right?
Sybase opens its files O_SYNC, and on files the same operation takes only a
couple of seconds (around 20-30, as far as I remember).

I guess I'll open a call with HP in parallel and point the guys there at this
thread, if you don't mind.

Comment 5 Jim Paradis 2006-06-08 21:36:11 UTC
RHEL2.1 is currently accepting only critical security fixes.  This issue is
outside the current scope of support.
