Red Hat Bugzilla – Bug 89579
RAW devices extremely slow on CPQ Array controller
Last modified: 2013-08-05 21:01:09 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.1)
Description of problem:
Access on raw devices located on mirrored 10k SCSI disks attached to a Compaq Smart array controller is _extremely_ slow.
E.g., "dd" is about 1000(!) times slower than on files...
We reproduced this on DL580 G1 (4xXeon 700, /dev/ida/...) and DL 380 G2 (2x PIII 1.4 GHz, /dev/cciss/...).
Another box using a RAID 0 with an ICP Vortex GDT controller is only about 1.3 times slower on the raw devices for the same tests...
I guess the RAID level would justify a factor of 2, but not 700...
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create raw devices on a RAID 1 device of a Smart Array Controller
2. time dd if=/dev/zero of=/dev/raw/<your raw device here> bs=8192 count=2000
3. time dd if=/dev/zero of=/tmp/testfile bs=8192 count=2000
Actual Results: raw devices on the CPQ Array are about 1000 times slower than files
Expected Results: raw devices should not be significantly slower than files
We noticed this when the DBA moaned about slow Sybase ASE startup; deleting a 100 MB temp-db takes about 5 minutes...
After that, we tried the "dd" on a production DB server and learned it behaves just the same...
When I imagine how we could blow all these 420Rs away if only the Compaqs were that much faster ;-)
It is no surprise that it's slower.
With raw devices you tell the kernel to do ZERO optimisations whatsoever,
so what happens is that your disk gets a worst-case IO pattern, and that's very slow.
Please use a MUCH bigger blocksize for testing stuff like this.
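The blocksize effect can be illustrated with a small timing sketch (a minimal sketch, not the reporter's setup: a temp file on a local filesystem stands in for the raw device, and O_SYNC plays the role of the raw device's uncached writes — all paths and numbers are illustrative):

```python
# Write the same 16 MiB once as 2048 x 8 KiB synchronous requests and once
# as 16 x 1 MiB requests; every O_SYNC write must reach stable storage
# before returning, so the small-block run pays the per-request latency
# 128 times as often.
import os, time, tempfile

def timed_sync_write(bs, count):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    buf = b"\0" * bs
    start = time.monotonic()
    for _ in range(count):
        os.write(fd, buf)
    os.close(fd)
    elapsed = time.monotonic() - start
    os.unlink(path)
    return elapsed

small = timed_sync_write(8192, 2048)   # many small synchronous requests
large = timed_sync_write(1 << 20, 16)  # few large synchronous requests
print("8 KiB: %.2fs   1 MiB: %.2fs" % (small, large))
```

On a spinning disk the first run is dramatically slower; on a cached or solid-state device the gap shrinks, which is exactly the write-back-cache point made later in this thread.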
So how can it be that the ICP with the same amount of RAM takes only ~ 1.3
times the time compared to dd'ing to a file on a -o sync mount...?
We chose this blocksize because the DB will also use it.
BTW: Why is sequentially writing to a hard disk a worst-case IO pattern?
Or does the Compaq driver know better how to match a Linux fs on the virtual
disks? Just wondering...
First of all, the worst-case raw IO pattern comes about as follows:
dd sends the 8 KB request to the kernel. The kernel IO subsystem sends it right
to the IO controller. When it gets to the disk, the disk has to wait until the
head is over the right position on the track and then write the 8 KB (this
delay is sometimes called rotational latency). Then the disk notifies the
controller about being done, and the controller notifies the kernel, which
then returns to dd. dd then submits the next 8 KB; when it gets to the disk, the
disk has rotated just far enough that it has to wait almost an entire
rotation before the right place comes up again, due to the delay between the IO
submits. A full rotation can, depending on the disk, easily take 5 ms.
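A back-of-envelope check of that argument (the numbers are illustrative, not measured on the reporter's hardware — 10k RPM matches the disks mentioned in the report):

```python
# A 10k RPM disk turns once every 60/10000 s = 6 ms.  If every 8 KiB
# synchronous write misses the platter position and waits roughly one full
# rotation, per-request throughput is capped far below streaming speed.
RPM = 10000
BLOCK = 8192                  # bytes per dd request
rotation_s = 60 / RPM         # 0.006 s per revolution
worst_case_bps = BLOCK / rotation_s
print("worst case: ~%.0f KiB/s" % (worst_case_bps / 1024))  # ~1333 KiB/s
```

Roughly 1.3 MB/s versus the tens of MB/s a cached file write achieves — the same order of magnitude as the slowdown reported above.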
It could well be that the ICP controller buffers the IO in its RAM and gives
the kernel the "complete" signal right away, while the CPQARRAY might be
configured to go right to the disk for writes. (I'm guessing settings here; most
RAID cards have a BIOS where you can configure write-back vs. write-through caching.)
Databases suffer less from this problem because they will generally have several
dozen IOs in flight, instead of the strict linear "dd" behavior which causes
the worst-case delay.
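The request-depth point can be sketched the same way (a rough illustration, not the reporter's workload: the same 64 synchronous 8 KiB writes are issued once serially and once from 8 threads; the file and offsets are made up for the example):

```python
# With several requests queued at once, the disk/controller can reorder and
# overlap them, hiding part of the per-request latency that the strictly
# serial "dd" pattern always pays in full.
import os, tempfile, threading, time

def sync_writes(path, offsets, bs=8192):
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    buf = b"\0" * bs
    for off in offsets:
        os.pwrite(fd, buf, off)  # positional write, safe across threads
    os.close(fd)

fd, path = tempfile.mkstemp()
os.close(fd)
offsets = [i * 8192 for i in range(64)]

start = time.monotonic()
sync_writes(path, offsets)            # one request in flight at a time
serial = time.monotonic() - start

start = time.monotonic()
threads = [threading.Thread(target=sync_writes, args=(path, offsets[i::8]))
           for i in range(8)]          # up to 8 requests in flight
for t in threads:
    t.start()
for t in threads:
    t.join()
parallel = time.monotonic() - start
os.unlink(path)
print("serial: %.3fs   8 threads: %.3fs" % (serial, parallel))
```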
<long stretched>o k</long stretched>
I see that dd issue, and I guess we'll do some more tests with bigger block sizes.
The controller's "BIOS" (that Windows stuff on the Smartstart CD) is set to
cache 50% read/50% write.
But besides that, do you agree that the cleanup of a 100 MB raw device should
not take 5 minutes? I mean, this should be a whole bunch of I/O operations in a row.
Sybase opens its files O_SYNC, and on files, the same operation takes only a
couple of seconds (around 20-30, as far as I remember).
I guess I will open a call at HP in parallel and point the guys there at
this thread, if you don't mind.
RHEL2.1 is currently accepting only critical security fixes. This issue is
outside the current scope of support.