Bug 89579 - RAW devices extremely slow on CPQ Array controller
Summary: RAW devices extremely slow on CPQ Array controller
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel   
Version: 2.1
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Jim Paradis
QA Contact: Brian Brock
Depends On:
Reported: 2003-04-24 15:19 UTC by Nick (Gunnar) Bluth
Modified: 2013-08-06 01:01 UTC
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2006-06-08 21:36:11 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Nick (Gunnar) Bluth 2003-04-24 15:19:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.1)

Description of problem:
Access to raw devices located on mirrored 10k SCSI disks attached to a Compaq Smart Array controller is _extremely_ slow.
E.g. "dd" is about 1000(!) times slower than on files.

We reproduced this on a DL580 G1 (4x Xeon 700, /dev/ida/...) and a DL380 G2 (2x PIII 1.4 GHz, /dev/cciss/...).

Another box using a RAID 0 with an ICP Vortex GDT controller is only about 1.3 times slower on the raw devices for the same tests.

I guess the RAID level might justify a factor of 2, but not 700.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create raw devices on a RAID 1 device of a Smart Array Controller
2. time dd if=/dev/zero of=/dev/raw/<your raw device here> bs=8192 count=2000
3. time dd if=/dev/zero of=/tmp/testfile bs=8192 count=2000

Actual Results:  raw devices on the CPQ Array are about 1000 times slower than files 

Expected Results:  raw devices should not be significantly slower than files

Additional info:

We noticed this when the DBA complained about slow Sybase ASE startup; deleting a 100 MB temp-db takes about 5 minutes.
After that, we tried the "dd" test on a production DB server and found it behaves just the same.
Imagine how we could blow all these 420Rs away if only the Compaqs were a bit faster ;-)

Comment 1 Arjan van de Ven 2003-04-24 15:23:52 UTC

It's no surprise that this is slower.
With raw devices you tell the kernel to do ZERO optimisations whatsoever,
so what happens is that your disk gets a worst-case IO pattern, and that's very,
very slow.
Please use a MUCH bigger blocksize for testing stuff like this.
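A minimal sketch of that suggestion, using a scratch file so it runs anywhere. The path is illustrative; on the affected boxes the target would be a /dev/raw/* device, where the block-size difference is dramatic (through the page cache both runs finish quickly):

```shell
# Roughly the same payload (~16 MB) written two ways. Point TARGET at
# /dev/raw/rawN on the affected machines; /tmp is just a stand-in file.
TARGET=/tmp/rawtest.img

# Original test: 2000 requests of 8 KiB each.
time dd if=/dev/zero of="$TARGET" bs=8192 count=2000 2>/dev/null

# Suggested test: 16 requests of 1 MiB each, far fewer round trips.
time dd if=/dev/zero of="$TARGET" bs=1M count=16 2>/dev/null

rm -f "$TARGET"
```

On a raw device each 8 KiB request is a separate synchronous round trip to the controller, so cutting the request count by a factor of ~128 cuts the per-request overhead by the same factor.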

Comment 2 Nick (Gunnar) Bluth 2003-04-24 16:32:01 UTC
So how can it be that the ICP with the same amount of RAM takes only ~1.3
times as long as dd'ing to a file on a -o sync mount...?

We took this blocksize 'cause the DB will also use it.

BTW: why is sequentially writing to an HDD a worst-case IO pattern?
Or does the Compaq driver somehow know better how to match a Linux fs on the
virtual disks? Just wondering...

Comment 3 Arjan van de Ven 2003-04-24 16:37:56 UTC
First of all, the worst-case raw IO pattern arises as follows:
dd sends the 8 KB request to the kernel. The kernel IO subsystem sends it right
to the IO controller. When it gets to the disk, the disk has to wait until the
head is over the right position on the track and then writes the 8 KB (this
delay is sometimes called rotational latency). Then the disk notifies the
controller about being done, and the controller notifies the kernel, which
then returns to dd. dd then submits the next 8 KB; by the time it gets to the
disk, the disk has rotated just far enough that it has to wait almost an entire
rotation for the right place to come around again, due to the delay between the
IO submits. A full rotation can, depending on the disk, easily take 5 ms.
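The arithmetic behind that worst case can be sketched with the numbers from this report. Assumptions: a 10k RPM spindle (as on these boxes), 2000 writes as in the dd test, and one full rotation lost per write:

```shell
# Assumptions: 10,000 RPM spindle, 2000 synchronous writes (the dd test),
# one full rotation lost per write in the worst case.
RPM=10000
WRITES=2000

# One rotation: 60,000 ms / RPM = 6 ms.
awk -v rpm=$RPM 'BEGIN { printf "rotation: %.1f ms\n", 60000 / rpm }'

# Worst case for the whole test: 2000 * 6 ms = 12 s for only ~16 MB,
# i.e. roughly 1.3 MB/s, versus tens of MB/s through the page cache.
awk -v rpm=$RPM -v n=$WRITES \
    'BEGIN { printf "worst-case total: %.0f s\n", n * 60000 / rpm / 1000 }'
```

So even before any controller or driver overhead, the pure rotational stall already caps the 8 KiB pattern at single-digit MB/s on these disks.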

It could well be that the ICP controller buffers the IO in its RAM and gives
the kernel the "complete" signal right away, while the CPQARRAY might be
configured to go straight to the disk for writes. (I'm guessing about settings
here; most RAID cards have a BIOS where you can configure write-back caching
vs. write-through caching.)

Databases suffer less from this problem because they generally have several
dozen IOs in flight, instead of the strictly linear "dd" behavior which causes
the worst-case delay.

Comment 4 Nick (Gunnar) Bluth 2003-04-24 17:31:09 UTC
<long stretched>o k</long stretched>
I see the dd issue, and I guess we'll do some more tests with bigger block sizes.

The controller's "BIOS" (that Windows stuff on the Smartstart CD) is set to 
cache 50% read/50% write.

But besides that, do you agree that the cleanup of a 100 MB raw device should
not take 5 minutes? I mean, that should be a whole batch of I/O operations
issued at once, right?
Sybase opens its files O_SYNC, and on files the same operation takes only a
couple of seconds (around 20-30, as far as I remember).

I guess I'll open a call with HP in parallel and point the guys there at this
thread, if you don't mind.

Comment 5 Jim Paradis 2006-06-08 21:36:11 UTC
RHEL2.1 is currently accepting only critical security fixes.  This issue is
outside the current scope of support.
