With RAID5 at the hardware level, IO performance from a PV guest is about 2x slower than the same test run from Dom0. A simple "dd" test shows the difference. The problem does not appear with hardware RAID1 or RAID0; with those, guest IO performance is very close to Dom0's. Below are the figures from my test system, an HP DL385 G2. The same system was used for the RAID5, RAID1 and RAID0 tests. The dom0 and the guest are configured identically (1GB of memory, 4 vcpus) with no services/apps running on either, and both are completely idle while the dd test runs.

*With RAID1 at the hardware level:*
(Compare the speed dd reports, the time taken to complete dd, and the time taken to flush the data to disk.)

Guest:
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 03:41:57 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 2.29844 seconds, 228 MB/s

real    0m2.304s
user    0m0.000s
sys     0m1.224s
Mon Apr 27 03:41:59 IST 2009

real    0m8.999s
user    0m0.000s
sys     0m0.028s
Mon Apr 27 03:42:08 IST 2009

Host:
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 03:42:19 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 2.07112 seconds, 253 MB/s

real    0m2.075s
user    0m0.000s
sys     0m1.264s
Mon Apr 27 03:42:21 IST 2009

real    0m8.216s
user    0m0.000s
sys     0m0.032s
Mon Apr 27 03:42:30 IST 2009

*With RAID0 at the hardware level:*
(Compare the speed dd reports, the time taken to complete dd, and the time taken to flush the data to disk.)

Guest:
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 04:58:21 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 1.49519 seconds, 351 MB/s

real    0m1.500s
user    0m0.024s
sys     0m1.292s
Mon Apr 27 04:58:22 IST 2009

real    0m4.982s
user    0m0.000s
sys     0m0.048s
Mon Apr 27 04:58:27 IST 2009

Host:
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 04:58:43 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 1.69546 seconds, 309 MB/s

real    0m1.700s
user    0m0.000s
sys     0m1.308s
Mon Apr 27 04:58:45 IST 2009

real    0m3.500s
user    0m0.000s
sys     0m0.076s
Mon Apr 27 04:58:48 IST 2009

*With RAID5 at the hardware level:*
(We tested with different schedulers; the best combination is deadline in the Host and CFQ in the guest. Note the default scheduler in the guest is noop.)
Host (CFQ):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 23:57:56 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 3.42753 seconds, 153 MB/s

real    0m3.683s
user    0m0.000s
sys     0m1.272s
Mon Apr 27 23:58:00 IST 2009

real    0m11.071s
user    0m0.004s
sys     0m0.064s
Mon Apr 27 23:58:12 IST 2009

Guest (noop):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 23:56:23 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 9.55193 seconds, 54.9 MB/s

real    0m9.557s    <---- 3 times slower than the Host
user    0m0.000s
sys     0m1.148s
Mon Apr 27 23:56:33 IST 2009

real    0m42.348s   <---- 4 times slower than the Host
user    0m0.000s
sys     0m0.004s
Mon Apr 27 23:57:15 IST 2009

Guest (CFQ):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 23:59:14 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 6.00281 seconds, 87.3 MB/s

real    0m6.007s    <---- 2 times slower than the Host
user    0m0.000s
sys     0m1.208s
Mon Apr 27 23:59:20 IST 2009

real    0m28.414s   <---- 3 times slower than the Host
user    0m0.000s
sys     0m0.004s
Mon Apr 27 23:59:48 IST 2009

Host (deadline):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Tue Apr 28 00:00:55 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 3.29603 seconds, 159 MB/s

real    0m3.300s
user    0m0.008s
sys     0m1.260s
Tue Apr 28 00:00:58 IST 2009

real    0m13.092s
user    0m0.000s
sys     0m0.068s
Tue Apr 28 00:01:12 IST 2009

Guest (noop):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Tue Apr 28 00:03:18 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 10.2711 seconds, 51.0 MB/s

real    0m10.395s   <---- 3 times slower than the Host
user    0m0.004s
sys     0m1.288s
Tue Apr 28 00:03:29 IST 2009

real    0m43.910s   <---- 4 times slower than the Host
user    0m0.000s
sys     0m0.000s
Tue Apr 28 00:04:13 IST 2009

Guest (CFQ):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Tue Apr 28 00:02:09 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 6.44671 seconds, 81.3 MB/s

real    0m6.451s    <---- 2 times slower than the Host
user    0m0.000s
sys     0m1.204s
Tue Apr 28 00:02:15 IST 2009

real    0m28.451s   <---- 3 times slower than the Host
user    0m0.000s
sys     0m0.000s
Tue Apr 28 00:02:44 IST 2009
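For reference, this is roughly how the elevator was checked and switched for the runs above (a minimal sketch; "<dev>" is only a placeholder and has to be replaced with the actual dom0 or guest block device name):

# cat /sys/block/<dev>/queue/scheduler                 (the scheduler shown in brackets is the active one)
# echo deadline > /sys/block/<dev>/queue/scheduler     (dom0)
# echo cfq > /sys/block/<dev>/queue/scheduler          (guest)

The change takes effect immediately and does not survive a reboot.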
What is the stripe size of the RAID 5 array? The reason I am asking is that a PV guest can only have a certain amount of IO outstanding at any time. If the stripe size exceeds that amount, a PV guest will never be able to do a whole RAID 5 stripe at once, which will really slow things down. This is not an issue with RAID 0 or 1, because IOs smaller than a stripe can still be done efficiently (because no parity calculation needs to be done).
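One number worth comparing against the stripe size: assuming the classic blkif ABI (BLKIF_MAX_SEGMENTS_PER_REQUEST is 11, i.e. at most 11 x 4KB = 44KB per front-end request), the PV guest can never submit a single request larger than about 44KB, so any stripe size above that cannot be written in one go from the guest. The constant can be checked against the kernel sources in use, for example:

# grep -rn BLKIF_MAX_SEGMENTS_PER_REQUEST /usr/src/kernels/*/include/xen/interface/io/blkif.h

(The header path above is just an example and depends on where the kernel-devel headers are installed.)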
Rik, HP ACU (Array Configuration Utility) shows the stripe size is 64K on the system where I reproduced this issue. As far as I could tell, there is no way to change it.
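If the hpacucli package happens to be installed, the same information can be read from the command line (controller addressing varies between systems, so this is only an example invocation):

# hpacucli ctrl all show config detail | grep -i strip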
Sadique, could you check (with iostat -x 3) the IO size in dom0? If it is less than the stripe size, we may have found our culprit...
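For reading the output: the avgrq-sz column in iostat -x is given in 512-byte sectors, so a request covering a full 64K stripe would show up as avgrq-sz of at least 128. Sampled in dom0 while the guest dd is running, for example:

# iostat -x 3

Watch the avgrq-sz column for the device backing the guest's disk; values well below 128 mean the backend is submitting sub-stripe writes.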
While the iostat you attached does show that IO is almost always smaller than the stripe size of 64K, it would be useful to confirm that this is also the case during the problem workload.
Please open a new bug with the ACTUAL PROBLEM THE CUSTOMER IS HAVING. This RAID 5 thing is just a big distraction, because it shows a very different bug.