With RAID5 at the hardware level, IO performance from a PV guest is about 2x slower than the same test run from Dom0. A simple "dd" test shows the difference. The problem does not appear with hardware RAID1 or RAID0; with those, guest IO performance is very close to Dom0's. Below are the figures from my test system, an HP DL385 G2. The same system was used for the RAID5, RAID1 and RAID0 tests. The dom0 and the guest are configured identically (1GB of memory, 4 vcpus) with no services/apps running on either, and both are completely idle while the dd test runs.

*With RAID1 at the hardware level:*
(Compare the speed dd reports, the time taken to complete dd, and the time taken to flush the data to disk.)

Guest:
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 03:41:57 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 2.29844 seconds, 228 MB/s

real    0m2.304s
user    0m0.000s
sys     0m1.224s
Mon Apr 27 03:41:59 IST 2009

real    0m8.999s
user    0m0.000s
sys     0m0.028s
Mon Apr 27 03:42:08 IST 2009

Host:
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 03:42:19 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 2.07112 seconds, 253 MB/s

real    0m2.075s
user    0m0.000s
sys     0m1.264s
Mon Apr 27 03:42:21 IST 2009

real    0m8.216s
user    0m0.000s
sys     0m0.032s
Mon Apr 27 03:42:30 IST 2009

*With RAID0 at the hardware level:*
(Compare the speed dd reports, the time taken to complete dd, and the time taken to flush the data to disk.)

Guest:
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 04:58:21 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 1.49519 seconds, 351 MB/s

real    0m1.500s
user    0m0.024s
sys     0m1.292s
Mon Apr 27 04:58:22 IST 2009

real    0m4.982s
user    0m0.000s
sys     0m0.048s
Mon Apr 27 04:58:27 IST 2009

Host:
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 04:58:43 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 1.69546 seconds, 309 MB/s

real    0m1.700s
user    0m0.000s
sys     0m1.308s
Mon Apr 27 04:58:45 IST 2009

real    0m3.500s
user    0m0.000s
sys     0m0.076s
Mon Apr 27 04:58:48 IST 2009

*With RAID5 at the hardware level:*
(We tested with different schedulers; the best combination is deadline in the Host and CFQ in the guest. Note the default scheduler in the guest is noop.)
Host (CFQ):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 23:57:56 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 3.42753 seconds, 153 MB/s

real    0m3.683s
user    0m0.000s
sys     0m1.272s
Mon Apr 27 23:58:00 IST 2009

real    0m11.071s
user    0m0.004s
sys     0m0.064s
Mon Apr 27 23:58:12 IST 2009

Guest (noop):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 23:56:23 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 9.55193 seconds, 54.9 MB/s

real    0m9.557s    <---- 3 times slower than the Host
user    0m0.000s
sys     0m1.148s
Mon Apr 27 23:56:33 IST 2009

real    0m42.348s   <---- 4 times slower than the Host
user    0m0.000s
sys     0m0.004s
Mon Apr 27 23:57:15 IST 2009

Guest (CFQ):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Mon Apr 27 23:59:14 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 6.00281 seconds, 87.3 MB/s

real    0m6.007s    <---- 2 times slower than the Host
user    0m0.000s
sys     0m1.208s
Mon Apr 27 23:59:20 IST 2009

real    0m28.414s   <---- 3 times slower than the Host
user    0m0.000s
sys     0m0.004s
Mon Apr 27 23:59:48 IST 2009

Host (deadline):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Tue Apr 28 00:00:55 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 3.29603 seconds, 159 MB/s

real    0m3.300s
user    0m0.008s
sys     0m1.260s
Tue Apr 28 00:00:58 IST 2009

real    0m13.092s
user    0m0.000s
sys     0m0.068s
Tue Apr 28 00:01:12 IST 2009

Guest (noop):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Tue Apr 28 00:03:18 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 10.2711 seconds, 51.0 MB/s

real    0m10.395s   <---- 3 times slower than the Host
user    0m0.004s
sys     0m1.288s
Tue Apr 28 00:03:29 IST 2009

real    0m43.910s   <---- 4 times slower than the Host
user    0m0.000s
sys     0m0.000s
Tue Apr 28 00:04:13 IST 2009

Guest (CFQ):
# sync ; date ; time dd if=/dev/zero of=/tmp/test bs=1M count=500 ; date ; time sync ; date
Tue Apr 28 00:02:09 IST 2009
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 6.44671 seconds, 81.3 MB/s

real    0m6.451s    <---- 2 times slower than the Host
user    0m0.000s
sys     0m1.204s
Tue Apr 28 00:02:15 IST 2009

real    0m28.451s   <---- 3 times slower than the Host
user    0m0.000s
sys     0m0.000s
Tue Apr 28 00:02:44 IST 2009
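For reference, this is roughly how the elevator was checked and switched for the runs above (a minimal sketch; "<dev>" is only a placeholder and has to be replaced with the actual dom0 or guest block device name):

# cat /sys/block/<dev>/queue/scheduler                 (the scheduler shown in brackets is the active one)
# echo deadline > /sys/block/<dev>/queue/scheduler     (dom0)
# echo cfq > /sys/block/<dev>/queue/scheduler          (guest)

The change takes effect immediately and does not survive a reboot.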
What is the stripe size of the RAID 5 array? The reason I am asking is that a PV guest can only have a certain amount of IO outstanding at any time. If the stripe size exceeds that amount, a PV guest will never be able to do a whole RAID 5 stripe at once, which will really slow things down. This is not an issue with RAID 0 or 1, because IOs smaller than a stripe can still be done efficiently (because no parity calculation needs to be done).
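One number worth comparing against the stripe size: assuming the classic blkif ABI (BLKIF_MAX_SEGMENTS_PER_REQUEST is 11, i.e. at most 11 x 4KB = 44KB per front-end request), the PV guest can never submit a single request larger than about 44KB, so any stripe size above that cannot be written in one go from the guest. The constant can be checked against the kernel sources in use, for example:

# grep -rn BLKIF_MAX_SEGMENTS_PER_REQUEST /usr/src/kernels/*/include/xen/interface/io/blkif.h

(The header path above is just an example and depends on where the kernel-devel headers are installed.)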
Rik, HP ACU (Array Configuration Utility) shows the stripe size is 64K on the system where I reproduced this issue. As far as I could tell, there is no way to change it.
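If the hpacucli package happens to be installed, the same information can be read from the command line (controller addressing varies between systems, so this is only an example invocation):

# hpacucli ctrl all show config detail | grep -i strip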
Sadique, could you check (with iostat -x 3) the IO size in dom0? If it is less than the stripe size, we may have found our culprit...
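For reading the output: the avgrq-sz column in iostat -x is given in 512-byte sectors, so a request covering a full 64K stripe would show up as avgrq-sz of at least 128. Sampled in dom0 while the guest dd is running, for example:

# iostat -x 3

Watch the avgrq-sz column for the device backing the guest's disk; values well below 128 mean the backend is submitting sub-stripe writes.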
While the iostat you attached does show that IO is almost always smaller than the stripe size of 64K, it would be useful to confirm that this is also the case during the problem workload.
Please open a new bug with the ACTUAL PROBLEM THE CUSTOMER IS HAVING. This RAID 5 thing is just a big distraction, because it shows a very different bug.