Bug 1265672 - [SCALE] Disk performance is really slow
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
3.6.0
Hardware: Unspecified  OS: Unspecified
Priority: unspecified  Severity: unspecified
: ovirt-3.6.3
: 3.6.0
Assigned To: nobody nobody
storage
:
Depends On:
Blocks:
 
Reported: 2015-09-23 09:05 EDT by Carlos Mestre González
Modified: 2016-03-10 02:33 EST (History)
8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-24 12:54:51 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
qemu log.. (5.19 KB, text/plain)
2015-09-23 09:06 EDT, Carlos Mestre González

Description Carlos Mestre González 2015-09-23 09:05:26 EDT
Description of problem:
IDE disks appear to be extremely slow (whether RAW or COW, on NFS or iSCSI). I added a secondary disk to a VM; creating a 1 GB ext4 filesystem on it takes about 3 minutes (versus a few seconds for virtio or virtio-scsi), and dd reports 861 kB/s (other virtio disks reach around 5 MB/s). I only noticed because of timeouts in tests that have been passing for many previous releases, so I suspect this is an issue with the latest build.

Version-Release number of selected component (if applicable):
3.6.0-13
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-9.el7.x86_64
libvirt-daemon-config-network-1.2.17-9.el7.x86_64
libvirt-client-1.2.17-9.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-9.el7.x86_64
libvirt-daemon-1.2.17-9.el7.x86_64
libvirt-daemon-driver-interface-1.2.17-9.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.17-9.el7.x86_64
libvirt-daemon-kvm-1.2.17-9.el7.x86_64
libvirt-daemon-driver-network-1.2.17-9.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-9.el7.x86_64
libvirt-daemon-driver-lxc-1.2.17-9.el7.x86_64
libvirt-lock-sanlock-1.2.17-9.el7.x86_64
libvirt-daemon-driver-storage-1.2.17-9.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-9.el7.x86_64
libvirt-1.2.17-9.el7.x86_64
vdsm-python-4.17.7-1.el7ev.noarch
vdsm-4.17.7-1.el7ev.noarch
qemu-img-rhev-2.3.0-24.el7.x86_64
qemu-kvm-common-rhev-2.3.0-24.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-24.el7.x86_64
qemu-kvm-rhev-2.3.0-24.el7.x86_64
hosts are rhel 7.2

How reproducible:
100%

Steps to Reproduce:
1. Create a vm with an OS install (RHEL 6.7 with guest agent) - shut it down
2. Create a disk and attach it as IDE - sparse - 10 GB provisioned (either COW or RAW, NFS or iSCSI)
3. Start the vm
4. Create a partition - create a 1 GB ext4 fs - try to dd data
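The in-guest part of step 4 can be sketched roughly as below. This is an approximation on a file-backed image rather than the attached IDE disk (which would be something like /dev/sdb inside the guest); the device name, sizes, and dd parameters are assumptions, not the exact commands from the test suite:

```shell
# Stand-in for the 10 GB sparse disk: a file-backed image.
truncate -s 1G disk.img
# "create a 1 GB ext4 fs" (-F forces mkfs to run on a regular file)
mkfs.ext4 -F -q disk.img
# "try to dd data" - note this has no direct-IO flag, matching the report
dd if=/dev/zero of=zeros.bin bs=1M count=100
```

On the real IDE disk the mkfs step is what reportedly takes about 3 minutes.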

Actual results:
Creating a 1 GB ext4 filesystem takes about 3 minutes, and dd shows 861 kB/s. These numbers are nowhere near those for other disk configurations on the same storage domain, and it used to be much faster in previous builds.


Additional info:
Comment 1 Carlos Mestre González 2015-09-23 09:06 EDT
Created attachment 1076232 [details]
qemu log..
Comment 2 Carlos Mestre González 2015-09-23 09:07:54 EDT
Adding to the storage whiteboard for the moment. It seems fairly reproducible, so I'm attaching the qemu log just in case; I couldn't find any errors in other logs.
Comment 3 Carlos Mestre González 2015-09-23 10:11:38 EDT
The only error I've seen in engine.log is this one; not sure if it's relevant:

2015-09-23 13:40:07,046 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-77) [a013bb9] VM '071562dc-591c-4c5d-8ee0-644bb51fe820' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='c2d7a067-55a9-4e9b-a5c6-e516a3efbf15', vmId='071562dc-591c-4c5d-8ee0-644bb51fe820'}', device='spice', type='GRAPHICS', bootOrder='0', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', usingScsiReservation='false'}'
Comment 4 Yaniv Kaul 2015-09-24 04:28:39 EDT
Why is it a RHEV bug and not QEMU/KVM/libvirt?
do you suspect anything wrong in the way RHEV launches the VM?

5MB/s is also a joke. It should be 50-500MB/sec, depending on your storage.
Comment 5 Yaniv Kaul 2015-09-24 04:32:15 EDT
A few more questions:
1. There aren't clear instructions on how to reproduce the issue. Specifically, what is your storage server? 
2. I noticed you are using a VM with 16 sockets. Is that on purpose? Can you try with 2 or so?
3. Why use ext4 at all? Just dd on the raw partition. What is your 'dd' command? Did you verify it's not running slowly? (Did you look at ddpt, for example?)
Comment 6 Yaniv Kaul 2015-09-24 04:36:36 EDT
(In reply to Yaniv Kaul from comment #5)
> Few more questions:
> 1. There aren't clear instructions on how to reproduce the issue.
> Specifically, what is your storage server? 
> 2. I've noticed you are using a VM with 16 sockets? Is that on purpose? Can
> you try with 2 or so?

Sorry, with 1 CPU. Please test with more. Also, why -cpu Nehalem?

(Again, I doubt any of these are related; you have a more severe issue: your whole IO is quite slow for some reason.)


> 3. Why try to use ext4? just dd on the raw partition. What is your 'dd'
> command? Did you verify it's not running slowly? (did you look at ddpt for
> example?)
Comment 7 Allon Mureinik 2015-09-24 04:56:52 EDT
Also, with the same hardware, how does this stack up against oVirt 3.5's performance? 
Any noticeable difference?
Comment 8 Carlos Mestre González 2015-09-24 05:13:15 EDT
Just to clarify: our NFS storage server is really slow right now (getting an 8 MB/s transfer rate with dd over virtio, for example), but the issue is that performance with IDE is almost 10 times slower still (<1 MB/s). If the server weren't that slow I probably wouldn't have caught this at all via the test timeouts, since we don't test performance in general.

(In reply to Yaniv Kaul from comment #4)
> Why is it a RHEV bug and not QEMU/KVM/libvirt?
> do you suspect anything wrong in the way RHEV launches the VM?
> 
> 5MB/s is also a joke. It should be 50-500MB/sec, depending on your storage.

Normally I assign it to RHEV so the devel team can investigate first and reassign it accordingly.

(In reply to Yaniv Kaul from comment #5)
> Few more questions:
> 1. There aren't clear instructions on how to reproduce the issue.
> Specifically, what is your storage server? 

I'm checking all the issues with our server now with the team, I'll update with a private comment.

> 3. Why try to use ext4? just dd on the raw partition. What is your 'dd'
> command? Did you verify it's not running slowly? (did you look at ddpt for
> example?)

ext4 is part of our test suite; I just checked the dd command to see the speed.

dd if=/dev/zero of=test2 bs=1M count=100
I haven't checked ddpt; I'll take a look.
Comment 9 Yaniv Kaul 2015-09-24 05:33:55 EDT
1. Please fix your storage server. There is no point in testing with such issues. (Make sure your network connection is not 100 Mbps; that could explain some of it.)
2. RHEV devel will be lacking a lot of data here, especially around the QEMU/KVM issues. I don't see what RHEV has to do with this ATM.
3. You are missing the flag to perform direct IO in the 'dd' command. Without it, you might be writing into the cache; 100M is not a lot, and you need to bypass the VM cache. Why not use 'fio' or some other reasonable tool? Note that with some storage arrays (XtremIO, for example), writing zeros doesn't write anything at all, so you are 'cheating', so to speak.
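The direct-IO point can be illustrated as below; the file names are arbitrary, and the fio invocation is one plausible way to run such a measurement, not what was actually used in this report:

```shell
# dd with O_DIRECT so writes bypass the page cache; without oflag=direct
# a 100M write can land entirely in cache and report an inflated rate.
# bs must be a multiple of the device sector size for O_DIRECT to work.
dd if=/dev/zero of=ddtest.bin bs=1M count=100 oflag=direct

# A more representative run with fio (if installed); a hypothetical example:
#   fio --name=write --filename=fiotest.bin --rw=write --bs=1M \
#       --size=512M --direct=1 --ioengine=libaio
```

Writing non-zero (e.g. random) data also avoids the zero-detection shortcut some arrays take; fio does this by default.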
Comment 10 Carlos Mestre González 2015-09-24 12:54:51 EDT
Tested in another environment with the same rhevm build, and I don't see any performance issue with IDE disks. From my quick test on 3.5 it seems there are no issues there either.

Not sure what is happening in my environment; it could be the infrastructure, or the fact that the nodes are hosted-engine(?). Anyway, closing this bug; I'll reopen it if I can get a clear picture of what is going on.
