Description of problem:
IDE disks seem really slow (RAW or COW, on NFS or iSCSI alike). I added a secondary disk to a VM, and creating a 1 GB ext4 filesystem on it takes about 3 minutes (it takes a few seconds with virtio or virtio-scsi); dd reports 861 kB/s (virtio disks on the same storage get around 5 MB/s). I only noticed because of timeouts in our tests, which have been passing for many previous releases, so I'm guessing it's an issue with the latest build since it's failing now.

Version-Release number of selected component (if applicable):
3.6.0-13
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-9.el7.x86_64
libvirt-daemon-config-network-1.2.17-9.el7.x86_64
libvirt-client-1.2.17-9.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-9.el7.x86_64
libvirt-daemon-1.2.17-9.el7.x86_64
libvirt-daemon-driver-interface-1.2.17-9.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.17-9.el7.x86_64
libvirt-daemon-kvm-1.2.17-9.el7.x86_64
libvirt-daemon-driver-network-1.2.17-9.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-9.el7.x86_64
libvirt-daemon-driver-lxc-1.2.17-9.el7.x86_64
libvirt-lock-sanlock-1.2.17-9.el7.x86_64
libvirt-daemon-driver-storage-1.2.17-9.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-9.el7.x86_64
libvirt-1.2.17-9.el7.x86_64
vdsm-python-4.17.7-1.el7ev.noarch
vdsm-4.17.7-1.el7ev.noarch
qemu-img-rhev-2.3.0-24.el7.x86_64
qemu-kvm-common-rhev-2.3.0-24.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-24.el7.x86_64
qemu-kvm-rhev-2.3.0-24.el7.x86_64
Hosts are RHEL 7.2.

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with an OS installed (RHEL 6.7 with the guest agent) and shut it down.
2. Create a sparse disk with 10 GB provisioned (either COW or RAW, on NFS or iSCSI) and attach it via IDE.
3. Start the VM.
4. Create a partition, make a 1 GB ext4 filesystem on it, and try to dd data to it.

Actual results:
Creating the 1 GB ext4 filesystem takes about 3 minutes, and dd shows 861 kB/s. Those numbers are not remotely close to other disk configurations on the same storage domain, and it used to be much faster in previous builds.

Additional info:
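For reference, the slow in-guest step (making the 1 GB ext4 filesystem) can be timed in isolation without a spare block device by formatting a sparse image file. This is only an illustrative stand-in: the file name disk.img replaces the real IDE-attached partition, and on a healthy disk the mkfs should finish in seconds, not minutes.

```shell
# Stand-in for step 4, using a sparse 1 GiB image file instead of the real
# IDE-attached partition (the name disk.img is illustrative only).
truncate -s 1G disk.img

# Time the filesystem creation; on healthy storage this takes seconds,
# while the reported bug shows roughly 3 minutes on the IDE disk.
time mkfs.ext4 -F -q disk.img
```

Running the same two commands inside the guest against the real partition (e.g. mkfs.ext4 on /dev/sdb1) reproduces the measurement the tests are timing out on.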
Created attachment 1076232 [details]
qemu log
Adding to the storage whiteboard for the moment. It seems fairly reproducible, so I'm attaching the qemu log just in case; I couldn't find any errors in the other logs.
The only error I've seen in the engine.log is this one, not sure if it's relevant:

2015-09-23 13:40:07,046 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-77) [a013bb9] VM '071562dc-591c-4c5d-8ee0-644bb51fe820' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='c2d7a067-55a9-4e9b-a5c6-e516a3efbf15', vmId='071562dc-591c-4c5d-8ee0-644bb51fe820'}', device='spice', type='GRAPHICS', bootOrder='0', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', usingScsiReservation='false'}'
Why is this a RHEV bug and not a QEMU/KVM/libvirt one? Do you suspect anything wrong in the way RHEV launches the VM?

5 MB/s is also a joke. It should be 50-500 MB/s, depending on your storage.
A few more questions:
1. There aren't clear instructions on how to reproduce the issue. Specifically, what is your storage server?
2. I've noticed you are using a VM with 16 sockets. Is that on purpose? Can you try with 2 or so?
3. Why use ext4? Just dd on the raw partition. What is your 'dd' command? Did you verify it's not running slowly? (Did you look at ddpt, for example?)
(In reply to Yaniv Kaul from comment #5)
> Few more questions:
> 1. There aren't clear instructions on how to reproduce the issue.
> Specifically, what is your storage server?
> 2. I've noticed you are using a VM with 16 sockets? Is that on purpose? Can
> you try with 2 or so?

Sorry, with 1 CPU. Please test with more. Also, why -cpu Nehalem? (Again, I doubt any of these are related; you have a more severe issue: your whole I/O is quite slow for some reason.)

> 3. Why try to use ext4? just dd on the raw partition. What is your 'dd'
> command? Did you verify it's not running slowly? (did you look at ddpt for
> example?)
Also, with the same hardware, how does this stack up against oVirt 3.5's performance? Any noticeable difference?
Just to clarify, our NFS storage server is really slow right now (getting an 8 MB/s transfer rate with dd via virtio, for example). The issue is that performance with IDE is almost 10 times slower still (<1 MB/s). If it weren't that slow, I probably wouldn't have caught the issue through the timeouts in our tests, since we don't test performance in general.

(In reply to Yaniv Kaul from comment #4)
> Why is it a RHEV bug and not QEMU/KVM/libvirt?
> do you suspect anything wrong in the way RHEV launches the VM?
>
> 5MB/s is also a joke. It should be 50-500MB/sec, depending on your storage.

Normally I assign it to RHEV so the devel team can investigate first and reassign it accordingly.

(In reply to Yaniv Kaul from comment #5)
> Few more questions:
> 1. There aren't clear instructions on how to reproduce the issue.
> Specifically, what is your storage server?

I'm checking all the issues with our server now with the team; I'll update with a private comment.

> 3. Why try to use ext4? just dd on the raw partition. What is your 'dd'
> command? Did you verify it's not running slowly? (did you look at ddpt for
> example?)

ext4 is part of our test suite. I just checked the dd command to see the speed:

dd if=/dev/zero of=test2 bs=1M count=100

I haven't checked ddpt; I'll look.
1. Please fix your storage server; there's no point in testing with such issues. (Make sure your network connection is not 100 Mbps - that could explain some of it.)
2. RHEV devel will be lacking a lot of data here, especially around the QEMU/KVM issues. I don't see what RHEV has to do with this ATM.
3. You are missing the flag to perform direct I/O in the 'dd' command. Without it, you might be writing into the cache, and 100M is not a lot; you need to bypass the VM cache. Why not use 'fio' or some other reasonable tool? Note that with some storage arrays (XtremIO, for example), writing zeros doesn't write anything at all, so again you are 'cheating', so to speak.
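To illustrate point 3: the reporter's command can be rerun with O_DIRECT so the guest page cache cannot inflate the measured rate. This is a sketch of the suggested fix, not a command taken from the bug; test2 is the output file name from the original command.

```shell
# Same 100 MiB write as the reporter's command, but with oflag=direct so
# each write bypasses the guest page cache and hits the virtual disk.
dd if=/dev/zero of=test2 bs=1M count=100 oflag=direct

# Confirm the full 100 MiB actually landed on disk.
stat -c '%s' test2   # 100 * 1024 * 1024 = 104857600 bytes
```

For a more trustworthy figure still, fio with direct=1 and a non-zero data pattern avoids both the cache and arrays that detect and discard all-zero writes.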
Tested in another environment with the same RHEV-M build, and I don't see any performance issue with IDE disks there. From a quick test on 3.5, it seems there are no issues there either. Not sure what is happening in my environment; it could be infrastructure, or the fact that the nodes are hosted-engine(?). Anyway, closing this bug; I'll reopen it if I can get a clearer picture of what is going on.