Red Hat Bugzilla – Bug 715017
Report disk latency (read and write) for each storage device
Last modified: 2014-09-07 18:54:07 EDT
We need this done at the hypervisor level; an agent is not an option. Let me try to elaborate on what we need.

We need to know the *average* read and write latencies for IO requests. I realize that some requests will be fast and others slow, due to seeks and other contention, but we only need an average number. One reasonably efficient way to do this is as follows. Introduce 4 new counters in QEMU:

- Total read latency, ms
- Total write latency, ms
- Total number of read requests
- Total number of write requests

These are just counters, i.e. monotonically increasing integers (presumably 64 bit). Every time a request starts, its associated start time needs to be recorded as part of the request metadata. When the request is done, calculate the total time it took and add this value to the latency counter. Also increase the request counter by one.

VDSM can then periodically sample these counters. Using simple finite-difference arithmetic, it finds the average read and write latencies over the last sample interval (see the sketch after this comment).

Two other important design criteria:

* Measure as close to the guest as possible. We want to know what latency the guest is experiencing.
* Merged IO requests should ideally be accounted for by using a factor of N (number of merged requests) when updating both the latency counter and the request counter.
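A minimal sketch of the accounting described above, in C. The identifiers (BlkAcctStats, BlkAcctCookie, acct_start, acct_done) are made up for this illustration and are not the actual qemu names; the counters are kept in nanoseconds here, matching the *_total_time_ns fields that show up later in this bug.

#include <stdint.h>
#include <time.h>

/* One set of monotonically increasing counters per block device.
 * Illustrative only -- names do not match the actual qemu code. */
typedef struct BlkAcctStats {
    uint64_t rd_ops;            /* total number of read requests    */
    uint64_t wr_ops;            /* total number of write requests   */
    uint64_t rd_total_time_ns;  /* summed read request latency, ns  */
    uint64_t wr_total_time_ns;  /* summed write request latency, ns */
} BlkAcctStats;

/* Per-request metadata: remember when the request was started. */
typedef struct BlkAcctCookie {
    uint64_t start_ns;
    int is_write;
    unsigned int nmerged;       /* factor N for merged requests */
} BlkAcctCookie;

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Called when a request is submitted. */
static void acct_start(BlkAcctCookie *c, int is_write, unsigned int nmerged)
{
    c->start_ns = now_ns();
    c->is_write = is_write;
    c->nmerged = nmerged ? nmerged : 1;
}

/* Called when the request completes: add the elapsed time to the latency
 * counter and bump the request counter, scaled by the merge factor N. */
static void acct_done(BlkAcctStats *s, const BlkAcctCookie *c)
{
    uint64_t elapsed = now_ns() - c->start_ns;

    if (c->is_write) {
        s->wr_total_time_ns += elapsed * c->nmerged;
        s->wr_ops += c->nmerged;
    } else {
        s->rd_total_time_ns += elapsed * c->nmerged;
        s->rd_ops += c->nmerged;
    }
}

int main(void)
{
    BlkAcctStats stats = {0};
    BlkAcctCookie cookie;

    acct_start(&cookie, 1 /* write */, 1 /* not merged */);
    /* ... the actual I/O would happen here ... */
    acct_done(&stats, &cookie);

    return 0;
}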
And according to Christoph, we should add a flush counter too.
Christoph, Geert has a valid point that we need this on the hypervisor side - we can't integrate our measurements with every guest OS that is out there. The hypervisor management needs a simple mechanism that will work.
What's the planned resolution for the latency values, and how are they aggregated into averages?
I now have code that basically implements the suggestion in comment 3. It'll need a bit more polishing, but I should be able to send it out to the list tomorrow. I'm not sure the reception of such a raw interface, more or less useless for qemu itself, will be overly nice.
Hi chellwig, For the QE part, do we only need to check whether the latency info is provided through the qemu monitor and QMP? Is there anything else we need to test? Thanks, Suqin
Will also run O_DIRECT reads/writes on a raw device in the guest to check whether qemu reports the correct data.
Hi chellwig,

I tested with qemu-kvm-0.12.1.2-2.190.el6.x86_64, and some data reported by qemu is incorrect.

1. blk: virtio

2. test scenario: write/read with dd, get blk info before and after dd, and then subtract the before_dd data from the after_dd data

3. write operation
1). # dd if=/dev/zero of=/mnt/test bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 7.31712 s, 143 MB/s
2). result
a. wr_operations is incorrect:
actual result: wr_operations: 9062 - 6976 = 2086
expected result: around 1000
b. wr_total_time_ns is incorrect:
actual result: wr_total_time_ns = 874167049027 - 667511125049 = 206.655923978 s
expected result: around 7.31712 s
c. wr_bytes is acceptable:
actual result: 4198605824 - 3149878272 = 1048727552
expected result: around 1048576000
3). detail data:
before dd:
drive-virtio-disk1: rd_bytes=199632384 wr_bytes=3149878272 rd_operations=5131 wr_operations=6976 flush_operations=136 wr_total_time_ns=667511125049 rd_total_time_ns=7999186637 flush_total_time_ns=150010121
after dd:
drive-virtio-disk1: rd_bytes=199636480 wr_bytes=4198605824 rd_operations=5132 wr_operations=9062 flush_operations=138 wr_total_time_ns=874167049027 rd_total_time_ns=8002082444 flush_total_time_ns=172090043

4. read operation:
1). # dd if=/mnt/test of=/dev/null
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 2.54448 s, 412 MB/s
2). result
a. rd_bytes is incorrect:
actual result: rd_bytes = 713486848 - 666972672 = 46514176
expected result: around 1048576000
b. rd_operations is incorrect:
actual result: 36777 - 36219 = 558
expected result: around 1000
c. rd_total_time_ns is incorrect:
actual result: 55262579137 - 53854678137 = 1.407901000 s
expected result: around 2.54448 s
3). detail data
before dd:
drive-virtio-disk1: rd_bytes=666972672 wr_bytes=4565480448 rd_operations=36219 wr_operations=12882 flush_operations=233 wr_total_time_ns=994495533746 rd_total_time_ns=53854678137 flush_total_time_ns=358894900
after dd:
drive-virtio-disk1: rd_bytes=713486848 wr_bytes=4640609280 rd_operations=36777 wr_operations=13960 flush_operations=235 wr_total_time_ns=1020617582408 rd_total_time_ns=55262579137 flush_total_time_ns=381067762

5. ide1-cd0
I didn't attach a cd, but there is rd data for ide1-cd0:
ide1-cd0: rd_bytes=172 wr_bytes=0 rd_operations=5 wr_operations=0 flush_operations=0 wr_total_time_ns=0 rd_total_time_ns=85268 flush_total_time_ns=0

6. detail cmd
/usr/libexec/qemu-kvm -monitor stdio -drive file=/home/RHEL-Server-6.2-32-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idOMdpQo,mac=0a:b3:b1:f0:9f:19,id=ndev00idOMdpQo,bus=pci.0,addr=0x3 -netdev tap,id=idOMdpQo,vhost=on,script='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :1 -rtc base=utc,clock=host,driftfix=none -M rhel6.2.0 -boot order=cdn,once=c,menu=off -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm
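For reference, a small self-contained C snippet that performs the same before/after subtraction for the write test above and derives the average per-request write latency; the counter values are copied from the detail data in this comment. Note that wr_total_time_ns presumably sums the durations of individual in-flight requests (and this dd writes through the guest page cache), so the summed time can legitimately exceed dd's wall-clock time, and the operation count reflects how the guest block layer split or merged the 1 MB dd writes rather than dd's own request count.

#include <stdio.h>
#include <stdint.h>

/* Before/after counter samples for drive-virtio-disk1, copied from the
 * write test above (dd if=/dev/zero of=/mnt/test bs=1M count=1000). */
int main(void)
{
    uint64_t wr_ops_before   = 6976,            wr_ops_after   = 9062;
    uint64_t wr_ns_before    = 667511125049ull, wr_ns_after    = 874167049027ull;
    uint64_t wr_bytes_before = 3149878272ull,   wr_bytes_after = 4198605824ull;

    uint64_t d_ops   = wr_ops_after   - wr_ops_before;    /* 2086 requests     */
    uint64_t d_ns    = wr_ns_after    - wr_ns_before;     /* ~206.66 s in total */
    uint64_t d_bytes = wr_bytes_after - wr_bytes_before;  /* ~1.0 GB           */

    /* Average latency per write request over the interval. */
    double avg_ms = (double)d_ns / (double)d_ops / 1e6;

    printf("writes: %llu requests, %llu bytes, avg latency %.2f ms\n",
           (unsigned long long)d_ops, (unsigned long long)d_bytes, avg_ms);
    return 0;
}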
wr_bytes and rd_bytes reported by qemu are the same as dd reports, but the operation and time values sometimes differ. Christoph, Kevin: is that acceptable?

Ran writes/reads on an attached disk, both ide and virtio, with qemu-kvm-0.12.1.2-2.193.el6.x86_64.

I. ide:
1. write operation:
[root@dhcp-66-83-25 ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 14.9952 s, 71.6 MB/s
reported by qemu:
wr_bytes: 1073741824 ----> the same
time: 14328843120 ----> almost the same
operation: 2048 ----> the same

2. read operation:
[root@dhcp-66-83-25 ~]# dd if=/dev/sdb of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 33.0387 s, 325 MB/s
reported by qemu:
rd_bytes: 10737418240 ----> the same
time: 12.936062195 ----> less than dd
operation: 131388 ----> almost 5 times dd's count

II. virtio:
1. write operation:
[root@localhost ~]# dd if=/dev/zero of=/dev/vdb bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.9087 s, 98.4 MB/s
reported by qemu:
wr_bytes: 1073741824 ----> the same
operation: 3072 ----> more than dd, should be 2048
time: 31.322404716 ----> more than dd

2. read operation:
[root@localhost ~]# dd if=/dev/vdb of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 15.2586 s, 704 MB/s
rd_bytes: 10737418240 ----> the same
operation: 109955 ----> almost 5 times dd's count
time: 14.287022666 ----> almost the same
According to Christoph's comment, this bug is fixed:

"If you really want to make sure the I/O operations are correct we'd have to write a kernel-level test driver to issue I/Os 100% under our control. But the numbers really look like what I expected."

So closing this bug.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: no mechanism to report block device read and write latency.
Consequence: the management system can't report the average read and write latency for the block devices of virtual machines.
Change: implemented a mechanism so that qemu-kvm reports disk latency stats in "info blockstats".
Result: the management system can fetch and use those stats from qemu-kvm.
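A hedged sketch, in C, of how a management system could consume these stats: sample the counters periodically and use finite differences to derive the average latency over each interval. The get_blockstats() helper below is a stub invented for this sketch; a real implementation would issue query-blockstats over QMP (or parse "info blockstats") and fill in the rd_/wr_ fields reported by qemu-kvm.

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Counter snapshot as exposed by "info blockstats" / query-blockstats. */
typedef struct BlockStats {
    uint64_t rd_operations, wr_operations;
    uint64_t rd_total_time_ns, wr_total_time_ns;
} BlockStats;

/* Hypothetical helper, stubbed out here: a real implementation would
 * query QMP and parse the reply for the given drive. */
static BlockStats get_blockstats(const char *drive)
{
    (void)drive;
    BlockStats s = {0};
    return s;
}

/* Average latency in ms over one sample interval, by finite differences. */
static double avg_latency_ms(uint64_t d_time_ns, uint64_t d_ops)
{
    return d_ops ? (double)d_time_ns / (double)d_ops / 1e6 : 0.0;
}

int main(void)
{
    const char *drive = "drive-virtio-disk1";
    const unsigned interval_sec = 2;
    BlockStats prev = get_blockstats(drive);

    for (int i = 0; i < 5; i++) {        /* a few sample intervals */
        sleep(interval_sec);
        BlockStats cur = get_blockstats(drive);

        printf("%s: avg read %.2f ms, avg write %.2f ms\n", drive,
               avg_latency_ms(cur.rd_total_time_ns - prev.rd_total_time_ns,
                              cur.rd_operations - prev.rd_operations),
               avg_latency_ms(cur.wr_total_time_ns - prev.wr_total_time_ns,
                              cur.wr_operations - prev.wr_operations));
        prev = cur;
    }
    return 0;
}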
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2011-1531.html