Bug 715017

Summary: Report disk latency (read and write) for each storage device
Product: Red Hat Enterprise Linux 6
Reporter: Andrew Cathrow <acathrow>
Component: qemu-kvm
Assignee: chellwig <chellwig>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: high
Version: 6.2
CC: bsarathy, cpelland, ehabkost, gjansen, iheim, juzhang, mkenneth, mshao, rbalakri, shuang, tburke, virt-maint, wdai
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.189.el6
Doc Type: Bug Fix
Doc Text:
Cause: there was no mechanism to report block device read and write latency. Consequence: the management system could not report the average read and write latency for the block devices of virtual machines. Change: implemented a mechanism so that qemu-kvm reports disk latency statistics via "info blockstats". Result: the management system can fetch and use those statistics from qemu-kvm.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-12-06 15:52:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 693510, 715019, 720379, 740918, 743047    

Comment 3 Geert Jansen 2011-07-18 12:32:53 UTC
We need this done at the hypervisor level; an agent is not an option.

Let me try to elaborate on what we need. We need to know the *average* read and write latencies for IO requests. I realize that some requests will be fast, and others slow, due to seeks and other contention. But we only need an average number.

One reasonably efficient way to do this is as follows: introduce four new counters in QEMU:

 - Total read latency, ms
 - Total write latency, ms
 - Total number of read requests
 - Total number of write requests

These are just counters, i.e. monotonically increasing integers (presumably 64 bit). Every time a request starts, its associated start time needs to be recorded as part of the request metadata. When the request is done, calculate the total time it took and add this value to the latency counter. Also increase the request counter by one.
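For illustration, here is a minimal Python sketch of that accounting (illustrative only; the real change is C code inside the qemu-kvm block layer, the shipped counters use nanoseconds rather than milliseconds, and the class and method names here are hypothetical):

import time

class BlockStats:
    def __init__(self):
        self.rd_operations = 0
        self.wr_operations = 0
        self.rd_total_time_ns = 0
        self.wr_total_time_ns = 0

    def start_request(self):
        # record the start time as part of the request metadata
        return time.monotonic_ns()

    def complete_request(self, start_ns, is_write, n_merged=1):
        # add the elapsed time to the latency counter and bump the request
        # counter; merged requests are accounted with a factor of N, as in
        # the second design criterion below
        elapsed_ns = (time.monotonic_ns() - start_ns) * n_merged
        if is_write:
            self.wr_total_time_ns += elapsed_ns
            self.wr_operations += n_merged
        else:
            self.rd_total_time_ns += elapsed_ns
            self.rd_operations += n_merged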

VDSM can then periodically sample these counters. Using simple finite-difference arithmetic, it can compute the average read and write latencies over the last sample interval.
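A sketch of that finite-difference step, assuming two successive counter snapshots as dicts keyed by the field names used later in this bug (the helper name is hypothetical):

def average_latencies_ms(prev, curr):
    # average read/write latency over one sample interval, in ms
    result = {}
    for op in ("rd", "wr"):
        ops = curr[op + "_operations"] - prev[op + "_operations"]
        time_ns = curr[op + "_total_time_ns"] - prev[op + "_total_time_ns"]
        # if no requests completed in the interval, report 0
        result[op] = time_ns / ops / 1e6 if ops else 0.0
    return result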

Two other important design criteria:

 * Measure as close to the guest as possible. We want to know what latency the guest is experiencing.
 * Merged IO requests should ideally be accounted for by using a factor of N (the number of merged requests) when updating both the latency counter and the request counter.

Comment 4 Andrew Cathrow 2011-07-18 12:40:09 UTC
And according to Christoph, we should add the flush counter too.

Comment 5 Dor Laor 2011-07-18 13:22:46 UTC
Christoph, Geert has a valid point that we need this on the hypervisor side - we can't integrate our measurements with every guest OS that is out there. The hypervisor management needs a simple mechanism that will work.

Comment 6 Itamar Heim 2011-07-26 08:08:16 UTC
What's the planned resolution for latency, and how is it aggregated into averages?

Comment 7 chellwig@redhat.com 2011-07-26 20:08:53 UTC
I now have code that basically implements the suggestion in comment 3. It'll need a bit more polishing, but I should be able to send it out to the list tomorrow. I'm not sure the reception will be overly nice for such a raw interface that is more or less useless to qemu itself.

Comment 10 Suqin Huang 2011-08-17 02:28:18 UTC
Hi chellwig,
For the QE part, do we only need to check whether the latency info is provided through the qemu monitor and QMP? Is there anything else we need to test?

Thanks 
Suqin
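
One way to check the QMP side would be a small script along these lines (a sketch only; the socket path is an assumption and must match a -qmp unix:... option passed to qemu-kvm, and it assumes the new *_total_time_ns fields show up in query-blockstats as well as in "info blockstats"):

import json
import socket

QMP_SOCKET = "/tmp/qmp-sock"  # assumed: qemu-kvm started with -qmp unix:/tmp/qmp-sock,server,nowait

def qmp_command(chan, cmd):
    # QMP is line-based JSON; skip asynchronous events until the reply arrives
    chan.write(json.dumps({"execute": cmd}) + "\n")
    chan.flush()
    while True:
        msg = json.loads(chan.readline())
        if "return" in msg or "error" in msg:
            return msg

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect(QMP_SOCKET)
chan = sock.makefile("rw")
json.loads(chan.readline())            # server greeting
qmp_command(chan, "qmp_capabilities")  # leave capabilities-negotiation mode
reply = qmp_command(chan, "query-blockstats")
for dev in reply["return"]:
    stats = dev["stats"]
    print(dev.get("device"),
          "rd_total_time_ns=%s" % stats.get("rd_total_time_ns"),
          "wr_total_time_ns=%s" % stats.get("wr_total_time_ns"),
          "flush_total_time_ns=%s" % stats.get("flush_total_time_ns"))
sock.close()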

Comment 12 Suqin Huang 2011-08-30 03:32:39 UTC
Will also run O_DIRECT reads/writes on a raw device in the guest to check whether qemu reports the correct data.

Comment 19 Suqin Huang 2011-09-22 08:54:41 UTC
Hi chellwig,

I tested with qemu-kvm-0.12.1.2-2.190.el6.x86_64; some of the data reported by qemu is incorrect.

1. blk: virtio
2. test scenario: write/read with dd, get block info before and after dd, and then subtract the before_dd data from the after_dd data

3. write operation 
1). # dd if=/dev/zero of=/mnt/test bs=1M count=1000 
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 7.31712 s, 143 MB/s

2). result
a. wr_operations is incorrect:

actual result: wr_operations: 9062 - 6976 = 2086
expected result: around 1000

b. wr_total_time_ns is incorrect:

actual result: wr_total_time_ns: 874167049027 - 667511125049 = 206655923978 ns ≈ 206.66 s
expected result: around 7.31712 s

c. wr_bytes is acceptable:
actual result: 4198605824 - 3149878272 = 1048727552
expected result: around 1048576000

3). detail data:

before dd:

drive-virtio-disk1: rd_bytes=199632384 wr_bytes=3149878272 rd_operations=5131 wr_operations=6976 flush_operations=136 wr_total_time_ns=667511125049 rd_total_time_ns=7999186637 flush_total_time_ns=150010121

after dd:

drive-virtio-disk1: rd_bytes=199636480 wr_bytes=4198605824 rd_operations=5132 wr_operations=9062 flush_operations=138 wr_total_time_ns=874167049027 rd_total_time_ns=8002082444 flush_total_time_ns=172090043


4. read operation:
1). # dd if=/mnt/test of=/dev/null 
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 2.54448 s, 412 MB/s

2). result

a. rd_bytes

actual result: rd_bytes = 713486848 - 666972672 = 46514176
expected result: around 1048576000

b. rd_operations:
actual result: 36777 - 36219 = 558
expected result: 1000

c. rd_total_time_ns:
actual result: 55262579137 - 53854678137 = 1407901000 ns ≈ 1.41 s
expected result: around 2.54448 s

3). detail data
before dd:

drive-virtio-disk1: rd_bytes=666972672 wr_bytes=4565480448 rd_operations=36219 wr_operations=12882 flush_operations=233 wr_total_time_ns=994495533746 rd_total_time_ns=53854678137 flush_total_time_ns=358894900


after dd:

drive-virtio-disk1: rd_bytes=713486848 wr_bytes=4640609280 rd_operations=36777 wr_operations=13960 flush_operations=235 wr_total_time_ns=1020617582408 rd_total_time_ns=55262579137 flush_total_time_ns=381067762

5. ide1-cd0
I didn't attach a CD, but there is read data for ide1-cd0.

ide1-cd0: rd_bytes=172 wr_bytes=0 rd_operations=5 wr_operations=0 flush_operations=0 wr_total_time_ns=0 rd_total_time_ns=85268 flush_total_time_ns=0


6. detail cmd

/usr/libexec/qemu-kvm -monitor stdio -drive file=/home/RHEL-Server-6.2-32-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idOMdpQo,mac=0a:b3:b1:f0:9f:19,id=ndev00idOMdpQo,bus=pci.0,addr=0x3 -netdev tap,id=idOMdpQo,vhost=on,script='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :1 -rtc base=utc,clock=host,driftfix=none -M rhel6.2.0 -boot order=cdn,once=c,menu=off   -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm
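
For reference, the before/after subtraction above can be scripted roughly like this (hypothetical helpers; it just parses the "info blockstats" lines quoted in this comment):

import re

def parse_blockstats_line(line):
    # turn 'drive-virtio-disk1: rd_bytes=... wr_bytes=...' into (device, {field: int})
    device, _, fields = line.partition(":")
    return device.strip(), {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", fields)}

before = parse_blockstats_line(
    "drive-virtio-disk1: rd_bytes=199632384 wr_bytes=3149878272 rd_operations=5131 "
    "wr_operations=6976 flush_operations=136 wr_total_time_ns=667511125049 "
    "rd_total_time_ns=7999186637 flush_total_time_ns=150010121")[1]
after = parse_blockstats_line(
    "drive-virtio-disk1: rd_bytes=199636480 wr_bytes=4198605824 rd_operations=5132 "
    "wr_operations=9062 flush_operations=138 wr_total_time_ns=874167049027 "
    "rd_total_time_ns=8002082444 flush_total_time_ns=172090043")[1]

delta = {k: after[k] - before[k] for k in after}
# delta["wr_bytes"] == 1048727552, delta["wr_operations"] == 2086,
# delta["wr_total_time_ns"] == 206655923978 (summed per-request time, so it can
# exceed the wall-clock dd time when several requests are in flight at once)
avg_wr_latency_ms = delta["wr_total_time_ns"] / delta["wr_operations"] / 1e6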

Comment 20 Suqin Huang 2011-09-27 09:00:26 UTC
wr_bytes and rd_bytes reported by qemu are the same as what dd reports,
but the operation counts and times are sometimes not the same.

Christoph, Kevin:

Is this acceptable?

Ran write/read on an attached disk, both ide and virtio.

qemu-kvm-0.12.1.2-2.193.el6.x86_64

I. ide:
1. write operation:

[root@dhcp-66-83-25 ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=1024 oflag=direct

1024+0 records in

1024+0 records out

1073741824 bytes (1.1 GB) copied, 14.9952 s, 71.6 MB/s


reported by qemu:

wr_bytes: 1073741824  ----> the same

time: 14328843120 ns (≈ 14.33 s)   ----> almost the same

operation: 2048  ----> the same


2. read operation:

[root@dhcp-66-83-25 ~]# dd if=/dev/sdb of=/dev/null bs=1M

10240+0 records in

10240+0 records out

10737418240 bytes (11 GB) copied, 33.0387 s, 325 MB/s

reported by qemu:

rd_bytes: 10737418240  ----> the same

time: 12.936062195 s   ----> less than dd 

operation: 131388    ----> almost 5 times dd's count


II. virtio:
1. write operation:

[root@localhost ~]# dd if=/dev/zero of=/dev/vdb bs=1M count=1024 oflag=direct

1024+0 records in

1024+0 records out

1073741824 bytes (1.1 GB) copied, 10.9087 s, 98.4 MB/s


reported by qemu:

wr_bytes: 1073741824  ----> the same

operation: 3072    ----> more than dd, should be 2048

time: 31.322404716 s   ----> more than dd


2. read operation

[root@localhost ~]# dd if=/dev/vdb of=/dev/null bs=1M

10240+0 records in

10240+0 records out

10737418240 bytes (11 GB) copied, 15.2586 s, 704 MB/s

rd_bytes: 10737418240  ----> the same

operation: 109955    ----> almost 5 times dd's count

time: 14.287022666 s    ----> almost the same

Comment 21 Suqin Huang 2011-09-29 09:47:08 UTC
According to Christoph's comment, this bug is fixed.

" If you really want to make sure the I/O operations are correct we'd have
to write a kernel level test driver to issue I/Os 100% under our
control.  But the numbers really look like what I expected. "

So closing this bug.

Comment 23 Eduardo Habkost 2011-11-22 13:16:25 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: there was no mechanism to report block device read and write latency.

Consequence: the management system could not report the average read and write latency for the block devices of virtual machines.

Change: implemented a mechanism so that qemu-kvm reports disk latency statistics via "info blockstats".

Result: the management system can fetch and use those statistics from qemu-kvm.

Comment 24 errata-xmlrpc 2011-12-06 15:52:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1531.html