Bug 750202

Summary: Completely different (and bogus) performance for two identical virtual disks
Product: Fedora
Component: qemu
Version: 15
Hardware: x86_64
OS: Linux
Status: CLOSED CANTFIX
Severity: high
Priority: unspecified
Reporter: Dennis Jacobfeuerborn <dennisml>
Assignee: Fedora Virtualization Maintainers <virt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: amit.shah, berrange, crobinso, dougsland, dwmw2, ehabkost, itamar, jaswinder, jforbes, knoel, kwolf, rjones, scottt.tw, tburke, virt-maint
Last Closed: 2012-05-29 12:37:45 UTC
Attachments: libvirt log excerpt

Description Dennis Jacobfeuerborn 2011-10-31 10:46:44 UTC
I've set up a CentOS 6 guest that has two virtual disks which show completely different (and bogus) behavior even though they are configured exactly the same.

Host configuration:
CentOS 6 guest on Fedora 15 host using KVM
Two identical 1 TB SATA SAMSUNG HD103SJ host disks (/dev/sdb and /dev/sdc) are used for the volume images of the two guest disks. One separate 128 GB SSD drive (/dev/sda) holds the Fedora system (i.e. the SATA disks are not used by the host system itself).

Guest configuration:
/dev/vda and /dev/vdb are each 2 GB in size and each is placed on its own SATA host disk mentioned above as a raw, fully allocated image file. The guest disks are attached using virtio and cache=none.
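
For reference, a "fully allocated" raw image like this is usually created in one of two ways (the path below is illustrative, not the one used here); dd actually writes every block, while fallocate only reserves the extents without initializing them:

dd if=/dev/zero of=/var/lib/libvirt/images/example.img bs=1M count=2000
fallocate -l 2000M /var/lib/libvirt/images/example.img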

Using hdparm and seekmark I get the following results:

seekmark:
/dev/vda:  130 seeks/s
/dev/vdb: 9615 seeks/s

hdparm -t:
/dev/vda:   95 MB/s
/dev/vdb: 1691 MB/s

/dev/vda looks OK, but the numbers for /dev/vdb make no sense at all. When I test /dev/vda I can hear the drive work and see the I/O on the host. When doing the same with /dev/vdb both tests finish almost immediately and I don't see any I/O on the host side.

Testing the physical disks on the host shows the following numbers:

seekmark:
/dev/sdb: 74 seeks/s
/dev/sdc: 73 seeks/s

hdparm -t:
/dev/sdb: 145 MB/s
/dev/sdc: 138 MB/s

As you can see, the physical drives themselves behave practically identically, as expected.

This bug is not about the actual absolute performance of the virtual disks but about the bogus behavior that I get even though the drives are identically configured.

Some mount settings for the backing disks:
[dennis@nexus ~]$ cat /proc/mounts|grep backup
/dev/sdb3 /mnt/backup01 ext4 rw,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
/dev/sdc3 /mnt/backup02 ext4 rw,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0

Image files:
[dennis@nexus seekmark]$ ls -l /mnt/backup01/libvirt/images/gw1.img /mnt/backup02/libvirt/images/gw1-data.img 
-rw-------. 1 root root 2097152000 Oct 26 14:07 /mnt/backup01/libvirt/images/gw1.img
-rw-------. 1 root root 2097152000 Oct 23 00:56 /mnt/backup02/libvirt/images/gw1-data.img

Disk definition in the guest:
...
     <disk type='file' device='disk'>
       <driver name='qemu' type='raw' cache='none'/>
       <source file='/mnt/backup01/libvirt/images/gw1.img'/>
       <target dev='vda' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </disk>
     <disk type='file' device='disk'>
       <driver name='qemu' type='raw' cache='none'/>
       <source file='/mnt/backup02/libvirt/images/gw1-data.img'/>
       <target dev='vdb' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
     </disk>
...

I brought this up on the centos-virt mailing list but didn't get any feedback, which is why I'm opening this bug. I'm trying to benchmark various configuration settings but cannot do so reliably unless I can explain the behavior seen above.

Comment 1 Dennis Jacobfeuerborn 2011-10-31 10:50:50 UTC
I forgot to mention that I'm running the Fedora virt-preview repo with the following package versions:

qemu-kvm-0.15.0-4.fc15.x86_64
libvirt-0.9.6-1.fc15.x86_64
virt-manager-0.9.0-6.fc15.noarch

Comment 2 Kevin Wolf 2011-10-31 15:47:35 UTC
It would be good to know what the qemu command line looks like. Can you provide that or maybe attach the libvirt log that should contain it as well?

Comment 3 Richard W.M. Jones 2011-10-31 15:47:50 UTC
The trouble with the test is that you're going through the
ext4 filesystem on the host.  There could easily be
fragmentation or alignment issues that affect the two files
differently.

I'd be interested to know:

- are the host partitions aligned?
  # parted /dev/sdb unit b print
  # parted /dev/sdc unit b print

- instead of using host files, use host LVs (or if you like,
  properly aligned host partitions, but LVs are more flexible)

- do the SATA disks have 512 byte or 4K sectors?
  http://libguestfs.org/virt-alignment-scan.1.html#linux_host_block_and_i_o_size
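
For the last question, a minimal sketch of how to check the
sector sizes, using the device names from this report
(standard sysfs and util-linux queries, not output from this bug):

  # cat /sys/block/sdb/queue/logical_block_size /sys/block/sdb/queue/physical_block_size
  # cat /sys/block/sdc/queue/logical_block_size /sys/block/sdc/queue/physical_block_size
  # blockdev --getss --getpbsz /dev/sdb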

Comment 4 Dennis Jacobfeuerborn 2011-10-31 17:35:46 UTC
(In reply to comment #3)
> The trouble with the test is that you're going through the
> ext4 filesystem on the host.  There could easily be
> fragmentation or alignment issues that affect the two files
> differently.

Fragmentation does not explain the gigantic difference between the results.
The entire machine is just a few weeks old, the partitions the libvirt images reside on have seen almost no deletions, and they have 88% and 96% space free respectively, so fragmentation should be pretty much non-existent.

> I'd be interested to know:
> 
> - are the host partitions aligned?
>   # parted /dev/sdb unit b print
>   # parted /dev/sdc unit b print

Since the drives have been partitioned identically, they are either both aligned or both unaligned, so I would expect to see the same impact on performance, good or bad:

[dennis@nexus ~]$ sudo parted /dev/sdb unit b print
Model: ATA SAMSUNG HD103SJ (scsi)
Disk /dev/sdb: 1000204886016B
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start          End             Size           Type     File system  Flags
 1      1048576B       419431448575B   419430400000B  primary  ntfs
 2      419431448576B  848928178175B   429496729600B  primary               raid
 3      848928178176B  1000204886015B  151276707840B  primary  ext4

[dennis@nexus ~]$ sudo parted /dev/sdc unit b print
Model: ATA SAMSUNG HD103SJ (scsi)
Disk /dev/sdc: 1000204886016B
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start          End             Size           Type     File system  Flags
 1      1048576B       419431448575B   419430400000B  primary  ntfs
 2      419431448576B  848928178175B   429496729600B  primary               raid
 3      848928178176B  1000204886015B  151276707840B  primary  ext4

> - instead of using host files, use host LVs (or if you like,
>   properly aligned host partitions, but LVs are more flexible)

I use LVs on my servers, but here I use images because I don't care about performance (for now).

> - do the SATA disks have 512 byte or 4K sectors?

They have a 512 byte physical sector size, but again, since these are the same drives I'm not sure why this should matter.

I'm aware that image file vs. LV, aligned vs. non-aligned, 4K vs. 512 byte sectors, etc. all have an impact on performance. Having administered servers for a decade now, I'm familiar with all kinds of I/O patterns that could influence these measurements, from the sneaky background RAID verify to the major database system that really pounds the disks, but I have no explanation for the *magnitude* of the difference I'm seeing here.

It might be the case that the numbers are inflated due to some form of caching, or that they are lower than on the host side because of the virtualization overhead, but in either case I would expect to get the same result on both disks.

Likewise, if one of the physical drives was damaged in some weird way then this could be the culprit, but in that case I would expect to see the same difference in performance on the host side. Yet on the host side the drives behave identically, as they should.

Lastly, the fact that when running the tests on /dev/vda I can see I/O on /dev/sdb on the host side, but when running them on /dev/vdb I can *not* see any I/O whatsoever on the host, makes me wonder what is going on:

seekmark -s 500 -f /dev/vda: (test takes about 5 seconds)
iostat on the Host:
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.00         0.00        11.20          0         56
sdb             100.00       902.40         4.80       4512         24
sdc               0.00         0.00         0.00          0          0

seekmark -s 500 -f /dev/vdb: (test returns immediately)
iostat on the Host:
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.40         0.00        17.60          0         88
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0

Notice how in the first case I get 100 tps, which for 500 seeks taking 5 seconds to run is exactly what I would expect.
On /dev/vdb the host doesn't see any traffic at all!

Comment 5 Dennis Jacobfeuerborn 2011-10-31 17:38:29 UTC
Created attachment 531005 [details]
libvirt log excerpt

This is the libvirt log from the last run of the guest. It really just contains the command line.

Comment 6 Richard W.M. Jones 2011-10-31 18:12:39 UTC
I'm only trying to help here.

The way to understand the root cause of this problem is to
gradually remove elements of complexity until one single
change makes a difference.

In this case I think it would be helpful to eliminate
the host ext4 filesystem, and see if that makes any
difference.  Maybe it won't but you won't know until
you've tried it.

The partitions (/dev/sdb3, /dev/sdc3) are aligned, so I
would just put /dev/vda and /dev/vdb directly on these
partitions.
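
A minimal sketch of what those disk definitions might look
like when backed directly by the partitions (illustrative
only, not taken from this guest's actual config):

     <disk type='block' device='disk'>
       <driver name='qemu' type='raw' cache='none'/>
       <source dev='/dev/sdb3'/>
       <target dev='vda' bus='virtio'/>
     </disk>
     <disk type='block' device='disk'>
       <driver name='qemu' type='raw' cache='none'/>
       <source dev='/dev/sdc3'/>
       <target dev='vdb' bus='virtio'/>
     </disk>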

Comment 7 Dennis Jacobfeuerborn 2011-10-31 19:30:41 UTC
After moving all data off the partitions, dd'ing the image files onto them, and reconfiguring the guest accordingly, I now get the expected result and see about 95 seeks/s on both disks.

So I went ahead and reformatted the partitions and restored the files again. This time, even with the files, I got reasonable results.

Next I added a new disk /dev/vdc with an image file on host /dev/sdc3 (just like /dev/vdb). This disk showed the strange behavior again.

Lastly, I copied the image file into a new file, replaced the original image file with the copy, and then the disk showed the same correct behavior as the other two.

Either the file is sparse even though I chose full allocation and "du -h" shows the file as using 2 GB of space, or the FS optimizes the I/O away: it sees the guest accessing a block that is allocated but not yet initialized (due to ext4 pre-allocation) and as a result can simply hand back a block filled with zeroes without actually having to access the disk.
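
One way to tell those two cases apart, sketched with the image path from above (filefrag is part of e2fsprogs; newer versions show an "unwritten" flag for preallocated but uninitialized extents):

filefrag -v /mnt/backup02/libvirt/images/gw1-data.img
# a truly sparse file shows up as missing extents/holes, while fallocate-style
# pre-allocation shows extents flagged "unwritten"; ext4 can return zeroes for
# both without touching the disk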

Comment 8 Fedora Admin XMLRPC Client 2012-03-15 17:54:24 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 9 Cole Robinson 2012-05-29 00:37:15 UTC
Dennis, are you still seeing this issue with a more recent Fedora? F15 is end of life in a month. If so, please comment to that effect and we can escalate from there.

Comment 10 Dennis Jacobfeuerborn 2012-05-29 11:13:27 UTC
No. As I mentioned in comment 7, this turned out to be a side effect of how ext4 allocates blocks, so this isn't a real problem but just something to look out for when benchmarking anything on ext4: ensuring that you create non-sparse files is not enough to get reasonable results; you actually have to write to all blocks of the files before starting the benchmark.
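
A minimal sketch of that preparation step for the data image from the description above (note that this overwrites the image's contents, so it is only appropriate for a scratch/benchmark disk; the count matches the 2097152000 byte file size):

dd if=/dev/zero of=/mnt/backup02/libvirt/images/gw1-data.img bs=1M count=2000 conv=notrunc,fsync
# conv=notrunc rewrites the file in place, so every extent ends up actually
# written instead of merely preallocated; conv=fsync flushes it to disk at the end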

Comment 11 Cole Robinson 2012-05-29 12:37:45 UTC
Sorry Dennis, I missed that detail. Closing this as CANTFIX then.