Bug 861425

Summary: qemu-img should discard empty blocks on block devices
Product: Red Hat Enterprise Linux 7 Reporter: Neil Wilson <neil>
Component: qemu-kvm-rhev Assignee: Kevin Wolf <kwolf>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 CC: areis, huding, jen, juzhang, kwolf, mkenneth, mzhan, neil, rbalakri, rpacheco, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-31 08:36:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Neil Wilson 2012-09-28 15:09:23 UTC
Description of problem:

When qemu-img copies a sparse image file (qcow, raw, etc.) onto a thinly provisioned LVM block device, it allocates all the sectors for the virtual size of the disk, including the empty ones.


Version-Release number of selected component (if applicable):

Version     : 0.12.1.2
Release     : 2.295.el6_3.2

How reproducible:




Steps to Reproduce:
1. Create a sparse qcow or raw virtual machine image on the normal filesystem.
2. Create a thinly provisioned LVM partition, e.g. lvcreate --thin servers/mysql_thin --virtualsize 50G --name srv-msql3-vol
3. Note the block usage of the VM image on the normal filesystem with ls -lsh (in my case 2.3G on a 10G virtual size).
4. Copy the image to the partition with qemu-img: qemu-img convert -p /var/lib/libvirt/images/srv-testb-vol.raw -O host_device /dev/servers/srv-msql3-vol
5. Run 'lvs' and note the Origin Data size of the LVM partition.
  
Actual results:

The Origin Data size is 20% of a 50G thin pool, i.e. 10G.


Expected results:

The Origin Data size should be nearer to 2.3G (i.e. 4.6% of a 50G thin pool).
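
The two percentages quoted under the actual and expected results follow from dividing the allocated size by the pool size. A minimal Python sketch of that arithmetic, where the function name is purely illustrative and not part of any LVM or QEMU tool:

```python
def thin_pool_data_percent(allocated_gib: float, pool_gib: float) -> float:
    """Return the share of a thin pool that is allocated, in percent."""
    if pool_gib <= 0:
        raise ValueError("pool size must be positive")
    return 100.0 * allocated_gib / pool_gib


# Figures from this report: 50G pool, 10G virtual size, 2.3G actually used.
print(f"fully allocated image: {thin_pool_data_percent(10, 50):.1f}%")
print(f"sparse image only:     {thin_pool_data_percent(2.3, 50):.1f}%")
```

With the numbers from the report this prints 20.0% for the fully allocated case and 4.6% for the sparse case.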


Additional info:

You get the same effect with 'cp --sparse=always'.

Comment 1 Neil Wilson 2012-09-28 15:12:01 UTC
You get the same effect using '-O raw' on the qemu-img command line.

There is an incomplete patch on the Qemu mailing list to implement BLKDISCARD support on host devices:

http://lists.gnu.org/archive/html/qemu-devel/2011-11/msg01659.html

Comment 3 Neil Wilson 2012-09-28 15:30:48 UTC
Note this requires a kernel with dm-thin discard support enabled.

Comment 4 Neil Wilson 2012-09-28 15:34:40 UTC
See also: https://bugzilla.redhat.com/show_bug.cgi?id=835622

Comment 6 Ademar Reis 2013-05-17 15:25:32 UTC
Neil: Thanks for taking the time to enter a bug report with us. We appreciate
the feedback and look to use reports such as this to guide our efforts at
improving our products. That being said, we're not able to guarantee the
timeliness or suitability of a resolution for issues entered here because this
is not a mechanism for requesting support.

If this issue is critical or in any way time sensitive, please raise a ticket
through your regular Red Hat support channels to make certain it receives the
proper attention and prioritization to assure a timely resolution.

For information on how to contact the Red Hat production support team, please
visit: https://www.redhat.com/support/process/production/#howto

We'll target a fix upstream, which will probably be included in RHEL7.

Comment 9 Kevin Wolf 2014-07-17 13:44:58 UTC
I believe this works since upstream qemu 2.0 if the option '-t none' is used
for 'qemu-img convert'. The next rebase will take care of it then. It looks as
if the BLKDISCARD ioctl doesn't work reliably without O_DIRECT, so we can't use
it for other cache modes.
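
To illustrate the idea of discarding empty blocks instead of writing them, here is a minimal Python sketch. It is not QEMU's implementation: the function name, the 4096-byte chunk size, and the "write"/"discard" labels are assumptions made for illustration. A "discard" entry stands for a range where a converter could issue BLKDISCARD on the target, provided the device guarantees that discarded blocks read back as zeros.

```python
from typing import Iterator, Tuple


def plan_sparse_copy(data: bytes, chunk_size: int = 4096) -> Iterator[Tuple[str, int, int]]:
    """Yield (action, offset, length) tuples for copying `data` sparsely.

    All-zero chunks become "discard", everything else becomes "write".
    Adjacent chunks with the same action are merged into one range.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    current = None  # (action, start offset, length) of the range being built
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        action = "discard" if chunk.count(0) == len(chunk) else "write"
        if current is not None and current[0] == action:
            current = (action, current[1], current[2] + len(chunk))
        else:
            if current is not None:
                yield current
            current = (action, offset, len(chunk))
    if current is not None:
        yield current


# Hypothetical image: one data chunk, two empty chunks, a short data tail.
image = b"\x01" * 4096 + b"\x00" * 8192 + b"\x02" * 100
for action, offset, length in plan_sparse_copy(image):
    print(action, offset, length)
```

Only the "write" ranges would consume space on a thin volume; the "discard" ranges would stay unallocated if the target honours the discard.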

Neil, can you please check with a current upstream qemu whether it behaves as
you intended when you reported this?

Comment 11 huiqingding 2014-08-13 02:26:02 UTC
I tested qemu-img-rhev-2.1.0-1.el7.x86_64 using the steps of comment 4.

Before running "qemu-img convert":
1. Check the raw image; virtual size is 10G and disk size is 2.5G:
# qemu-img info test.img 
image: test.img
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 2.5G
# ls -lsh test.img 
2.6G -rw-r--r--. 1 root root 10G Aug 13 10:10 test.img

2. Check the thinly provisioned LVM partition; Origin Data% is 0.00:
# lvs
  LV            VG              Attr       LSize   Pool       Origin Data%  Move Log Cpy%Sync Convert
  home          rhel_dhcp-8-248 -wi-ao---- 407.50g                                                   
  root          rhel_dhcp-8-248 -wi-ao----  50.00g                                                   
  swap          rhel_dhcp-8-248 -wi-ao----   7.77g                                                   
  mysql_thin    servers         twi-a-tz--  30.00g                     0.00                          
  srv-msql3-vol servers         Vwi-a-tz--  30.00g mysql_thin          0.00   

3. Copy the image to the thinly provisioned LVM partition with "-t none":
# qemu-img convert -p test.img -O raw -t none /dev/servers/srv-msql3-vol
    (100.00/100%)

4. check the thinly provisioned LVM partition
# lvs
  LV            VG              Attr       LSize   Pool       Origin Data%  Move Log Cpy%Sync Convert
  home          rhel_dhcp-8-248 -wi-ao---- 407.50g                                                   
  root          rhel_dhcp-8-248 -wi-ao----  50.00g                                                   
  swap          rhel_dhcp-8-248 -wi-ao----   7.77g                                                   
  mysql_thin    servers         twi-a-tz--  30.00g                    33.33                          
  srv-msql3-vol servers         Vwi-a-tz--  30.00g mysql_thin         33.33       

After step 4, the Origin Data% is 33.33% of a 30G thin pool, which is about 10G, but it should be about 2.5G.
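
The conversion from the Data% column back to an allocated size can be sketched in Python. This is only a sketch under stated assumptions: the names `ThinLvUsage` and `thin_lv_usage` are made up for illustration, and the parser expects a thin-volume row laid out like the ones in the lvs output of this comment (Pool set, Origin empty, size reported with a "g" suffix). For real scripting, asking lvs for specific fields (for example with --noheadings and -o lv_name,lv_size,data_percent) is likely more robust than splitting the default table.

```python
from typing import NamedTuple


class ThinLvUsage(NamedTuple):
    name: str
    size_gib: float
    data_percent: float
    allocated_gib: float


def thin_lv_usage(lvs_row: str) -> ThinLvUsage:
    """Parse one thin-volume row of default `lvs` output (see assumptions above)."""
    fields = lvs_row.split()
    if len(fields) < 6:
        raise ValueError("expected LV, VG, Attr, LSize, Pool and Data% columns")
    name, _vg, attr, lsize, _pool, data_percent = fields[:6]
    if not attr.startswith("V"):
        raise ValueError("not a thin volume row (Attr does not start with 'V')")
    if not lsize.endswith("g"):
        raise ValueError("expected a size in gigabytes, e.g. '30.00g'")
    size_gib = float(lsize[:-1])
    percent = float(data_percent)
    return ThinLvUsage(name, size_gib, percent, size_gib * percent / 100.0)


# Row copied from the lvs output after the conversion.
row = "  srv-msql3-vol servers         Vwi-a-tz--  30.00g mysql_thin         33.33"
usage = thin_lv_usage(row)
print(f"{usage.name}: {usage.allocated_gib:.2f}G of {usage.size_gib:.2f}G allocated")
```

For the row shown this reports roughly 10G allocated, matching the virtual size of the image rather than its 2.5G disk size.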

Comment 12 huiqingding 2014-08-13 02:27:40 UTC
Based on the result of comment 11, I am changing the status to "ASSIGNED". If I am wrong, please correct me.

Comment 15 Jeff Nelson 2015-11-05 15:34:33 UTC
Comment 9 suggests that the problem was fixed in upstream QEMU 2.0, but the test results in comment 11 show that the problem still exists.

Meanwhile, rebasing to QEMU 2.0 has occurred, so the value in Fixed In Version is no longer relevant. Therefore, I'm clearing the Fixed In Version field. Please let me know if you have any questions.

Comment 16 Kevin Wolf 2016-08-05 16:41:57 UTC
I tried to reproduce the failure on a current QEMU version, and I could indeed
see that the full space was taken. On a closer look I saw that QEMU issued a
BLKDISCARDZEROES ioctl, i.e. it asked the kernel whether the device supported
zeroing data with BLKDISCARD, and the answer was no. So that's why it had to
write all of the data explicitly.

I'm not sure what needs to be done so that a thin LV actually advertises that
it supports this (if it's possible at all), but can you please check "blockdev
--getdiscardzeroes" for your LV to see if that's the same problem for you?

If so, then this isn't a QEMU problem because QEMU can only make use of
BLKDISCARD if that guarantees to zero out the image.
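
For reference, the same query that "blockdev --getdiscardzeroes" performs can be sketched in Python. This is an illustration under assumptions, not QEMU's code: the helper name `discard_zeroes_data` is made up, the ioctl number is derived from the `_IO(0x12, 124)` definition in <linux/fs.h> using the encoding common to x86 and ARM, and the script needs read access to the device. Recent kernels are understood to report 0 for this query regardless of the device, so a 0 result may not say much there.

```python
import array
import fcntl
import os
from typing import Optional


def _IO(ioc_type: int, nr: int) -> int:
    """Linux _IO() macro as encoded on x86/ARM (no size or direction bits)."""
    return (ioc_type << 8) | nr


# BLKDISCARDZEROES is defined as _IO(0x12, 124) in <linux/fs.h>.
BLKDISCARDZEROES = _IO(0x12, 124)


def discard_zeroes_data(path: str) -> Optional[bool]:
    """Return True/False as reported by the kernel, or None if the query fails."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        result = array.array("I", [0])  # the kernel writes an unsigned int here
        fcntl.ioctl(fd, BLKDISCARDZEROES, result, True)
        return bool(result[0])
    except OSError:
        return None  # e.g. not a block device, or ioctl not supported
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Device path taken from this report; adjust for the LV under test.
    print(discard_zeroes_data("/dev/servers/srv-msql3-vol"))
```

A False result corresponds to the situation described in this comment: the device does not promise zeroed reads after a discard, so the zeros have to be written explicitly.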

Comment 17 Kevin Wolf 2016-08-31 08:36:01 UTC
Almost four weeks without an answer and the 7.3 cycle is nearing its end, so
I'll just assume that my suspicion is right and close the bug. If you
eventually find out that your scenario is different, please reopen.