Bug 1759454 - libvirt (>4.3) makes virt-manager use 'preallocation=falloc' instead of the former mode 'metadata'
Summary: libvirt (>4.3) makes virt-manager use 'preallocation=falloc' instead of the f...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: virt-manager
Version: unspecified
Hardware: All
OS: Linux
unspecified
low
Target Milestone: ---
Assignee: Cole Robinson
QA Contact:
URL:
Whiteboard:
: 1789853 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-08 08:33 UTC by Christian Ehrhardt
Modified: 2020-09-20 18:06 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-20 18:06:24 UTC


Attachments (Terms of Use)

Description Christian Ehrhardt 2019-10-08 08:33:40 UTC
## Description of problem: ##
Hi,
in Ubuntu bug 1847105 [1] I think I have found a somewhat unexpected or unclear behavior mismatch between libvirt `>v4.3` and virt-manager.

For volume creation virt-manager since quite some time (I checked back until `v1.5`) will submit an XML like the following to libvirts volume creation:

<volume>
  <name>ubuntu19.04.qcow2</name>
  <capacity>16106127360</capacity>
  <allocation>16106127360</allocation>
  <target>
    <format type="qcow2"/>
    <features>
      <lazy_refcounts/>
    </features>
  </target>
</volume>

The important detail in regard to this issues is that capacity==allocation.
In the definition by libvirt [2] this is somewhat undefined. In my reading of it this case would also fully allocate the volume.

But - and here is the change in behavior - up until libvirt 'v4.3' it didn't.
It used to use `preallocation=metadata` until a change [3] made it issue `preallocation=falloc` nowadays.

eventual qemu-img call:
Old: `/usr/bin/qemu-img create -f qcow2 -o preallocation=metadata,compat=1.1,lazy_refcounts /var/lib/libvirt/images/ubuntu18.04.qcow2 15728640K`
New: `/usr/bin/qemu-img create -f qcow2 -o preallocation=falloc,compat=1.1,lazy_refcounts /var/lib/libvirt/images/live-server.qcow2 41943040K`



## Version-Release number of selected component (if applicable): ##
v1.5 - v2.2.1

## How reproducible: ##
100%

## Steps to Reproduce: ##
1. create a new guest (use PXE to not require any ISO)
2. when you hit "Finish" to start the installation it will issue a qemu-img command
3. with newer libvirt versions this will use preallocation=falloc

## Actual results: ##
qemu-img ... preallocation=falloc

## Expected results: ##
qemu-img ... preallocation=metadata (like in the past)
OR a statement that falloc is preferred by virt-manager as well and not just an accident due to a change in libvirt.


## Next steps: ##
I was now wondering about virt-managers position on this, do you:
 * want to modify virt-manager to get back to spare/lazy allocations
   (TL;DR submit an allocation value that is `0<allocation<capacity`?
 * consider this change a bug in libvirt, so that we should report/change
   it there? Maybe the comparison there should be `>=` and not just `>`
   in your opinion?
 * not changing anything as the `falloc` creation is preferred from the
   POV of virt-manager
 * some other way answer I didn't predict :-)


[1]: https://bugs.launchpad.net/ubuntu/+source/virt-manager/+bug/1847105
[2]: https://libvirt.org/formatstorage.html#StorageVol
[3]: https://libvirt.org/git/?p=libvirt.git;a=blobdiff;f=src/storage/storage_util.c;h=897dfdaaee4da967fb5dacc4327484f40b8aa717;hp=b4aed0f7009b8d7e81a64e4f4a5c920a422fbba4;hb=c9ec7088c7a3f4cd26bb471f1f243931fff6f4f9;hpb=c1bc9c662b411f8fa06071991921a180480a0f40

Comment 1 Christian Ehrhardt 2019-10-08 08:35:19 UTC
FYI: initially reported in https://github.com/virt-manager/virt-manager/issues/56 but then I realized that officially BZ is most likely the correct bug-tracker

Comment 2 Daniel Berrangé 2019-10-08 08:48:38 UTC
This behaviour changed due to this commit:

commit c9ec7088c7a3f4cd26bb471f1f243931fff6f4f9
Author: Wim ten Have <wim.ten.have@oracle.com>
Date:   Mon Apr 9 20:14:35 2018 +0200

    storage: extend preallocation flags support for qemu-img
    
    This patch adds support to qcow2 formatted filesystem object storage by
    instructing qemu-img to build them with preallocation=falloc whenever the
    XML described storage <allocation> matches its <capacity>.  For all other
    cases the filesystem stored objects are built with preallocation=metadata.
    
    Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>


This is a semantic change, so should not have been allowed.

It has been 1 & 1/2 years already, but I'm wondering if we should none the less revert it and instead add some other way to declare which preallocation strategy to use.

Comment 3 Christian Ehrhardt 2019-10-08 10:45:29 UTC
Thanks Daniel,
yes that is the patch I was referring to.

If you really think reverting it in libvirt is the right path I'm fine following you on that and keep virt-manager untouched in that regard.
But as you say it is in the field for quite a while and other than this report I haven't seen any.
Therefore I wasn't going to suggest a revert, after that much time it almost is a semantic change "again".

Furthermore the reasoning to add it back in [1] was with virt-inst / virt -manager in mind and the exact definition in [2] IMHO is a bit weak for this particular case.
That was the reason I asked for guidance from virt-managers POV first.

If we end up reverting the change we might consider modifying the text in [2] to be more clear what is (expected) to happen if allocation==capacity.

[1]: https://www.redhat.com/archives/libvir-list/2018-April/msg00130.html
[2]: https://libvirt.org/formatstorage.html#StorageVol

Comment 4 Daniel Berrangé 2019-10-08 12:20:29 UTC
Looking at the launchpad bug I see the original complaint was the disk creation was slow. This is somewhat surprising to me. falloc should be fast on any mainstream, modern filesystem as it merely has to write metadata in the FS to reserve the new blocks.  I don't see any report of what host filesystem was in use when the user saw > 1 minute disk allocation. It would be worth investigating that aspect more, because my expectation is that falloc should be negligible in time.

Comment 5 Christian Ehrhardt 2019-10-08 12:48:20 UTC
Hi Daniel,
yes falloc should be fast in many cases, but also isn't in others.

The reporter used this directly on a ZFS system.
I used to recreate the slowness on stacked containers which is EXT4 -> ZFS-image file -> LXD container -> libvirt.
So far it seems we can assume that ZFS is slow in this use case.

I agree that on ext4 and tmpfs this is <0.02 real time.
So to face the slowdown might have a ZFS component.

Comment 6 Daniel Berrangé 2019-10-08 12:53:28 UTC
Yes, it seems fallocate on ZFS is problematic:

  https://github.com/zfsonlinux/zfs/issues/326

With that in mind, the right answer for ZFS is probably for virt-manager/virt-install to NOT ask for any preallocation at all in the ZFS case.

The fun of course is how does virt-manager know whether the image it is about to create is on ZFS or not.  This might point to an RFE for storage pools to report whether a given storage pool instance is capable of efficiently preallocating or not.

Comment 7 Christian Ehrhardt 2019-10-08 13:01:48 UTC
Yep Daniel, just found the same discussion a few minutes ago and updated the Ubuntu bug.

While correct, your suggestion of checking pool features for this seems very complex for the trivial issue that we see atm.

Maybe (only maybe) virt-manager would want to sparse-alloc always and could then change to not request allocation==capacity.
From most common virt-manager use cases I know of sparse would make more sense and be faster (at creation).
I'd expect non-sparse to be a special case that maybe gets a way to be specified if really needed.

And also on the libvirt side we still might consider reverting the commit that broke behavior which would render all the effort to flag those features at pools not needed (at least not for this).

Comment 8 Daniel Berrangé 2019-10-08 13:42:48 UTC
Creating non-sparse images is a good thing as a default, because behaviour when sparse images fail on disk full is not transparent to users.

Comment 9 Christian Ehrhardt 2019-10-09 10:09:48 UTC
Ok, but maybe virt-manager should then get a switch to only create it as sparse if the user wants it to be fast.
Many people use virt-manager to quickly test things, so I'd assume often disks are created, only partially used for a very short time and then thrown away.

But I agree in general and the additional fact that the "pain" only comes in if this is also on ZFS makes it maybe too special to be of high importance.

Comment 10 Christian Ehrhardt 2019-10-09 10:20:01 UTC
I think the discussion above summarizes this for virt-manager well.
I'll lower the prio of the remaining task for virt-manager which is "add a UI switch to request creating images as sparse file"


@Daniel - for libvirt do you still consider reverting the changes around c9ec7088 as it was a semantic change?
And if so would you want an extra bug or ML post for it?
Or has your opinion changed due to the discussion we had here?

Comment 11 Daniel Berrangé 2019-10-09 10:21:24 UTC
I'm honestly not sure about libvirt side. I think it is worth a mail on the list to discuss it more broadly.

Comment 12 Christian Ehrhardt 2019-10-09 11:41:03 UTC
FYI - Discussion for libvirt started in https://www.redhat.com/archives/libvir-list/2019-October/msg00467.html

Comment 13 Persona non grata 2019-12-10 17:53:06 UTC
You may just help the user making an informed decision with all the pros and cons explained in the user interface. Such behavior, common to other virtualization solutions, was beautifully fulfilled before being removed with this commit: https://github.com/virt-manager/virt-manager/commit/15a6a7e2105440df528f75c4df4d2471df28bd1e#diff-03b5bf10a38d88f4a09a14e211c23e05. Every time I create a new virtual machine I need to use the command line to generate a dynamically allocated storage, which is not good.

Comment 14 Cole Robinson 2020-01-26 21:50:08 UTC
*** Bug 1789853 has been marked as a duplicate of this bug. ***

Comment 15 Cole Robinson 2020-08-31 18:55:11 UTC
I think this was fixed in libvirt upstream, please reopen and correct me if I'm wrong

commit 81a3042a12c7c06adc8e95264b6143b2eeb4953f
Author: Pavel Hrdina <phrdina@redhat.com>
Date:   Tue Aug 25 15:09:53 2020 +0200

    storage_util: fix qemu-img sparse allocation
    
    Commit <c9ec7088c7a3f4cd26bb471f1f243931fff6f4f9> introduced a support
    to fully allocate qcow2 images when <allocation> matches <capacity> but
    it doesn't work as expected.
    
    The issue is that info.size_arg is in KB but the info.allocation
    introduced by the mentioned commit is in B. This results in using
    "preallocation=falloc," in cases where "preallocation=metadata," should
    be used.
    
    Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
    Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Comment 16 Christian Ehrhardt 2020-09-02 12:55:57 UTC
Hi Cole,
thanks for the hint on that fix.
I have applied and tried it but it still uses falloc for the allocation calls from virt-manager.

I have traced a -O0 build in libvirt and see this:

Thread 6 "rpc-worker" hit Breakpoint 4, storageBackendCreateQemuImgOpts (encinfo=0x0, opts=0x7fd1791102a0, info=0x7fd179110380) at ../../../src/storage/storage_util.c:712

(gdb) p *info
$3 = {format = 14, type = 0x7fd17ef920f0 "qcow2", inputType = 0x0, path = 0x7fd164018440 "/var/lib/libvirt/images/ubuntu20.04-l.qcow2", size_arg = 2097152, allocation = 2097152, 
  encryption = false, preallocate = true, compat = 0x7fd16400deb0 "1.1", features = 0x7fd16400e170, nocow = false, backingPath = 0x0, backingFormat = 0, inputPath = 0x0, 
  inputFormatStr = 0x0, inputFormat = 0, secretAlias = 0x0}

And in particular:
(gdb) p info->preallocate
$4 = true
(gdb) p info->size_arg
$5 = 2097152
(gdb) p info->allocation
$6 = 2097152

Due to that the following
712	    if (info->preallocate) {
713	        if (info->size_arg > info->allocation)
714	            virBufferAddLit(&buf, "preallocation=metadata,");
715	        else
716	            virBufferAddLit(&buf, "preallocation=falloc,");

Still decides for falloc as the values are now (due to the fix) equal but not greater than.

Per the description of the initial patch: "build them with preallocation=falloc whenever the XML described storage <allocation> matches its <capacity>" I wonder if we should change the libvirt code to something like:

 if (info.preallocate) {
     if (info.size_arg == info.allocation)
         virBufferAddLit(&buf, "preallocation=falloc,");
     else
         virBufferAddLit(&buf, "preallocation=metadata,");
 }

That code would then work and read exactly as the description did.

P.S. reopening the case

Comment 17 Christian Ehrhardt 2020-09-03 09:37:07 UTC
Hi,
as outlined above the behavior is still wrong.
The patch suggested in [1] will fix the behavior of libvirt to what it was meant to be when preallocation=falloc was added.

But now virt-manager comes into play, the XML sent by virt-manager is exactly matching the definition of the use case to use falloc.
So virt-manager triggered storage request will continue to use falloc despite libvirt being "fixed to how it was intended".

The question now is should we consider changing the XML virt-manager submits so that we are back at the much faster preallocation=metadata mode?

[1]: https://www.redhat.com/archives/libvir-list/2020-September/msg00105.html

Comment 18 Christian Ehrhardt 2020-09-03 14:18:00 UTC
If you have followed the mailing list discussion, this might be:
- stay as-is in virt-manager
- improve libvirt to explicitly select the prealloc mode (optional)
- zfs got a fix to no more surface such an unexpected waiting time (but in to be released 2.0)

The discussion isn't over but I wanted to summarize here as well for everyone just reading this bug now or later.

Comment 19 Cole Robinson 2020-09-20 18:06:24 UTC
Okay now that I actually look into this and wrap my head around it, virt-manager should change.
See this previously mentioned change: https://github.com/virt-manager/virt-manager/commit/15a6a7e2105440df528f75c4df4d2471df28bd1e

The idea behind virt-manager's default was if user selected 'raw' for disk images, assume the user wants
to maximize performance, so fully allocate the disk.

qcow2 didn't support anything except sparse, so the sparse=True vs sparse=False made no difference.
So we always set sparse=False

Then qcow2 grows non-sparse support, and virt-manager is suddenly defaulting to it. That's a pretty big semantic
change. Even if falloc is fast enough in most cases, it is still claiming all the storage up front. Which
is not the historically intended behavior.

So I pushed a change to virt-manager to default to only use non-sparse for raw format by default.
That should side step this issue

I'll be pushing a new release with the fix sometime this coming week

commit ba08f84b3408744e9aa9763d100e8aa217c1f5ff
Author: Cole Robinson <crobinso@redhat.com>
Date:   Sat Sep 19 18:06:45 2020 -0400

    addstorage: Return to using qcow2 sparse by default


Note You need to log in before you can comment on or make changes to this bug.