
Bug 1391529

Summary:	[RFE] When creating qcow2 files, please use '-o preallocation=metadata' so metadata will be allocated
Product:	[oVirt] vdsm	Reporter:	Yaniv Kaul <ykaul>
Component:	RFEs	Assignee:	Daniel Erez <derez>
Status:	CLOSED DEFERRED	QA Contact:	Avihai <aefrat>
Severity:	medium	Docs Contact:
Priority:	low
Version:	---	CC:	bugs, kwolf, nsoffer, tnisan
Target Milestone:	---	Keywords:	FutureFeature, Performance
Target Release:	---	Flags:	ylavi: ovirt-4.3? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack?
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-04-01 14:47:58 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Storage	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Yaniv Kaul 2016-11-03 13:37:24 UTC

Description of problem:
The difference in size between a qcow2 image with and without preallocated metadata is relatively insignificant. For a 1TB image:
[root@ykaul-mini tmp]# qemu-img create -f qcow2 one_tb.qcow2 1T
Formatting 'one_tb.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[root@ykaul-mini tmp]# qemu-img create -f qcow2 -o preallocation=metadata one_tb_w_metadata.qcow2 1T
Formatting 'one_tb_w_metadata.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[root@ykaul-mini tmp]# du -ch *.qcow2
208K	one_tb.qcow2
161M	one_tb_w_metadata.qcow2

That is about 160MB for a 1TB disk.

And for 80G:
[root@ykaul-mini tmp]# du -ch 80*.qcow2
196K	80G.qcow2
13M	80G_w_metadata.qcow2
13M	total
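
As a rough illustration of how small this overhead is, the sketch below expresses the metadata size as parts per million of the virtual disk size. It uses only the rounded `du -h` figures quoted above, converted to MiB; the function name `overhead_ppm` is made up for this example.

```shell
#!/bin/sh
# Overhead of preallocated metadata relative to the virtual disk size,
# in parts per million (integer arithmetic, rounded down).
# Inputs are the rounded `du -h` figures from the description, in MiB.

overhead_ppm() (
    metadata_mib=$1
    virtual_mib=$2
    echo $(( metadata_mib * 1000000 / virtual_mib ))
)

overhead_ppm 161 $((1024 * 1024))   # 1T image:  prints 153
overhead_ppm 13 $((80 * 1024))      # 80G image: prints 158
```

Both cases come out at roughly 0.015% to 0.016% of the virtual size.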


I'm not sure how much of a performance boost this gives, but it should give some. Even if the boost is small, I feel it's worth it (together with qcow2v3).

Comment 1 Yaniv Lavi 2016-11-23 08:29:14 UTC

Is this a dup of BZ #1391859?

Comment 2 Yaniv Kaul 2016-11-23 09:44:47 UTC

(In reply to Yaniv Dary from comment #1)
> Is this a dup of BZ #1391859?

It was. I've now split it: this bug covers qcow2, and the other one (which is probably more important!) covers raw with fallocate.

Comment 3 Yaniv Lavi 2017-06-13 09:30:51 UTC

Is it a low hanging item to introduce this support to both?
Should this be also targeted to 4.2?

Comment 4 Yaniv Kaul 2017-06-13 13:23:04 UTC

(In reply to Yaniv Lavi from comment #3)
> Is it a low hanging item to introduce this support to both?
> Should this be also targeted to 4.2?

I wouldn't for the time being. For example, I don't know where we should account for this metadata. The performance benefits are unclear.

Comment 6 Red Hat Bugzilla Rules Engine 2018-05-27 13:51:50 UTC

This request has been proposed for two releases. This is invalid flag usage. The ovirt-future release flag has been cleared. If you wish to change the release flag, you must clear one release flag and then set the other release flag to ?.

Comment 7 Yaniv Kaul 2018-05-27 14:04:59 UTC

(In reply to Red Hat Bugzilla Rules Engine from comment #6)
> This request has been proposed for two releases. This is invalid flag usage.
> The ovirt-future release flag has been cleared. If you wish to change the
> release flag, you must clear one release flag and then set the other release
> flag to ?.

I don't see the point in pushing it to 4.4.
Either we are convinced it's useful, in which case let's prioritize it, or we close it as WONTFIX.
Specifically:
0. You can't use it on top of a backing store (qemu limitation: "Backing file and preallocation cannot be used at the same time").
1. In file-based storage, it doesn't matter - we mostly use raw-sparse.
2. In block-based storage, it's not practical for VMs created from a template (thin provisioning) - due to 0 above.
3. So I think it's valuable for thin provisioning - when creating empty thin-provisioned disks. It shouldn't be hard to figure out if doable or not - let's try to decide and act upon it.
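
A minimal sketch of point 0, assuming the limitation behaves as quoted: the hypothetical helper `qcow2_create_cmd` (not part of vdsm) only prints the `qemu-img create` command it would choose, and requests metadata preallocation only when no backing file is given. Newer qemu versions may also expect the backing format via `-F`, which is left out here.

```shell
#!/bin/sh
# Hypothetical helper: print the qemu-img command line for a new qcow2 image.
# It only prints the command; nothing is created.
# Usage: qcow2_create_cmd PATH SIZE [BACKING_FILE]

qcow2_create_cmd() (
    path=$1
    size=$2
    backing=${3:-}
    if [ -n "$backing" ]; then
        # Per the quoted limitation, a backing file rules out preallocation.
        echo "qemu-img create -f qcow2 -b $backing $path $size"
    else
        # Empty image without a backing file: metadata can be preallocated.
        echo "qemu-img create -f qcow2 -o preallocation=metadata $path $size"
    fi
)

qcow2_create_cmd empty.qcow2 80G
qcow2_create_cmd from_template.qcow2 80G template.qcow2
```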

Comment 8 Tal Nisan 2018-05-28 07:46:41 UTC

I agree with all of the above, but Yaniv Lavi asked not to get it into 4.3 for now, as we don't yet know the content in terms of capacity. ylavi, as I said in the bug scrub, I'm all in for having this in 4.3.

Comment 9 Yaniv Lavi 2018-06-06 15:18:51 UTC

We will discuss it as part of the planning meetings.

Comment 10 Nir Soffer 2018-11-21 09:23:34 UTC

Kevin, can you explain the required size on storage when using 

    qemu-img create -f qcow2 -o preallocation=metadata

I guess we allocate the L1 and L2 tables, so we need one L2 table for an image up to
16T, and 2 tables for an image up to 32T, or something like this, right?

When we allocate multiple L2 tables upfront, are they allocated at the start of
the image?

The context is creating a qcow2 image on a tiny logical volume and extending the
logical volume as needed. If we can get all metadata of the image in the first
1G when creating an image, this sounds like a useful optimization.

Comment 11 Nir Soffer 2018-11-21 09:28:40 UTC

This requires measurements before we change anything. We need to compare
performance with preallocated metadata and without.

Comment 12 Kevin Wolf 2018-11-21 10:31:39 UTC

(In reply to Nir Soffer from comment #10)
> Kevin, can you explain the required size on storage when using 
> 
>     qemu-img create -f qcow2 -o preallocation=metadata
> 
> I guess we allocate the L1 and L2 tables, so we need one L2 table for an image
> up to 16T, and 2 tables for an image up to 32T, or something like this, right?

With 64k clusters, it's one L2 table per 512 MB. You also get one refcount block per 2 GB with 64k clusters and 16-bit refcounts (the default). The L1 table and the refcount table also grow as the number of L2 tables and refcount blocks grows, but of course those stay smaller.

In the end, it's best to ask QEMU itself for examples:

$ qemu-img measure -O qcow2 -o preallocation=metadata --size 1T
required size: 168034304
fully allocated size: 1099679662080
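
The rule of thumb above can be checked with a little arithmetic. The sketch below is only an estimate under stated assumptions: 64 KiB clusters, 8-byte L2 entries and 16-bit refcounts, counting just the L2 tables and refcount blocks and ignoring the header, the L1 table and the refcount table. The function name `qcow2_metadata_estimate` is made up for this example, and refcount blocks are sized from the virtual size alone.

```shell
#!/bin/sh
# Estimate the two dominant metadata terms of a qcow2 image, in bytes.
# Assumes 64 KiB clusters, 8-byte L2 entries and 16-bit refcounts.
# Header, L1 table and refcount table are not counted.

qcow2_metadata_estimate() (
    virtual_size=$1
    cluster=65536
    l2_coverage=$(( cluster / 8 * cluster ))   # 512 MiB mapped per L2 table
    rb_coverage=$(( cluster / 2 * cluster ))   # 2 GiB covered per refcount block
    l2_tables=$(( (virtual_size + l2_coverage - 1) / l2_coverage ))
    refblocks=$(( (virtual_size + rb_coverage - 1) / rb_coverage ))
    echo $(( (l2_tables + refblocks) * cluster ))
)

qcow2_metadata_estimate $((1 << 40))    # 1T:  prints 167772160
qcow2_metadata_estimate $((80 << 30))   # 80G: prints 13107200
```

For 1T the estimate is 262144 bytes (four 64 KiB clusters) below the `required size` reported by `qemu-img measure`, which is plausibly the structures the sketch leaves out. For 80G it gives 12.5 MiB, consistent with the 13M that `du` shows in the description.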

> When we allocate multiple L2 tables upfront, are they allocated at the
> start of the image?
> 
> The context is creating a qcow2 image on a tiny logical volume and extending
> the logical volume as needed. If we can get all metadata of the image in the
> first 1G when creating an image, this sounds like a useful optimization.

No, not everything is located at the start of the image. You need the full file size upfront for any kind of preallocation, even though most of it stays sparse with preallocation=metadata.

Comment 13 Nir Soffer 2018-11-21 11:55:48 UTC

Based on Kevin's response, we can use metadata preallocation only for file-based storage (both sparse and preallocated) or for preallocated block storage (needed for incremental backup).

We still need to test performance before we do this work.

Comment 14 Nir Soffer 2018-11-21 12:03:54 UTC

But based on comment 7, preallocation is not supported with a backing file, and we
don't create new qcow2 volumes on file storage (we use raw sparse), so this can help
only disks created from the SDK, when the user selects format=cow sparse=true, or qcow2
preallocated disks created for incremental backup.
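
The conditions collected in comments 7, 13 and 14 can be summarized as a small predicate. This is only a sketch of the reasoning in this thread, not vdsm code; the function name `can_preallocate_metadata` and its argument values are hypothetical.

```shell
#!/bin/sh
# Hypothetical predicate summarizing comments 7, 13 and 14.
# Usage: can_preallocate_metadata STORAGE ALLOCATION HAS_BACKING
#   STORAGE:     file | block
#   ALLOCATION:  sparse | preallocated
#   HAS_BACKING: yes | no
# Exit status 0 means preallocation=metadata looks applicable.

can_preallocate_metadata() (
    storage=$1
    allocation=$2
    has_backing=$3
    # Comment 7: preallocation cannot be combined with a backing file.
    [ "$has_backing" = "no" ] || exit 1
    case "$storage:$allocation" in
        file:*)             exit 0 ;;   # file storage, sparse or preallocated
        block:preallocated) exit 0 ;;   # full-size volume, e.g. for incremental backup
        *)                  exit 1 ;;   # thin block volume: full size is not available upfront
    esac
)

if can_preallocate_metadata block preallocated no; then
    echo "preallocation=metadata applicable"
fi
```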

Comment 15 Sandro Bonazzola 2019-01-28 09:43:33 UTC

This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 16 Michal Skrivanek 2020-03-18 15:47:03 UTC

This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.

Comment 17 Michal Skrivanek 2020-03-18 15:51:47 UTC

This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.

Comment 18 Michal Skrivanek 2020-04-01 14:47:58 UTC

ok, closing. Please reopen if still relevant/you want to work on it.

Comment 19 Michal Skrivanek 2020-04-01 14:51:20 UTC

ok, closing. Please reopen if still relevant/you want to work on it.