Description of problem:
When creating a qcow2 volume with preallocation, it's expected to use only as much disk space as it needs. It does so on NFS, but on block devices (such as iSCSI) it uses the entire size of the image, because preallocation writes to the last byte.

Version-Release number of selected component (if applicable):
0.13.0

How reproducible:
Solid.

Steps to Reproduce:
lvcreate --autobackup n --contiguous n --size 1024m --name bd43e02a-6424-4f1e-9caa-30f796304e37 d576a2c8-8b5e-43e5-93c9-f710931874e9
qemu-img create -f qcow2 -o preallocation=metadata /dev/d576a2c8-8b5e-43e5-93c9-f710931874e9/bd43e02a-6424-4f1e-9caa-30f796304e37 20971520K
(on a volume too small to hold that much data)

Actual results:
Formatting '/dev/d576a2c8-8b5e-43e5-93c9-f710931874e9/bd43e02a-6424-4f1e-9caa-30f796304e37', fmt=qcow2 size=21474836480 encryption=off cluster_size=0 preallocation='metadata'
/dev/d576a2c8-8b5e-43e5-93c9-f710931874e9/bd43e02a-6424-4f1e-9caa-30f796304e37: error while creating qcow2: No space left on device

Expected results:
Formatting '/dev/d576a2c8-8b5e-43e5-93c9-f710931874e9/bd43e02a-6424-4f1e-9caa-30f796304e37', fmt=qcow2 size=21474836480 encryption=off cluster_size=0

Additional info:
<kwolf> Hm, yes, this is known
<kwolf> I wonder though if it's necessary on block devices
<erez> if it's necessary to preallocate metadata?
<kwolf> On file systems we must increase the file size (but can leave it sparse), but with block devices things could look different
<kwolf> During preallocation, qcow2 does a write to the very last cluster allocated
<erez> without this feature we don't have thin provisioning..
<erez> I see
<erez> Why does it do it? To validate that it exists?
<kwolf> Because otherwise reads would access space after the EOF
<kwolf> Which fails
<kwolf> On block devices, this doesn't matter, obviously
<kwolf> Or does it?
<kwolf> Probably it does. Hm.
<erez> (danken) that's unfortunate... we won't be able to use it on block devices, which is the only place where this matters
<erez> but why does it matter that all addresses exist in the block device?
<erez> if qemu accesses a nonexisting one, it will block on ENOSPC anyway, right?
<kwolf> The problem is not with writes, but with reads
<kwolf> I don't remember the details, though
<kwolf> Maybe we could fix the read function to return zeros (as it's supposed to work)
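For comparison with the report above, a minimal sketch of how the same command behaves on a plain filesystem (paths and the image size here are placeholders, not taken from the report):

  # On a filesystem the image can stay sparse, so only qcow2 metadata gets written:
  qemu-img create -f qcow2 -o preallocation=metadata /mnt/nfs/test.qcow2 20G
  ls -ls /mnt/nfs/test.qcow2        # allocated blocks (first column) stay far below 20G
  qemu-img info /mnt/nfs/test.qcow2 # "disk size" reports only the metadata
  # On an LV there is no sparseness, so the write to the last allocated cluster
  # lands at the 20G offset and fails with ENOSPC when the LV is smaller.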
Can you please test the performance of qcow2 with the latest 6.1 code? We have lots of changes there and we might not need the preallocation command line option at all.
No response, closing. If I'm not mistaken, this might not be possible on raw block devices anyway.
Reopening. If you want qcow2 performance tested on 6.1/6.2, you should ask qemu QE or the performance team, not RHEV-M QE / engineering. Indeed, you cannot preallocate the clusters on iSCSI, but you can preallocate the tables so that all metadata is located sequentially on disk (rough numbers below).
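A rough back-of-the-envelope of how much metadata such a table-only preallocation would involve for the 20 GiB image in the reproducer (my arithmetic from the qcow2 on-disk format, assuming the default 64k cluster size and 8-byte L2 entries; not measured):

  # one L2 table = 64 KiB / 8 B = 8192 entries -> maps 8192 * 64 KiB = 512 MiB of data
  # 20 GiB image -> 20480 MiB / 512 MiB = 40 L2 tables -> 40 * 64 KiB = 2.5 MiB
  # plus the header, L1 table and refcount metadata: a few MiB at most, which
  # could all sit sequentially at the start of the device.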
> Indeed, you cannot preallocate the clusters on iSCSI, but you can preallocate
> the tables so that all metadata is located sequentially on disk.

Also, you could preallocate according to the device size. It could actually be nice to do this at runtime as well (every time we make the device bigger, allocate mappings for the added sections); see the sketch below.

Btw Dor, even if performance improved in 6.2, that does not mean preallocating would not improve performance even further, which would make this bz worthwhile regardless of current qcow2 performance. Note that Kevin said at KVM Forum that qcow2 causes a 50% performance degradation...
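To illustrate the "preallocate according to device size" idea, a hypothetical sketch of where the size could come from at creation time (device path and variable names are made up for the example; this by itself does not avoid the final-cluster write kwolf describes above):

  LV=/dev/<vg>/<lv>                    # placeholder device path
  SIZE=$(blockdev --getsize64 "$LV")   # actual size of the block device, in bytes
  qemu-img create -f qcow2 -o preallocation=metadata "$LV" "$SIZE"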
What's your use case for this? If I'm not mistaken, formatting the image with a filesystem will already write to sectors close to the end of the image, so if this were implemented, you would only move the LV growth from image creation time to installation time, which I guess doesn't make a big difference.
Had a short email conversation with Ayal about this. For the reasons stated in comment 10, it's obvious that preallocating clusters isn't possible in this case. The open question was whether preallocating the L2 tables would make sense, and whether we could take advantage of having all of them sequentially at the start of the image.

We came to the conclusion that, because we never read two tables at once, having them sequential doesn't help; and in the allocating case, the cost of one additional 64k write for the new L2 table per 512 MB of virtual disk size (assuming 64k clusters) would likely be lost in the noise. An optimisation for this would be wasted effort; there's much more to gain in other places.

Therefore, I'm closing the bug again.