Bug 678050 - 'qemu-img create -f qcow2 -o preallocation=metadata ...' allocates the entire data on iSCSI
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 6.2
Assigned To: Kevin Wolf
QA Contact: Virtualization Bugs
Keywords: Reopened
Depends On:
Blocks: 580954 672346 773650 773651 773665 773677 773696
Reported: 2011-02-16 11:01 EST by Erez Shinan
Modified: 2013-01-22 09:10 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-09-10 06:07:43 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Erez Shinan 2011-02-16 11:01:44 EST
Description of problem:

When creating a qcow2 volume with preallocation=metadata, the image is expected to use only as much disk space as it needs. It does so on NFS, but on block devices (such as iSCSI) it requires the entire virtual size of the image, because it writes to the last byte.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

lvcreate --autobackup n --contiguous n --size 1024m --name bd43e02a-6424-4f1e-9caa-30f796304e37 d576a2c8-8b5e-43e5-93c9-f710931874e9
qemu-img create -f qcow2 -o preallocation=metadata
4e37 20971520K

(on a volume too small to hold that much data)

Actual results:

04e37', fmt=qcow2 size=21474836480 encryption=off cluster_size=0
4e37: error while creating qcow2: No space left on device

Expected results:

04e37', fmt=qcow2 size=21474836480 encryption=off cluster_size=0

Additional info:

<kwolf> Hm, yes, this is known
<kwolf> I wonder though if it's necessary on block devices
<erez> if it's necessary to preallocate metadata?
<kwolf> On file systems we must increase the file size (but can leave it sparse), but with block devices things could look different
<kwolf> During preallocation, qcow2 does a write to the very last cluster allocated
<erez> without this feature we don't have thin provisioning..
<erez> I see
<erez> Why does it do it? To validate that it exists?
<kwolf> Because otherwise reads would access space after the EOF
<kwolf> Which fails
<kwolf> On block devices, this doesn't matter, obviously
<kwolf> Or does it?
<kwolf> Probably it does. Hm.
<erez> (danken) that's unfortunate... we won't be able to use it on block devices, which is the only place where this matters
<erez> but why does it matter that all addresses exist in the block device?
<erez> if qemu accesses a nonexisting one, it will block on ENOSPC anyway, right?
<kwolf> The problem is not with writes, but with reads
<kwolf> I don't remember the details, though
<kwolf> Maybe we could fix the read function to return zeros (as it's supposed to work)
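The file-system behaviour described in the log above (a write to the end of the image increases the file size while the file stays sparse) can be illustrated with a small sketch. This is not qemu's code: the helper name `last_byte_write` is hypothetical, and the result depends on the temporary directory living on a file system that supports sparse files. A block device such as an LV has no equivalent, since a write beyond its end simply fails with ENOSPC.

```python
import os
import tempfile


def last_byte_write(virtual_size):
    """Write one byte at offset virtual_size - 1 of a fresh temporary file.

    Returns (apparent_size, allocated_bytes) as reported by fstat().
    Hypothetical helper for illustration only; it does not use qemu.
    """
    with tempfile.NamedTemporaryFile() as f:
        f.seek(virtual_size - 1)   # jump past the (empty) start of the file
        f.write(b"\0")             # the single write that extends the file
        f.flush()
        st = os.fstat(f.fileno())
        # st_blocks counts 512-byte units on POSIX; it is absent on Windows,
        # in which case this sketch reports 0 allocated bytes.
        allocated = getattr(st, "st_blocks", 0) * 512
        return st.st_size, allocated


if __name__ == "__main__":
    apparent, allocated = last_byte_write(64 * 1024 * 1024)
    print(f"apparent size: {apparent} bytes, allocated: {allocated} bytes")
```

On a file system with sparse-file support, the allocated figure should come out far smaller than the apparent size, which is the "thin" behaviour the reporter sees on NFS.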
Comment 2 Dor Laor 2011-02-20 16:58:39 EST
Can you please test the performance of qcow2 with the latest 6.1 code?
We have lots of changes there, and we might not need the preallocation command-line option.
Comment 4 Dor Laor 2011-09-04 06:42:29 EDT
No response, closing. If I'm not mistaken, it might not be possible on raw devices.
Comment 5 Ayal Baron 2011-09-04 15:31:11 EDT
Reopening. If you want to test qcow2 performance on 6.1/6.2, you should ask qemu QE or the performance team, not rhevm QE / engineering.
Indeed, you cannot preallocate the clusters on iscsi, but you can preallocate the tables so that all md will be located sequentially on disk.
Comment 6 Ayal Baron 2011-09-05 02:44:24 EDT
> Indeed, you cannot preallocate the clusters on iscsi, but you can preallocate
> the tables so that all md will be located sequentially on disk.

Also, you could preallocate according to device size.
It could actually be nice to do this at runtime as well (every time we make the device bigger, allocate mappings for the added sections).

Btw Dor, even if performance improved in 6.2, it does not mean that preallocating would not improve performance even further which would make this bz worthwhile regardless of current qcow2 performance.
Note that Kevin said in kvm forum that qcow2 causes a 50% performance degradation...
Comment 10 Kevin Wolf 2012-07-24 03:38:35 EDT
What's your use case for this? If I'm not mistaken, formatting the image with a filesystem will already write to sectors close to the end of the image, so if this was implemented, you would only move the LV growth from image creation time to installation time. Which I guess doesn't make a big difference.
Comment 11 Kevin Wolf 2012-09-10 06:07:43 EDT
Had a short email conversation with Ayal about this. For the reasons stated in comment 10 it's obvious that preallocating clusters isn't possible in this case. The open question was whether preallocating the L2 tables would make sense, and if we could take advantage of having all of them sequentially at the start of the image. We came to the conclusion that because we never read two tables at once, having them sequential doesn't help; and in the allocating case the cost of one additional 64k write for the new L2 table for each 512 MB of virtual disk size (assuming 64k clusters) would likely be lost in the noise, so an optimisation for this would be a wasted effort. There's much more to gain in other places.

Therefore, I'm closing the bug again.
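
The "one 64k L2 table for each 512 MB" figure in comment 11 follows from the qcow2 layout, where each L2 table occupies one cluster and each L2 entry is 8 bytes. A minimal sketch of that arithmetic, applied to the 20 GiB image from the reproduction steps (the function names are illustrative, not part of qemu):

```python
L2_ENTRY_SIZE = 8  # bytes per L2 table entry in the qcow2 format


def l2_table_coverage(cluster_size):
    """Bytes of virtual disk mapped by one L2 table (one cluster in size)."""
    entries_per_table = cluster_size // L2_ENTRY_SIZE
    return entries_per_table * cluster_size


def l2_tables_needed(virtual_size, cluster_size):
    """Number of L2 tables needed to map the whole virtual disk."""
    coverage = l2_table_coverage(cluster_size)
    return -(-virtual_size // coverage)  # ceiling division


if __name__ == "__main__":
    cluster = 64 * 1024
    size = 21474836480  # 20971520K, the size used in the reproduction steps
    tables = l2_tables_needed(size, cluster)
    print(l2_table_coverage(cluster) // (1024 * 1024), "MiB per L2 table")
    print(tables, "L2 tables,", tables * cluster, "bytes of L2 metadata")
```

With 64k clusters this gives 512 MiB per table and 40 tables (2.5 MiB of L2 metadata) for the 20 GiB image, which is why the extra write per table is small compared with the data being written.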
