Bug 1652838 - [ovirt] [RFE] Add ability to set Qcow2 l2-cache-size and Qcow2 cluster size
Summary: [ovirt] [RFE] Add ability to set Qcow2 l2-cache-size and Qcow2 cluster size
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: RFEs
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nobody
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-11-23 08:56 UTC by Kai Zimmer
Modified: 2022-01-18 11:33 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-09-29 11:33:06 UTC
oVirt Team: Storage
Embargoed:


Attachments
Qcow2 advanced image creation form (example) (46.61 KB, image/jpeg)
2018-11-23 09:59 UTC, Kai Zimmer


Links
Red Hat Bugzilla 1377735 (CLOSED): [Feature Request] Add ability to set qcow2 l2-cache-size (last updated 2021-05-25 13:10:38 UTC)

Description Kai Zimmer 2018-11-23 08:56:20 UTC
*QEMU's default qcow2 L2 cache size* is too small for large images (and small cluster sizes), resulting in very bad performance.

https://blogs.igalia.com/berto/2015/12/17/improving-disk-io-performance-in-qemu-2-5-with-the-qcow2-l2-cache/
shows a huge performance hit for a 20 GB qcow2 image with the default 64 kB cluster size:

L2 Cache, MiB   Average IOPS
1 (default)             5100
1.5                     7300
2                      12700
2.5                    63600

The above link also gives the formula:
optimal L2 cache size = L2 table size = 8 bytes * (disk size) / (cluster size)

and the QEMU command line for setting L2 cache size, which must be specified at each invocation, for example:
qemu-system-x86_64 -drive file=hd.qcow2,l2-cache-size=2621440
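
As a sanity check, plugging the blog's 20 GB image and the default 64 kB cluster size into the formula (assuming binary GiB/KiB units, as in the blog post) reproduces exactly the value used in the command line above:

8 bytes * (20 * 1024^3 bytes) / (64 * 1024 bytes) = 2621440 bytes = 2.5 MiB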

Concerning *Qcow2 cluster size*, there is an article at https://www.jamescoyle.net/how-to/2055-qcow2-image-format-and-cluster_size :

" A virtual disk, much like how operating systems treat physical disks, is split up into clusters; each cluster being a predefined size and holding a single unit of data. A cluster is the smallest amount of data that can be read or written in a single operation. There is then an index lookup, often kept in memory, that knows what information is stored in each cluster and where that cluster is located.

A qcow2 filesystem is copy-on-write (q'cow'2), which means that if a block of data needs to be altered then the whole block is re-written, rather than just updating the changed data. This means that if you have a block size of 1024 (bytes) and you change 1 byte then 1023 bytes have to be read from the original block and then 1024 bytes have to be written – that's an overhead of 1023 bytes being read and written above the 1 byte change you created. That overhead isn't too bad in the grand scheme, but imagine if you had a block size of 1 MB and still only changed a single byte!
On the other hand, with much larger writes another problem can be noticed. If you are constantly writing 1MB files and have a block size of 1024 bytes then you'll have to split that 1MB file into 1024 parts and store each part in a cluster. Each time a cluster is written to, the metadata must be updated to reflect the new stored data. Again then, there is a performance penalty in storing data this way. A more efficient way of writing 1MB files would be to have a cluster size of 1MB so that each file will occupy a single block with only one metadata reference."

This is known as "write amplification" from the guest filesystem on top of qcow2.

To minimize write amplification, the sector size of the guest needs to be "tuned" for the use case.
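
For reference, outside of oVirt the cluster size can already be chosen manually at image creation time with qemu-img; a sketch (hd.qcow2 and the 1M / 100G values are just examples, the qcow2 default cluster size is 64k):

qemu-img create -f qcow2 -o cluster_size=1M hd.qcow2 100G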

It would be great if oVirt allowed specifying both the qcow2 l2-cache-size AND the qcow2 cluster size.

Comment 1 Kai Zimmer 2018-11-23 09:59:01 UTC
Created attachment 1508242 [details]
Qcow2 advanced image creation form (example)

Qcow2 advanced image creation form (example)

Comment 2 Nir Soffer 2020-01-14 22:15:57 UTC
This is not related to ovirt-imageio. Moving to oVirt engine.

Comment 3 Nir Soffer 2020-01-14 22:25:32 UTC
Can qemu tune the cache size automatically based on the cluster size
and the size of the image? It does not make sense that all management systems
running qemu and all users have to calculate the cache size when qemu
has all the info needed to do this.

Regarding cluster size, this looks like something that the user must
configure, since you need to know the use case to tell what would be a good
cluster size.

So it looks like:
- qemu should handle cache size automatically
- management system can allow the user to set the cluster size

What do you think?

Comment 4 Kevin Wolf 2020-01-15 09:13:42 UTC
This setting is a tradeoff between memory usage and performance, so something that QEMU can't really decide without knowing what is more important to the user. Whether performance can benefit from a larger cache also depends on the workload (outside of benchmarks, nobody overwrites the whole disk with random I/O all the time).

Old QEMU versions (such as 2.5 mentioned in the original report) were defaulting pretty much on the "saving memory" side, using a fixed 1 MB of L2 cache, which is enough to cover random I/O over a range of 8 GB without a cache miss. Since QEMU 3.1 (and backported to RHEL 7's 2.12), up to a certain size it dynamically defaults to a cache size that can keep the full set of L2 tables in memory, so that after a warmup period the disk is never accessed for reading L2 tables. On Linux, this upper limit is 32 MB per image, covering a range of 256 GB (assuming the default 64k cluster size). To reduce the memory overhead when it's not actually needed, cache-clean-interval=600 (10 minutes) is the default on Linux.
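
(For clarity, the 256 GB figure follows from the same 8 bytes of L2 metadata per cluster: 32 MiB of cache holds 32 MiB / 8 bytes = 4M L2 entries, and 4M entries * 64 kB per cluster covers 256 GiB of virtual disk.)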

For more details, see docs/qcow2-cache.txt in the QEMU source tree.

I believe this is a reasonable default value for the common use cases, but as usual, "one size fits all" doesn't always work, so there may be use cases for which it isn't perfect.
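
If the default doesn't fit a particular workload, both knobs can be overridden per image on the QEMU command line; the values below are only illustrative, see docs/qcow2-cache.txt for the exact option semantics:

qemu-system-x86_64 -drive file=hd.qcow2,l2-cache-size=8M,cache-clean-interval=300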

Comment 5 Michal Skrivanek 2020-06-23 12:34:27 UTC
This request is not currently committed to 4.4.z, moving it to 4.5

Comment 7 Eyal Shenitzky 2021-08-23 12:30:43 UTC
This bug/RFE is more than 2 years old, didn't get enough attention so far, and is now flagged as pending close.
Please review if it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.

Comment 8 Michal Skrivanek 2021-09-29 11:33:06 UTC
This bug didn't get any attention in a long time, and it's not planned for the foreseeable future. The oVirt development team has no plans to work on it.
Please feel free to reopen if you have a plan for how to contribute this feature/bug fix.

