Description of problem:

Support measuring a sub-chain of 2 volumes, used in merge flows.

Volume.measure supports measuring an entire chain:

    [parent <- base <- top]

This reports the size of the new image created when we convert the chain to a single image:

    qemu-img convert -f qcow2 -O qcow2 top new

Or a single image:

    parent <- [base] <- top

This reports how much space we need to allocate to copy base to another block-based storage:

    qemu-img convert -f qcow2 -O qcow2 base /dev/other/base -b /dev/other/parent

In the merge flow, we want to measure a sub-chain:

    parent <- [base <- top]

That is, the size of base after we commit top into it.

We have 2 important use cases:

- live merge - currently we extend the base image by the size of top
- cold merge - same, but we reduce the volume size after the merge

By measuring the sub-chain, we can avoid over-allocation, which in bad cases extends base to the virtual size even when it contains no data. This becomes important for the new hybrid backup, where we do an active layer merge on every backup.

## Possible implementation

Create empty chain:

    $ qemu-img create -f qcow2 parent.qcow2 1g
    Formatting 'parent.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16

    $ qemu-img create -f qcow2 -b parent.qcow2 -F qcow2 base.qcow2
    Formatting 'base.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 backing_file=parent.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16

    $ qemu-img create -f qcow2 -b base.qcow2 -F qcow2 top.qcow2
    Formatting 'top.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 backing_file=base.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16

Fill with data:

    $ qemu-io -f qcow2 -c 'write -P 1 0 1m' parent.qcow2
    wrote 1048576/1048576 bytes at offset 0
    1 MiB, 1 ops; 00.01 sec (85.980 MiB/sec and 85.9805 ops/sec)

    $ qemu-io -f qcow2 -c 'write -P 2 2m 1m' base.qcow2
    wrote 1048576/1048576 bytes at offset 2097152
    1 MiB, 1 ops; 00.02 sec (53.466 MiB/sec and 53.4659 ops/sec)

    $ qemu-io -f qcow2 -c 'write -P 3 4m 1m' top.qcow2
    wrote 1048576/1048576 bytes at offset 4194304
    1 MiB, 1 ops; 00.02 sec (54.524 MiB/sec and 54.5244 ops/sec)

Measure top:

    $ qemu-img measure -O qcow2 top.qcow2
    required size: 3538944
    fully allocated size: 1074135040
    bitmaps size: 0

Measure base:

    $ qemu-img measure -O qcow2 base.qcow2
    required size: 2490368
    fully allocated size: 1074135040
    bitmaps size: 0

Measure parent:

    $ qemu-img measure -O qcow2 parent.qcow2
    required size: 1441792
    fully allocated size: 1074135040
    bitmaps size: 0

Measure sub-chain: base <- top

    $ qemu-img measure -O qcow2 'json:{"driver": "qcow2", "file": {"driver": "file", "filename": "top.qcow2"}, "backing": {"file": {"driver": "file", "filename": "base.qcow2"}, "backing": null}}'
    required size: 2490368
    fully allocated size: 1074135040
    bitmaps size: 0

Testing merge:

    $ qemu-img commit -f qcow2 -b base.qcow2 top.qcow2
    Image committed.

    $ ls -lhs
    total 4.9M
    2.4M -rw-r--r--. 1 nsoffer nsoffer 2.4M Mar 16 22:04 base.qcow2
    1.3M -rw-r--r--. 1 nsoffer nsoffer 1.4M Mar 16 22:04 parent.qcow2
    1.3M -rw-r--r--. 1 nsoffer nsoffer 1.4M Mar 16 22:04 top.qcow2

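
The sub-chain measurement above could be wrapped in a small Python helper along these lines. This is only a sketch under stated assumptions, not the code that went into vdsm: the function names `subchain_filename` and `measure_subchain` are hypothetical, the JSON layout mirrors the command line above, and the `--output=json` flag is assumed to be available in the installed qemu-img.

```python
import json
import subprocess


def subchain_filename(top, base, fmt="qcow2"):
    """Return a json: filename describing only the sub-chain base <- top.

    Hypothetical helper. Setting "backing" to None (JSON null) under base
    hides base's own backing file, so qemu-img sees a two-image chain.
    The layout mirrors the command line used in this report.
    """
    node = {
        "driver": fmt,
        "file": {"driver": "file", "filename": top},
        "backing": {
            "file": {"driver": "file", "filename": base},
            "backing": None,
        },
    }
    return "json:" + json.dumps(node)


def measure_subchain(top, base, fmt="qcow2"):
    """Run qemu-img measure on base <- top and return the parsed JSON.

    Hypothetical helper; requires qemu-img in PATH.
    """
    cmd = [
        "qemu-img", "measure",
        "--output=json",
        "-O", fmt,
        subchain_filename(top, base, fmt),
    ]
    out = subprocess.run(cmd, check=True, capture_output=True).stdout
    return json.loads(out)
```

Building the filename with `json.dumps` avoids quoting mistakes when volume paths contain unusual characters.
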
Kevin, do you see any issue with the suggested solution, or a simpler way to do this?
Yes, this looks like the correct approach to me.

The only way I see to simplify the 'qemu-img measure' command line is if we introduced something like a --base option which would work based on a filename, though identifying base images by their filename is something that we have been hesitant to do in newer interfaces. But on the command line it might be justified. On the other hand, changing QEMU means that you'd have to wait for a release containing the new feature, so your approach is probably more practical.

Of course, in the live merge of the active layer case, the required size can change while the commit job is running. If you relied on the overestimation to make this go unnoticed before, you need to make sure now to monitor and dynamically extend not only the active layer, but also the commit target when it fills up to near its current size. (Probably you already do this, mentioning it just in case.)
(In reply to Kevin Wolf from comment #2)
> The only way I see to simplify the 'qemu-img measure' command line is if we
> introduced something like a --base option which would work based on a
> filename, though identifying base images by their filename is something that
> we have been hesitant to do in newer interfaces.

This would be useful for other users of qemu-img, but I'm not sure how many users need this functionality. I think the current situation is very good; qemu-img makes it easy to do the common operations, and if you have special needs the json file name gives you a lot of power to do what you need in a simple and clear way.

> But on the command line it
> might be justified. On the other hand, changing QEMU means that you'd have
> to wait for a release containing the new feature, so your approach is
> probably more practical.

Right, we want to use this with RHV 4.4 on RHEL 8.6 now. If qemu-img in RHEL 9 supports --base, oVirt will likely use it.

> Of course, in the live merge of the active layer case, the required size can
> change while the commit job is running. If you relied on the overestimation
> to make this go unnoticed before, you need to make sure now to monitor and
> dynamically extend not only the active layer, but also the commit target
> when it fills up to near its current size. (Probably you already do this,
> mentioning it just in case.)

Unfortunately we don't monitor or extend the base volume yet, so live merge may fail with ENOSPC. I hope this will be improved in RHV 4.4.z.

Our current live merge uses a very dumb allocation - allocating an extra ~5g per merge. If you have a 50g mostly empty image, after 10 live merges the image will be fully allocated even if there is no actual data in it. This wasted space will eventually cause the oVirt storage domain to become full while there is a lot of available space in the underlying storage.

With this mechanism, we can ensure that on every merge, we have a certain amount of free space in the image.
If the image was extended in a previous live merge and there is enough free space, we will not extend it again.
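
The allocation policy described here could look roughly like the following Python sketch. It is an illustration only, not vdsm's implementation: the function name `base_size_for_merge`, the 1 GiB free-space margin, and the 128 MiB alignment are all assumptions chosen for the example.

```python
MiB = 1024**2
GiB = 1024**3


def round_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def base_size_for_merge(required, current, free_space=1 * GiB,
                        align=128 * MiB, max_size=None):
    """Return the size the base volume should have before a merge.

    required   - "required size" from qemu-img measure on base <- top
    current    - current size of the base volume
    free_space - extra room to keep after the merge (assumed value)
    align      - allocation granularity of the storage (assumed value)
    max_size   - optional upper bound, e.g. the fully allocated size
    """
    wanted = round_up(required + free_space, align)
    if max_size is not None:
        wanted = min(wanted, max_size)
    # If a previous merge already left enough room, keep the current size.
    return max(current, wanted)
```

With the numbers from the example chain, `base_size_for_merge(2490368, 128 * MiB)` asks for 1152 MiB, while a base volume that is already 2 GiB is left unchanged.
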
The fix for this bug added an internal API allowing measuring a sub-chain, and an option in the vdsm API to use it. A future engine can use the new API to measure a sub-chain when validating storage operations.

This change enables correct extension in cold and live merge, tracked upstream in:

- https://github.com/oVirt/vdsm/issues/134 (fixed in oVirt 4.5.1)
- https://github.com/oVirt/vdsm/issues/188 (expected in oVirt 4.5.1)
This bugzilla is included in oVirt 4.5.1 release, published on June 22nd 2022. Since the problem described in this bug report should be resolved in oVirt 4.5.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.