Bug 2064907 - [RFE] Support measuring sub chain
Summary: [RFE] Support measuring sub chain
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.50
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.5.1
Target Release: ---
Assignee: Nir Soffer
QA Contact: Nir Soffer
URL:
Whiteboard:
Depends On:
Blocks: 2041352
 
Reported: 2022-03-16 20:42 UTC by Nir Soffer
Modified: 2022-06-23 05:57 UTC (History)
5 users (show)

Fixed In Version: vdsm-4.50.1.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-19 16:28:47 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?
pm-rhel: planning_ack?
pm-rhel: devel_ack+
pm-rhel: testing_ack?




Links
System ID Private Priority Status Summary Last Updated
Github oVirt vdsm pull 101 0 None open Support measuring subchain and active volumes 2022-03-18 17:32:01 UTC
Red Hat Issue Tracker RHV-45345 0 None None None 2022-03-16 20:44:48 UTC

Description Nir Soffer 2022-03-16 20:42:19 UTC
Description of problem:

Support measuring a sub-chain of 2 volumes, as used in merge flows.

Volume.measure supports measuring the entire chain:

    [parent <- base <- top]

This answers the question: what will be the size of a new image when we convert the entire chain to a single image:

    qemu-img convert -f qcow2 -O qcow2 top new

Or a single image:

    parent <- [base] <- top

This answers the question: how much space do we need to allocate to copy base to another block-based storage:

    qemu-img convert -f qcow2 -O qcow2 base /dev/other/base -b /dev/other/parent
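
The single-image case can also be expressed with the json: filename syntax used
further below, by terminating the backing chain with null so that only base's
own clusters are counted. A sketch, mirroring the sub-chain example below:

$ qemu-img measure -O qcow2 'json:{"driver": "qcow2",
                                   "file": {"driver": "file",
                                            "filename": "base.qcow2"},
                                   "backing": null}'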

In merge flows, we want to measure a sub-chain:

    parent <- [base <- top]

Meaning, what will be the size of base after we commit top into it.

We have 2 important use cases:

- live merge - currently we extend the base image by the size of top
- cold merge - same, but we reduce the volume size after the merge

Being able to measure a sub-chain, we can avoid over-allocation, which in the bad
cases extends base to its full virtual size even when it contains no data. This
becomes important for the new hybrid backup, where we do an active layer merge on
every backup.

## Possible implementation

Create empty chain:

$ qemu-img create -f qcow2 parent.qcow2 1g
Formatting 'parent.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16

$ qemu-img create -f qcow2 -b parent.qcow2 -F qcow2 base.qcow2
Formatting 'base.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 backing_file=parent.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16

$ qemu-img create -f qcow2 -b base.qcow2 -F qcow2 top.qcow2
Formatting 'top.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 backing_file=base.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16

Fill with data:

$ qemu-io -f qcow2 -c 'write -P 1 0 1m' parent.qcow2 
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 00.01 sec (85.980 MiB/sec and 85.9805 ops/sec)

$ qemu-io -f qcow2 -c 'write -P 2 2m 1m' base.qcow2 
wrote 1048576/1048576 bytes at offset 2097152
1 MiB, 1 ops; 00.02 sec (53.466 MiB/sec and 53.4659 ops/sec)

$ qemu-io -f qcow2 -c 'write -P 3 4m 1m' top.qcow2 
wrote 1048576/1048576 bytes at offset 4194304
1 MiB, 1 ops; 00.02 sec (54.524 MiB/sec and 54.5244 ops/sec)

Measure top:

$ qemu-img measure -O qcow2 top.qcow2 
required size: 3538944
fully allocated size: 1074135040
bitmaps size: 0
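
For programmatic use (as vdsm would do), the same measurement is available in
machine-readable form; a sketch assuming jq is installed:

$ qemu-img measure --output json -O qcow2 top.qcow2 | jq .required
3538944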

Measure base:

$ qemu-img measure -O qcow2 base.qcow2 
required size: 2490368
fully allocated size: 1074135040
bitmaps size: 0

Measure parent:

$ qemu-img measure -O qcow2 parent.qcow2 
required size: 1441792
fully allocated size: 1074135040
bitmaps size: 0

Measure sub-chain: base <- top

$ qemu-img measure -O qcow2 'json:{"driver": "qcow2",
                                   "file": {"driver": "file",
                                            "filename": "top.qcow2"},
                                   "backing": {"file": {"driver": "file",
                                                        "filename": "base.qcow2"},
                                               "backing": null}}'
required size: 2490368
fully allocated size: 1074135040
bitmaps size: 0
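
Note that the required size happens to equal the full-chain measurement of
base above (2490368): both cover exactly 2m of data, but here it is base's and
top's clusters that are counted, while parent's are excluded.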

Testing merge:

$ qemu-img commit -f qcow2 -b base.qcow2 top.qcow2
Image committed.

$ ls -lhs
total 4.9M
2.4M -rw-r--r--. 1 nsoffer nsoffer 2.4M Mar 16 22:04 base.qcow2
1.3M -rw-r--r--. 1 nsoffer nsoffer 1.4M Mar 16 22:04 parent.qcow2
1.3M -rw-r--r--. 1 nsoffer nsoffer 1.4M Mar 16 22:04 top.qcow2
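
After the commit, base.qcow2 takes 2.4M on disk, matching the sub-chain
measurement of 2490368 bytes above.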

Comment 1 Nir Soffer 2022-03-16 20:43:51 UTC
Kevin, do you see any issue with the suggested solution, or a simpler way
to do this?

Comment 2 Kevin Wolf 2022-03-17 08:22:39 UTC
Yes, this looks like the correct approach to me.

The only way I see to simplify the 'qemu-img measure' command line is if we introduced something like a --base option which would work based on a filename, though identifying base images by their filename is something that we have been hesitant to do in newer interfaces. But on the command line it might be justified. On the other hand, changing QEMU means that you'd have to wait for a release containing the new feature, so your approach is probably more practical.
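
For illustration, such a hypothetical option might look like this (not an
existing qemu-img flag; following block-stream semantics, --base would exclude
the named image and everything below it):

$ qemu-img measure -O qcow2 --base parent.qcow2 top.qcow2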

Of course, in the live merge of the active layer case, the required size can change while the commit job is running. If you relied on the overestimation to make this go unnoticed before, you need to make sure now to monitor and dynamically extend not only the active layer, but also the commit target when it fills up to near its current size. (Probably you already do this, mentioning it just in case.)

Comment 3 Nir Soffer 2022-03-17 12:11:34 UTC
(In reply to Kevin Wolf from comment #2)
> The only way I see to simplify the 'qemu-img measure' command line is if we
> introduced something like a --base option which would work based on a
> filename, though identifying base images by their filename is something that
> we have been hesitant to do in newer interfaces.

This would be useful for other users of qemu-img, but I'm not sure how many
users need this functionality.

I think the current situation is very good; qemu-img makes it easy to do the
common operations, and if you have special needs, the json filename gives you a
lot of power to do what you need in a simple and clear way.

> But on the command line it
> might be justified. On the other hand, changing QEMU means that you'd have
> to wait for a release containing the new feature, so your approach is
> probably more practical.

Right, we want to use this with RHV 4.4 on RHEL 8.6 now. If qemu-img in RHEL 9
supports --base, oVirt will likely use it.

> Of course, in the live merge of the active layer case, the required size can
> change while the commit job is running. If you relied on the overestimation
> to make this go unnoticed before, you need to make sure now to monitor and
> dynamically extend not only the active layer, but also the commit target
> when it fills up to near its current size. (Probably you already do this,
> mentioning it just in case.)

Unfortunately we don't monitor or extend the base volume yet, so live merge
may fail with ENOSPC. I hope this will be improved in RHV 4.4.z.

Our current live merge uses a very dumb allocation - allocating an extra ~5g per
merge. If you have a 50g mostly empty image, after 10 live merges the image will
be fully allocated even if there is no actual data in it. This wasted space will
eventually cause the oVirt storage domain to become full even when there is a lot
of available space in the underlying storage.

With this mechanism, we can ensure that on every merge we have a certain amount
of free space in the image. If the image was extended in a previous live merge
and there is enough free space, we will not extend it again. For example, if the
measured commit requires 1g and base already has 2g of unused allocated space,
no extension is needed, where the old logic would blindly add ~5g.

Comment 7 Nir Soffer 2022-05-25 09:30:52 UTC
The fix for this bug added an internal API allowing measuring a sub-chain, and an
option in the vdsm API to use it. A future engine version can use the new API to
measure a sub-chain when validating storage operations.
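
A hypothetical invocation sketch via vdsm-client (Volume measure is the verb
named in the description above; the parameter names, and baseID in particular,
are assumptions rather than the confirmed API):

$ vdsm-client Volume measure storagepoolID=<pool> storagedomainID=<domain> \
      imageID=<image> volumeID=<top> baseID=<base>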

This change enables correct extend behavior in cold and live merge, tracked upstream in:
- https://github.com/oVirt/vdsm/issues/134 (fixed in ovirt 4.5.1)
- https://github.com/oVirt/vdsm/issues/188 (expected in ovirt 4.5.1)

Comment 8 Sandro Bonazzola 2022-06-23 05:57:04 UTC
This bugzilla is included in the oVirt 4.5.1 release, published on June 22nd 2022.
Since the problem described in this bug report should be resolved in the oVirt 4.5.1 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.

