Bug 2097139 - Inconsistent qcow2 overhead calculation causes discrepancies and possibly wastes SD space
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.5.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-06-14 23:13 UTC by Germano Veit Michel
Modified: 2022-08-09 16:15 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-04 14:32:04 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OCPRHV-813 | 0 | None | None | None | 2022-06-27 19:21:56 UTC
Red Hat Issue Tracker | RHV-46425 | 0 | None | None | None | 2022-06-14 23:20:34 UTC

Description Germano Veit Michel 2022-06-14 23:13:01 UTC
Description of problem:

When OCP runs on RHV, it can use RHV disks as PVCs.

1. OCP asks RHV to create disks in a somewhat *weird* way, which may deserve a bug on its own.

It appears to want QCOW2 "Preallocated", but sends sparse and sets initial size to match capacity, like this:

For example, for a 100G disk:

<disk>
  [...]
  <format>cow</format>
  <initial_size>107374182400</initial_size>
  <provisioned_size>107374182400</provisioned_size>
</disk>

2. This makes the engine send this to VDSM, with 100G both as capacity and initial size.

2022-05-25 09:45:42,026+0200 INFO  (jsonrpc/6) [vdsm.api] START createVolume(
...
  size='107374182400'
  volFormat=4
  preallocate=2
  initialSize='107374182400'
...
)

3. So VDSM ends up in this code path:

blockVolume.py
    def calculate_volume_alloc_size(
    ...
        if preallocate == sc.SPARSE_VOL:
            # Sparse qcow2
            if initial_size:
                # TODO: if initial_size == max_size, we exceed the max_size
                # here. This should be fixed, but first we must check that
                # engine is not assuming that vdsm will increase initial size
                # like this.
                alloc_size = int(initial_size * QCOW_OVERHEAD_FACTOR)

4. Given that:

QCOW_OVERHEAD_FACTOR = 1.1

5. It creates a 110G LV to hold a 100G qcow2. This sounds excessive and does not match what a Preallocated COW disk gets (i.e. a preallocated disk created with incremental backup enabled, which gets a 100G LV here and may be a bug on the opposite end?).

6. See the discrepancy:

2022-05-25 09:45:42,330+0200 INFO  (tasks/7) [storage.Volume] Request to create COW volume /rhev/data-center/mnt/blockSD/6552eb00-37b6-4985-a588-fc19ef44e3ec/images/acf65db3-51ad-467d-b86d-61d463afe8d8/0d45176f-3cdb-44f3-a1f6-a59ab907a745 with capacity = 107374182400 (blockVolume:517)

7. About 2.5 seconds later, from the getInfo on the VolumeCreate flow:

2022-05-25 09:45:44,821+0200 INFO  (jsonrpc/7) [storage.VolumeManifest] 6552eb00-37b6-4985-a588-fc19ef44e3ec/acf65db3-51ad-467d-b86d-61d463afe8d8/0d45176f-3cdb-44f3-a1f6-a59ab907a745 info is {
  'type': 'SPARSE', 
  'format': 'COW', 
  'disktype': 'DATA', 
  'voltype': 'LEAF', 
  'capacity': '107374182400', 
  ...
  'apparentsize': '118111600640', 
  'truesize': '118111600640', 
} (volume:278)
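
The arithmetic behind steps 4 to 7 can be checked with a short sketch. It assumes the allocation is simply the initial size multiplied by QCOW_OVERHEAD_FACTOR and then rounded up to whole LVM extents. The 128 MiB extent size and the helper name sparse_alloc_size are assumptions made for illustration; they are not copied from VDSM.

```python
GiB = 2**30
MiB = 2**20

QCOW_OVERHEAD_FACTOR = 1.1   # value quoted in step 4
EXTENT_SIZE = 128 * MiB      # assumed LVM extent size, not taken from the logs


def sparse_alloc_size(initial_size, extent_size=EXTENT_SIZE):
    """Hypothetical helper: scale initial_size by the overhead factor,
    then round up to a whole number of extents."""
    alloc_size = int(initial_size * QCOW_OVERHEAD_FACTOR)
    extents = -(-alloc_size // extent_size)  # ceiling division
    return extents * extent_size


capacity = 100 * GiB                     # 107374182400, as sent by the engine
allocated = sparse_alloc_size(capacity)

print(allocated)                         # compare with 'apparentsize' in step 7
print((allocated - capacity) / GiB)      # extra space in GiB
```

Under these assumptions the result is 118111600640 bytes, which is the apparentsize reported by getInfo and 10 GiB more than the requested capacity.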

I don't really know what to expect here, but this seems off. OCP may need some work on how it calls the RHV API, but RHV on its own also does not look correct or consistent. I wonder whether an initial size equal to the capacity should make RHV create a preallocated qcow2 instead. And there is a TODO in the code :)

Version-Release number of selected component (if applicable):
vdsm-4.50.0.13-1.el8ev.x86_64
rhvm-4.5.0.7-0.9.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:

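import ovirtsdk4.types as types

# disks_service is assumed to come from an authenticated ovirtsdk4.Connection:
#   disks_service = connection.system_service().disks_service()
disk = disks_service.add(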
    types.Disk(
        name='mydisk',
        description='My disk',
        format=types.DiskFormat.COW,
        provisioned_size=100 * 2**30,
        initial_size=100 * 2**30,
        storage_domains=[
            types.StorageDomain(
                name='iSCSI'
            )
        ]
    )
)

Actual results:
* Possible waste of storage space
* More allocation than "Preallocated COW" if initial size and size are the same for a "sparse" disk

Expected results:
* Does it need this much overhead on a 100G disk? If it's a 1T disk, it will allocate an extra 100G, and so on.
* Should it match the preallocated COW allocation?
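
To put the 10% figure in perspective, here is a rough estimate of how much metadata a fully allocated qcow2 image needs. It is only a sketch: it assumes the qcow2 defaults of 64 KiB clusters, 8-byte L2 entries and 16-bit refcounts, and it ignores bitmaps, snapshots and the clusters used by the refcount blocks themselves. The function name is made up for illustration; `qemu-img measure` remains the authoritative way to get this number.

```python
KiB = 2**10
MiB = 2**20
GiB = 2**30


def ceil_div(a, b):
    return -(-a // b)


def estimate_qcow2_metadata(virtual_size, cluster_size=64 * KiB, refcount_bits=16):
    """Rough metadata size in bytes for a fully allocated qcow2 image."""
    data_clusters = ceil_div(virtual_size, cluster_size)

    # Each L2 table fills one cluster and holds 8-byte entries.
    l2_tables = ceil_div(data_clusters, cluster_size // 8)
    # The L1 table holds one 8-byte entry per L2 table.
    l1_clusters = ceil_div(l2_tables * 8, cluster_size)

    # Each refcount block fills one cluster; it must cover data and metadata.
    refs_per_block = cluster_size * 8 // refcount_bits
    covered = data_clusters + l2_tables + l1_clusters + 1  # +1 for the header
    refcount_blocks = ceil_div(covered, refs_per_block)
    refcount_table_clusters = ceil_div(refcount_blocks * 8, cluster_size)

    metadata_clusters = (
        1 + l1_clusters + l2_tables + refcount_blocks + refcount_table_clusters
    )
    return metadata_clusters * cluster_size


for size in (100 * GiB, 1024 * GiB):
    meta = estimate_qcow2_metadata(size)
    print(f"{size // GiB} GiB image: ~{meta / MiB:.1f} MiB metadata, "
          f"10% factor adds {size * 0.1 / GiB:.1f} GiB")
```

With these assumptions the metadata is on the order of tens of MiB for a 100G image, which is far below the 10G that the fixed factor adds.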

Comment 1 Janos Bonic 2022-06-15 10:40:34 UTC
@germano regarding the OCP on RHV weird disk creation, could you elaborate on that?

Also, can you let us know what OCP version the customer is running here?

Comment 2 Germano Veit Michel 2022-06-15 21:20:46 UTC
(In reply to Janos Bonic from comment #1)
> @germano regarding the OCP on RHV weird disk creation, could you
> elaborate on that?

Hi Janos,

I think the problem may be in the Go oVirt client: it's setting both Provisioned and Initial size to the same value (size), which is passed from the ovirt-csi driver.

	diskBuilder := ovirtsdk4.NewDiskBuilder().
		ProvisionedSize(int64(size)).
		InitialSize(int64(size)).
		StorageDomainsOfAny(storageDomain).
		Format(ovirtsdk4.DiskFormat(format))

https://github.com/oVirt/go-ovirt-client/blob/main/disk_create.go#L119

> Also, can you let us know what OCP version the customer is running here?

I'll ask; I just saw the weird request arriving in the logs. I guess you are more interested in the ovirt-csi driver version?

Comment 3 Germano Veit Michel 2022-06-15 21:27:36 UTC
btw, maybe we should split that from the VDSM issue to avoid confusion. Let me know what you think and we can do a new bug for that.

Comment 4 Ulhas Surse 2022-06-16 08:50:53 UTC
(In reply to Janos Bonic from comment #1)
> @germano regarding the OCP on RHV weird disk creation, could you
> elaborate on that?
> 
> Also, can you let us know what OCP version the customer is running here?

OCP version is 4.10.14

Comment 5 Arik 2022-06-20 14:14:32 UTC
(In reply to Germano Veit Michel from comment #0)
> 4. Given that:
> 
> QCOW_OVERHEAD_FACTOR = 1.1
> 
> 5. It creates a 110G LV to hold a 100G qcow2. This sounds excessive and does
> not match what a Preallocated COW disk gets (i.e. a preallocated disk created
> with incremental backup enabled, which gets a 100G LV here and may be a bug
> on the opposite end?).
> 
> 6. See the discrepancy:
> 
> 2022-05-25 09:45:42,330+0200 INFO  (tasks/7) [storage.Volume] Request to
> create COW volume
> /rhev/data-center/mnt/blockSD/6552eb00-37b6-4985-a588-fc19ef44e3ec/images/
> acf65db3-51ad-467d-b86d-61d463afe8d8/0d45176f-3cdb-44f3-a1f6-a59ab907a745
> with capacity = 107374182400 (blockVolume:517)
> 
> 7. About 2.5 seconds later, from the getInfo on the VolumeCreate flow:
> 
> 2022-05-25 09:45:44,821+0200 INFO  (jsonrpc/7) [storage.VolumeManifest]
> 6552eb00-37b6-4985-a588-fc19ef44e3ec/acf65db3-51ad-467d-b86d-61d463afe8d8/
> 0d45176f-3cdb-44f3-a1f6-a59ab907a745 info is {
>   'type': 'SPARSE', 
>   'format': 'COW', 
>   'disktype': 'DATA', 
>   'voltype': 'LEAF', 
>   'capacity': '107374182400', 
>   ...
>   'apparentsize': '118111600640', 
>   'truesize': '118111600640', 
> } (volume:278)

Yeah, that is expected. We have a bug asking to improve this (bz 2041352), but it was not prioritized and we moved it upstream (https://github.com/oVirt/vdsm/issues/207), so let's separate that out from this bug. Other than that, everything else seems to be related to the input we get from OCP, so we think this bug should move to OCP on RHV.

Comment 6 Germano Veit Michel 2022-06-20 21:14:09 UTC
Sure, we can use this to track the OCP side then, which actually seems to be a bug in the Go SDK.

Comment 8 Michal Skrivanek 2022-06-27 14:52:35 UTC
(In reply to Germano Veit Michel from comment #2)
> (In reply to Janos Bonic from comment #1)
> > @germano regarding the OCP on RHV weird disk creation, could you
> > elaborate on that?
> 
> Hi Janos,
> 
> I think the problem may be in the Go oVirt client: it's setting both
> Provisioned and Initial size to the same value (size), which is passed from
> the ovirt-csi driver.
> 
> 	diskBuilder := ovirtsdk4.NewDiskBuilder().
> 		ProvisionedSize(int64(size)).
> 		InitialSize(int64(size)).
> 		StorageDomainsOfAny(storageDomain).
> 		Format(ovirtsdk4.DiskFormat(format))
> 
> https://github.com/oVirt/go-ovirt-client/blob/main/disk_create.go#L119
 
And this is with the default settings, I suppose? I.e. Sparse set to true, Format set to COW.

Why do we set the initial size to the provisioned size? Isn't that a change in behavior/regression from the previous code? This should have created a "normal" thin-provisioned volume, not a preallocated one...
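
For comparison, a request for a "normal" thin-provisioned disk would presumably just leave out initial_size. Below is a minimal sketch of building such a request body. It reuses the element names from the request in comment 0 and adds name and sparse elements; the helper name build_disk_request and the disk name are made up for illustration, and this is not the code used by the Go client.

```python
import xml.etree.ElementTree as ET


def build_disk_request(name, provisioned_size, initial_size=None):
    """Hypothetical helper: build a <disk> request body for a sparse COW disk.

    initial_size is only emitted when the caller sets it explicitly.
    """
    disk = ET.Element("disk")
    ET.SubElement(disk, "name").text = name
    ET.SubElement(disk, "format").text = "cow"
    ET.SubElement(disk, "sparse").text = "true"
    ET.SubElement(disk, "provisioned_size").text = str(provisioned_size)
    if initial_size is not None:
        ET.SubElement(disk, "initial_size").text = str(initial_size)
    return ET.tostring(disk, encoding="unicode")


# Thin disk: no initial_size, so the allocation is left to RHV.
thin_body = build_disk_request("mydisk", 100 * 2**30)
print(thin_body)

# The request seen in the logs: initial_size equal to provisioned_size.
observed_body = build_disk_request("mydisk", 100 * 2**30, initial_size=100 * 2**30)
print(observed_body)
```

The only difference between the two bodies is the initial_size element, which is what pushes VDSM into the initial_size * QCOW_OVERHEAD_FACTOR branch quoted in comment 0.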

Comment 9 Janos Bonic 2022-06-27 19:21:57 UTC
The OCP on RHV issue is now tracked in Jira under https://issues.redhat.com/browse/OCPRHV-813

Comment 10 Arik 2022-07-04 14:32:04 UTC
closing based on comment 9

