Bug 2041352
Summary: | Default QCOW_OVERHEAD_FACTOR may waste too many extra disks for big storage | |
---|---|---|---
Product: | [oVirt] vdsm | Reporter: | Mike Cao <bcao>
Component: | Core | Assignee: | Albert Esteve <aesteve>
Status: | CLOSED UPSTREAM | QA Contact: | Lukas Svaty <lsvaty>
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 4.40.0 | CC: | ahadas, bcao, bugs, bzlotnik, lsurette, nsoffer, srevivo, vjuranek, ycui
Target Milestone: | --- | Flags: | pm-rhel: ovirt-4.5?
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | All | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-05-24 16:48:00 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 2064907 | |
Bug Blocks: | | |
Description
Mike Cao
2022-01-17 07:22:37 UTC
1.1 was an estimate when we did not have a way to compute the actual value.
Both vdsm and engine use this estimate in various calculations. In some flows
we use qemu-img measure to allocate the right amount of space, but in some
flows we still use the old estimate.
Steps to fix this issue:
1. Engine assumes that vdsm allocates 10% extra
In some flows the engine assumes that vdsm will allocate 10% more, and reduces
the size by 10%. If we fix vdsm, we may break engine code that assumes 10%
extra allocation.
    public static long computeCowImageNeededSize(VolumeFormat sourceFormat, long actualSize) {
        // When vdsm creates a COW volume with provided initial size the size is multiplied by 1.1 to prevent a
        // case in which we won't have enough space. If the source is already COW we don't need the additional
        // space.
        return sourceFormat == VolumeFormat.COW
                ? Double.valueOf(Math.ceil(actualSize / StorageConstants.QCOW_OVERHEAD_FACTOR)).longValue()
                : actualSize;
    }
The only user of this is CloneImageGroupVolumesStructureCommand, when it does
not measure images.
If we fix this command to always measure images when the destination is
qcow2 format on block storage, we can remove the bad code assuming vdsm
over-allocation.
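The engine-side reversal of the assumed over-allocation can be sketched in Python (a minimal sketch, not the engine code; the constant mirrors StorageConstants.QCOW_OVERHEAD_FACTOR):

```python
import math

# Mirrors StorageConstants.QCOW_OVERHEAD_FACTOR in the engine (1.1).
QCOW_OVERHEAD_FACTOR = 1.1

def compute_cow_image_needed_size(source_is_cow: bool, actual_size: int) -> int:
    """Sketch of computeCowImageNeededSize: if the source is already COW,
    divide out the 10% that vdsm is assumed to add on allocation."""
    if source_is_cow:
        return math.ceil(actual_size / QCOW_OVERHEAD_FACTOR)
    return actual_size

# For an 11 GiB COW allocation the engine asks for only 10 GiB,
# betting that vdsm will multiply by 1.1 again.
GiB = 1024**3
print(compute_cow_image_needed_size(True, 11 * GiB))  # 10 GiB
```

If vdsm stops over-allocating, this division silently under-sizes the volume, which is why the command should measure instead.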
2. Consider the required size for bitmaps
Bitmap size depends on the granularity of the bitmap, which is 1 bit per
64 KiB by default. For the largest supported image (8 TiB) every bitmap
can take 16 MiB, so 64 bitmaps need 1 GiB. For the smallest image (1 GiB)
the bitmap size is 2 KiB, so 64 bitmaps need 128 KiB.
So I think we can use:
virtual_size * 1.01 + 64 * (virtual_size / 64 KiB / 8)
Example: for 1 TiB image we can allow:
>>> (1024**4 * 1.01 + 64 * (1024**4 // (64 * 1024) // 8)) / 1024**4
1.0101220703125
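The arithmetic in step 2 can be checked with a short Python sketch (illustrative names, not vdsm code; 64 bitmaps and 64 KiB granularity as assumed above):

```python
KiB, MiB, GiB, TiB = 1024, 1024**2, 1024**3, 1024**4

BITMAP_GRANULARITY = 64 * KiB  # default: 1 dirty bit per 64 KiB of data
MAX_BITMAPS = 64

def bitmap_size(virtual_size: int) -> int:
    """Bytes needed for one dirty bitmap covering virtual_size."""
    return virtual_size // BITMAP_GRANULARITY // 8

def required_size(virtual_size: int) -> float:
    """Proposed allocation: 1% qcow2 metadata overhead plus room
    for MAX_BITMAPS bitmaps."""
    return virtual_size * 1.01 + MAX_BITMAPS * bitmap_size(virtual_size)

print(bitmap_size(8 * TiB) == 16 * MiB)   # largest image: 16 MiB per bitmap
print(64 * bitmap_size(8 * TiB) == GiB)   # 64 bitmaps: 1 GiB
print(bitmap_size(GiB) == 2 * KiB)        # smallest image: 2 KiB per bitmap
print(required_size(TiB) / TiB)           # 1.0101220703125
```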
Benny, what do you think about the proposed fix?

I mentioned one of 5 users of QCOW_OVERHEAD_FACTOR - I think this is the worst one. We have to check also the other users:

    /**
     * direction    storage  transfer  disk    ticket size
     * =================================================================================
     * upload       block    raw       *       virtual size
     * upload       block    -         raw     virtual size (lv size)
     * upload       block    -         qcow2   actual size (lv size)
     * ---------------------------------------------------------------------------------
     * upload       file     raw       *       virtual size
     * upload       file     -         raw     virtual size
     * upload       file     -         qcow2   virtual size + cow overhead
     * ---------------------------------------------------------------------------------
     * upload (ui)  *        -         *       uploaded file size
     * ---------------------------------------------------------------------------------
     * download     block    raw       *       virtual size (using /map)
     * download     block    -         raw     virtual size
     * download     block    -         qcow2   image-end-offset (returned by
     *                                         "qemu-img check" - not implemented yet)
     * ---------------------------------------------------------------------------------
     * download     file     raw       *       virtual size (using /map)
     * download     file     -         raw     virtual size
     * download     file     -         qcow2   file size
     * ---------------------------------------------------------------------------------
     */
    private long getTransferSize() {
        DiskImage image = getDiskImage();

        if (getTransferBackend() == ImageTransferBackend.NBD) {
            // NBD always uses virtual size (raw format)
            return image.getSize();
        }

        if (getParameters().getTransferType() == TransferType.Download) {
            if (image.getVolumeFormat() == VolumeFormat.RAW) {
                return image.getSize();
            }
            if (image.getVolumeFormat() == VolumeFormat.COW) {
                return getImageApparentSize(image);
            }
            // Shouldn't happen
            throw new RuntimeException(String.format(
                    "Invalid volume format: %s", image.getVolumeFormat()));
        } else if (getParameters().getTransferType() == TransferType.Upload) {
            if (getParameters().getTransferClientType().isBrowserTransfer()) {
                return getParameters().getTransferSize();
            }
            if (image.getVolumeFormat() == VolumeFormat.RAW) {
                return image.getSize();
            }
            // COW volume format
            boolean isBlockDomain = image.getStorageTypes().get(0).isBlockDomain();
            if (isBlockDomain) {
                return image.getActualSizeInBytes();
            }
            // Needed to allow uploading fully allocated qcow (BZ#1697294)
            // Also, adding qcow header overhead to support small files.
            return (long) Math.ceil(image.getSize() * StorageConstants.QCOW_OVERHEAD_FACTOR)
                    + StorageConstants.QCOW_HEADER_OVERHEAD;
        }

This should be easy to adapt to the new factor, regardless of new/old vdsm. The size is relevant only when using legacy image transfer, not the NBD backend. In this case we can get the size of the image from storage before the transfer instead of guessing.

    /**
     * Calculates the required space in the storage domain for creating cloned DiskImages with collapse.
     * When creating COW volume the actual used space will be the needed space * QCOW_OVERHEAD_FACTOR
     * as implemented currently in the VDSM code.
     */
    private double getTotalSizeForClonedDisk(DiskImage diskImage) {
        double sizeForDisk = ImagesHandler.getTotalActualSizeOfDisk(
                diskImage, storageDomain.getStorageStaticData());
        if (diskImage.getVolumeFormat() == VolumeFormat.COW) {
            sizeForDisk = Math.ceil(StorageConstants.QCOW_OVERHEAD_FACTOR * sizeForDisk);
        }
        return sizeForDisk;
    }

I think this should be removed and use Volume.measure instead.

    /**
     * Verify there's enough space in the storage domain for creating cloned DiskImages with snapshots
     * without collapse. Space should be allocated according to the volumes type and format, and
     * allocation policy, according to the following table:
     *
     *      | File Domain                  | Block Domain
     * -----|------------------------------|-----------------
     * qcow | 1.1 * used space             | 1.1 * used space
     * -----|------------------------------|-----------------
     * raw  | preallocated: disk capacity  | disk capacity
     *      | sparse: used space           |
     */
    private double getTotalSizeForDisksWithSnapshots(Collection<DiskImage> diskImages) {
        return getTotalSizeForDisksByMethod(diskImages, diskImage -> {
            double sizeForDisk = diskImage.getSize();
            if (storageDomain.getStorageType().isFileDomain()
                        && diskImage.getVolumeType() == VolumeType.Sparse
                    || diskImage.getVolumeFormat() == VolumeFormat.COW) {
                sizeForDisk = diskImage.getActualDiskWithSnapshotsSizeInBytes();
            }
            if (diskImage.getVolumeFormat() == VolumeFormat.COW) {
                sizeForDisk = Math.ceil(StorageConstants.QCOW_OVERHEAD_FACTOR * sizeForDisk);
            }
            return sizeForDisk;
        });
    }

This can use Volume.measure.

    private double getRequiredSizeForMerge(SubchainInfo subchain, ActionType snapshotActionType) {
        DiskImage baseSnapshot = subchain.getBaseImage();
        DiskImage topSnapshot = subchain.getTopImage();

        // The snapshot is the root snapshot
        if (Guid.isNullOrEmpty(baseSnapshot.getParentId())) {
            if (baseSnapshot.getVolumeFormat() == VolumeFormat.RAW) {
                // Raw/Block can only be preallocated thus we are necessarily overlapping
                // with existing data
                if (baseSnapshot.getVolumeType() == VolumeType.Preallocated) {
                    return 0.0;
                }
                return Math.min(
                        topSnapshot.getActualSizeInBytes() / StorageConstants.QCOW_OVERHEAD_FACTOR,
                        baseSnapshot.getSize() - baseSnapshot.getActualSizeInBytes());
            }
        }

        // The required size for the extension of the volume we merge into.
        // If the actual size of top is larger than the actual size of base we
        // will be overlapping, hence we extend by the lower of the two.
        return Math.min(
                topSnapshot.getSize() * StorageConstants.QCOW_OVERHEAD_FACTOR
                        - baseSnapshot.getActualSizeInBytes(),
                topSnapshot.getActualSizeInBytes());
    }

This assumes vdsm over-allocation - bad!

I'm not sure we can measure the merge size, since measure works on either the entire chain or a single image. When merging we need to measure only top and base, and we must measure them together.

We can measure if the base does not have a parent:

    base <- top

We cannot measure the sub chain base <- top with the current vdsm code:

    parent <- base <- top

But we can change vdsm to support this.

To support measuring volumes before merge, we need bug 2064907.

The impact is not that high and it is not the result of a recent change, so it is unlikely to be fixed in 4.5.1, but as it makes sense, moving to GitHub: https://github.com/oVirt/vdsm/issues/207

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days
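The merge branch that assumes vdsm over-allocation boils down to this Python sketch (illustrative names, not the engine code; the 1.1 factor is exactly the assumption the fix would remove):

```python
QCOW_OVERHEAD_FACTOR = 1.1  # the assumed vdsm over-allocation factor

def required_size_for_merge(top_virtual: int, top_actual: int,
                            base_actual: int) -> float:
    """Sketch of the non-root branch of getRequiredSizeForMerge: extend the
    base volume by the lower of (estimated merged size minus base's current
    allocation) and the actual size of top."""
    return min(top_virtual * QCOW_OVERHEAD_FACTOR - base_actual, top_actual)

# Example: a 10 GiB top holding 2 GiB of data merges into a base holding 6 GiB.
# 10 GiB * 1.1 - 6 GiB = 5 GiB, but top only holds 2 GiB, so extend by 2 GiB.
GiB = 1024**3
print(required_size_for_merge(10 * GiB, 2 * GiB, 6 * GiB))  # 2 GiB
```

A correct bound would come from measuring top and base together instead of applying the factor, which is what bug 2064907 would enable.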