Bug 2018971

Summary: [CBT][Veeam] Scratch disks on block-based storage domain created with the wrong initial size.
Product: [oVirt] ovirt-engine
Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.Storage
Assignee: Arik <ahadas>
Status: CLOSED CURRENTRELEASE
QA Contact: Amit Sharir <asharir>
Severity: high
Priority: high
Version: 4.4.9
CC: aefrat, bugs, dfodor, michal.skrivanek, nsoffer
Target Milestone: ovirt-4.4.10
Keywords: ZStream
Target Release: 4.4.10
Flags: pm-rhel: ovirt-4.4+, asharir: testing_plan_complete+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovirt-engine-4.4.10
Last Closed: 2022-01-19 07:00:13 UTC
Type: Bug
oVirt Team: Storage

Description Eyal Shenitzky 2021-11-01 11:55:01 UTC
Description of problem:

When starting a live VM backup, a scratch disk is created for each disk that participates in the backup.

When the backed-up disk resides on a block-based storage domain, the scratch disk
is created with the wrong initial size, which can cause the VM to pause.

For a RAW block-based scratch disk, the initial size is set to 0.
For a COW block-based scratch disk, the initial size is set according to the active volume size.

The initial size should instead be:

For a RAW block-based scratch disk, the backed-up disk's actual size.
For a COW block-based scratch disk, the size calculated by measuring the backed-up disk's volume chain.

Also, for RAW block-based disks, the scratch disk size should be the backed-up disk's actual size.
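A minimal Python sketch of the sizing rules stated above, for illustration only. `VolumeFormat` and `scratch_initial_size` are hypothetical names, not ovirt-engine code (the engine itself is written in Java), and the sketch deliberately leaves out the extra space and rounding discussed in the later comments.

```python
from enum import Enum


class VolumeFormat(Enum):
    RAW = "RAW"
    COW = "COW"


def scratch_initial_size(fmt: VolumeFormat, actual_size: int, chain_required_size: int) -> int:
    """Initial size in bytes for a block-based scratch disk, per the rules above.

    actual_size: the backed-up disk's actual size (used for RAW disks).
    chain_required_size: measured size of the backed-up disk's volume chain
    (used for COW disks).
    """
    if fmt is VolumeFormat.RAW:
        # The buggy behavior used 0 here.
        return actual_size
    if fmt is VolumeFormat.COW:
        # The buggy behavior used the active volume size here.
        return chain_required_size
    raise ValueError(f"unsupported volume format: {fmt!r}")
```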


Version-Release number of selected component (if applicable):
4.5 - master

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a RAW / COW disk on a block-based storage domain
2. Start the VM
3. Start a live VM backup

Actual results:
The scratch disk is created with the wrong size / initial size.

Expected results:
The scratch disk should be created with the sizes described above.

Additional info:
This bug will be fixed together with the option to configure the initial size for block-based scratch disks and can be tested by setting the 'BackupBlockScratchDiskInitialSizePercents' configuration value to 100%.

Comment 1 Eyal Shenitzky 2021-11-01 12:54:04 UTC
> This bug will be fixed together with the option to configure the initial
> size for block-based scratch disks and can be tested by setting the
> 'BackupBlockScratchDiskInitialSizePercents' configuration value to 100%.

bug 2018986

Comment 2 Nir Soffer 2021-11-02 00:29:14 UTC
*** Bug 2019265 has been marked as a duplicate of this bug. ***

Comment 3 Nir Soffer 2021-11-02 00:32:04 UTC
How to reproduce and verify:

Steps to Reproduce:

Raw disk:
1. Start a VM with a 10g raw disk
2. Start a backup
3. Verify that the scratch disk initial size is 11g
   (we allocate about 1g extra for qcow2 metadata)

Qcow2 disk based on a template:
1. Start a VM with a qcow2 disk based on a template
2. Start a backup
3. Verify that the scratch disk initial size is the size reported by
   qemu-img measure plus 1g and 10% (see the calculation below).

To measure the disk you can use:

    # virsh -r dumpxml vm-name

Find the disk path in the XML, then measure the disk with qemu-img measure:

    # qemu-img measure -O qcow2 /path/to/disk/from/xml
    ...
    required size: 2684354560

(This is only an example; your actual disk may be smaller or larger.)
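To script this step, a sketch like the following extracts the required size from the text output of `qemu-img measure`. The helper names are hypothetical, and the parser assumes the `required size: N` line format shown in comment 10 of this bug.

```python
import re
import subprocess


def parse_required_size(measure_output: str) -> int:
    """Return the 'required size' in bytes from qemu-img measure's text output."""
    match = re.search(r"^required size:[ \t]*(\d+)[ \t]*$", measure_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'required size' line found in qemu-img measure output")
    return int(match.group(1))


def measure_required_size(path: str) -> int:
    """Run qemu-img measure for a qcow2 target on the given disk and return the required size."""
    result = subprocess.run(
        ["qemu-img", "measure", "-O", "qcow2", path],
        check=True, capture_output=True, text=True,
    )
    return parse_required_size(result.stdout)
```

`qemu-img measure` also accepts `--output=json`, which avoids parsing the human-readable text if that is preferred.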

Add 1g, then 10%:

    (2684354560 + 1073741824) * 1.1 = 4133906022
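The same arithmetic (measured size plus 1 GiB, then plus 10%) as a small Python helper with the hypothetical name `padded_size`. It uses integer math, so any fractional byte is truncated; how the engine itself rounds that value is not stated in this bug.

```python
GiB = 1024**3


def padded_size(required: int) -> int:
    """Measured required size plus 1 GiB, plus 10%, truncated to whole bytes."""
    return (required + GiB) * 11 // 10


print(padded_size(2684354560))  # 4133906022
```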

Get the backup XML:

    # virsh -r backup-dumpxml vm-name

Find the scratch disk path in the XML:

    /rhev/data-center/mnt/blockSD/domain-id/images/disk-id/volume-id
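Locating the scratch disk can be scripted as well. This sketch assumes the scratch disk appears in the `virsh backup-dumpxml` output as a `<scratch dev='...'/>` element under `<disks><disk>` (a `file` attribute is accepted too, for file-based scratch disks), and that the path has the `blockSD/domain-id/images/disk-id/volume-id` layout shown above. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import Dict


def scratch_paths(backup_xml: str) -> Dict[str, str]:
    """Map each backed-up disk name to its scratch disk path."""
    root = ET.fromstring(backup_xml)
    paths = {}
    for disk in root.iterfind("./disks/disk"):
        scratch = disk.find("scratch")
        if scratch is None:
            continue  # disk not part of this backup, or no scratch disk listed
        # Block scratch disks use the 'dev' attribute; file-based ones use 'file'.
        path = scratch.get("dev") or scratch.get("file")
        if path:
            paths[disk.get("name")] = path
    return paths


def lvs_target(scratch_path: str) -> str:
    """Turn .../blockSD/<domain-id>/images/<disk-id>/<volume-id> into '<domain-id>/<volume-id>'."""
    parts = PurePosixPath(scratch_path).parts
    try:
        i = parts.index("blockSD")
    except ValueError:
        raise ValueError(f"not a block storage domain path: {scratch_path}") from None
    if len(parts) != i + 5 or parts[i + 2] != "images":
        raise ValueError(f"unexpected scratch disk path layout: {scratch_path}")
    return f"{parts[i + 1]}/{parts[i + 4]}"
```

The string returned by `lvs_target` is the `domain-id/volume-id` argument that the `lvs` command in the next step expects.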

Check the size of the logical volume:

    # lvs domain-id/volume-id

The size should be 3.875g (4133906022 bytes rounded up to a multiple of 128 MiB).
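The 3.875g figure can be reproduced by rounding the padded size up to a multiple of 128 MiB, the rounding noted in comment 10. A sketch with hypothetical helper names; `lv_size_bytes` only wraps the `lvs` command from the previous step using standard LVM reporting options (`--noheadings`, `--nosuffix`, `--units b`, `-o lv_size`).

```python
import subprocess

MiB = 1024**2
GiB = 1024**3
# Rounding granularity for volume sizes on block storage, as noted in comment 10.
ROUND_TO = 128 * MiB


def round_up(size: int, granularity: int = ROUND_TO) -> int:
    """Round size (bytes) up to the next multiple of granularity."""
    return -(-size // granularity) * granularity


def lv_size_bytes(vg_lv: str) -> int:
    """Return the size in bytes of a logical volume given as 'domain-id/volume-id'."""
    out = subprocess.run(
        ["lvs", "--noheadings", "--nosuffix", "--units", "b", "-o", "lv_size", vg_lv],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


expected = round_up(4133906022)
print(expected, expected / GiB)  # 4160749568 3.875
```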

Comment 10 Amit Sharir 2021-12-20 07:44:18 UTC
Version:
ovirt-engine-4.4.10-0.17.el8ev.noarch / vdsm-4.40.100.1-1.el8ev.x86_64

For the raw disk flow, everything works as expected.
For the qcow2 disk on block storage, we also need to round the volume size up to a multiple of 128 MiB.

Verification flow for qcow2:

1. Set MaxBackupBlockScratchDiskInitialSizePercents to 100% and MinBackupBlockScratchDiskInitialSizeInGB to 1 (on the engine host, e.g. <engine-config -s "MaxBackupBlockScratchDiskInitialSizePercents=100">).
2. Start a VM with a qcow2 disk based on a template.
3. Start a backup via the API.
4. Use <virsh -r dumpxml vm-name> on the vdsm host to find the disk path in the XML.
5. Measuring the template-based disk gave the following sizes:

    # qemu-img measure -O qcow2 /rhev/data-center/mnt/blockSD/d106a99f-ed75-4a3f-b50c-6bd002bede3a/images/0e291e06-5089-4321-ad1a-e63200488b8f/3277f5e3-c351-44e0-821f-d452b85eab4d
    required size: 3385655296
    fully allocated size: 10739318784
    bitmaps size: 0

6. Then I used virsh to find the scratch disk path.
7. Ran the following command on the vdsm host to find the size of the logical volume:

    # vdsm-client StorageDomain dump sd_id=d106a99f-ed75-4a3f-b50c-6bd002bede3a | grep -A 16 018a92c1-c9bd-480e-91ac-df04450f7e58


        "018a92c1-c9bd-480e-91ac-df04450f7e58": {
            "apparentsize": 4966055936,
            "capacity": 10737418240,
            "ctime": 1638782825,
            "description": "{\"DiskAlias\":\"VM test1100 backup 31f88ccd-7670-469e-b2dc-eb111e6c6820 scratch disk for latest-rhel-guest-image-8.5-infra\",\"DiskDescription\":\"Backup 31f88ccd-7670-469e-b2dc-eb111e6c6820 scratch disk\"}",
            "disktype": "SCRD",
            "format": "COW",
            "generation": 0,
            "image": "11f63d5c-f276-4a30-8223-2c15090fa116",
            "legality": "LEGAL",
            "mdslot": 10,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "OK",
            "truesize": 4966055936,
            "type": "SPARSE",
            "voltype": "LEAF"
        },
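Instead of relying on `grep -A 16`, the dump can be parsed as JSON. The overall layout of the `vdsm-client StorageDomain dump` output beyond the fragment above is an assumption here, so this sketch searches the whole parsed document recursively for the volume ID rather than hard-coding a path. The helper names are hypothetical.

```python
import json
from typing import Any, Optional


def find_volume(node: Any, volume_id: str) -> Optional[dict]:
    """Depth-first search of parsed JSON for the object stored under the key volume_id."""
    if isinstance(node, dict):
        value = node.get(volume_id)
        if isinstance(value, dict):
            return value
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_volume(child, volume_id)
        if found is not None:
            return found
    return None


def volume_sizes(dump_json: str, volume_id: str) -> dict:
    """Return the size-related fields of one volume from the dump output."""
    volume = find_volume(json.loads(dump_json), volume_id)
    if volume is None:
        raise KeyError(volume_id)
    return {key: volume.get(key) for key in ("capacity", "apparentsize", "truesize")}
```

Passing the command's full output (without the grep) to `volume_sizes` should yield the same `truesize` shown above, regardless of how many fields the volume entry has.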


Summary of disk sizes and calculations:

required size: 3385655296
expected size: (3385655296 + 1073741824) * 1.1 -> 4905336832
actual size:   4966055936

If you round 4905336832 up to a multiple of 128 MiB:

    >>> 4905336832 / (128 * 1024**2)
    36.547607421875
    >>> 37 * (128 * 1024**2)
    4966055936

This matches the actual size, as expected.

Verification Conclusions:
The sizes of the generated scratch disks were correct.

Bug verified.

Comment 11 Sandro Bonazzola 2022-01-19 07:00:13 UTC
This bug is included in the oVirt 4.4.10 release, published on January 18th 2022.

Since the problem described in this bug report should be resolved in the oVirt 4.4.10 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 12 Amit Sharir 2022-02-02 09:50:20 UTC
Added Polarion test plan: RHEVM 27932