Bug 977778 (RHEV_thin_to_preallocated_disks)

Summary: [RFE] - Mechanism for converting disks for non-running VMs
Product: Red Hat Enterprise Virtualization Manager Reporter: vinay <vchoudha>
Component: ovirt-engine    Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED ERRATA QA Contact: sshmulev
Severity: high Docs Contact:
Priority: high    
Version: unspecified    CC: ableisch, acanan, ahadas, amarirom, arne.gogala, baptiste.agasse, byount, bzlotnik, ccesario, ebenahar, emarcus, fgarciad, guillaume.pavese, gveitmic, imomin, jbuchta, jortialc, jraju, lagern, lpeer, lsurette, mchappel, mkalinin, nsoffer, obockows, pablo.iranzo, paulds, pelauter, qmin77, rmcswain, rwashbur, scohen, sigbjorn.lie, sigbjorn, spower, sputhenp, srevivo, ssekidde, tnisan, vanhoof, yuriy.khokhlov, Yury.Panchenko
Target Milestone: ovirt-4.5.0    Keywords: FutureFeature, ZStream
Target Release: 4.5.0    Flags: sherold: Triaged+
ebenahar: testing_plan_complete?
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-engine-4.5.0 Doc Type: Enhancement
Doc Text:
In this release, support has been added for converting a disk's format and allocation policy. This can help reduce space usage, improve performance, and enable incremental backup on existing raw disks.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-26 16:22:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2063802, 2069670    
Bug Blocks: 1736852, 1015651, 1059271, 1523346    

Comment 7 kyumin 2014-06-12 08:30:55 UTC
Any news about this RFE?

Comment 8 Yaniv Kaul 2016-01-20 18:20:38 UTC
*** Bug 1059271 has been marked as a duplicate of this bug. ***

Comment 10 Yaniv Kaul 2016-03-02 17:03:13 UTC
Sparsification (coming in 4.0) may help here, though not entirely, since it cannot do it when there are multiple leaves (so it cannot go all the way down the snapshot tree).

Comment 19 spower 2018-07-03 10:56:03 UTC
We agreed to remove the RFEs component from Bugzilla. If you feel the component has been renamed incorrectly, please reach out.

Comment 22 Marina Kalinin 2018-08-16 21:45:27 UTC
Seems like if we get bz#1616445 fixed, we can achieve this via templates.
It is ugly, but it is possible. At least the UI says it can do it.
And it works for block storage domains already.

Comment 24 Sandro Bonazzola 2019-01-28 09:41:49 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 27 Nir Soffer 2019-03-13 13:44:26 UTC
This will be important for incremental backup, since incremental backup will be
supported only for the qcow2 format.

Comment 30 Nir Soffer 2019-07-16 21:06:11 UTC
There are two issues mentioned here:
- High I/O issues
- Avoiding over-allocation

I'm not sure what the high I/O issues are, or how much better raw preallocated
images perform compared to qcow2 images. We don't yet have performance
results showing the difference, and there are many qcow2 options that can
affect performance that we have never tried.

For avoiding over-allocation, it would be very simple to change the allocation
of existing images, but if the storage itself is thin provisioned, it may be
impossible to avoid over-allocation.

I think a more interesting use case is converting raw volumes to qcow2 format,
since this is the only format that can provide incremental backup support, at
least in 4.4.

Regarding the solutions suggested in:
https://access.redhat.com/solutions/432773

- Creating a huge file in the VM to force allocation is indeed very inefficient,
  and we can make this much more efficient by extending the logical volume directly.
  There is a vdsm API to do this in one call; it can be exposed via the UI and
  the SDK. This is also useful for other use cases, like restoring images when the
  backup application cannot estimate the size of the restored data before creating
  the destination volume.

- Creating another disk and copying the data is not very different from what we can
  provide to convert disks. There is no way to do an in-place conversion from raw to
  qcow2 or from qcow2 to raw, so converting always means creating a new disk and
  copying the data into it, and this will always be time consuming (see the sketch
  below).
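
For reference, a minimal offline sketch of such a copy-and-convert, wrapping
qemu-img in Python; the paths are hypothetical placeholders and the volume
must not be in use while it is converted:

import subprocess

src = "/path/to/existing-disk.raw"     # hypothetical source volume
dst = "/path/to/converted-disk.qcow2"  # hypothetical destination volume

# qemu-img cannot convert in place: it reads the source and writes a new
# image in the target format, so space for a second copy is required.
subprocess.run(
    ["qemu-img", "convert", "-p", "-f", "raw", "-O", "qcow2", src, dst],
    check=True,
)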

I think the best way we can provide this conversion is:
- Add support for format conversion and collapsing in live storage migration
- Add support for migrating to the same storage domain in live storage migration
- Implement live storage migration in the engine, using copy_data storage jobs
  (which will also provide progress info for this flow)

This can be hidden behind a simple UI, or use the existing live storage migration UI.

With this, users can convert disks in the background, without any downtime, so the
time-consuming copy is not a problem.

Improving live storage migration will also help with other use cases, like migrating
virtual machines with minimal downtime, and will improve the user experience by adding
progress reporting.

Comment 32 Nathan 2019-07-24 11:58:57 UTC
(In reply to Nir Soffer from comment #30)

If you're looking for justification on why this feature request exists, consider that it's 5 years old, and at least for me, the motivation was this. In earlier releases of RHEV 3.x, thin provisioning was flat-out dangerous (in my experience), and yet it was the default when creating disks. It had a tendency to fill storage domains unexpectedly, and in some cases caused disk corruption. So the ability to convert from qcow to raw would have been very useful for those systems that were accidentally allocated as qcow. Being a long-time RHEV user, I've made it my practice to avoid thin provisioning at every turn; this includes avoiding the use of snapshots. This feature request doesn't even apply to me anymore, as I do not have any qcow/thin disks remaining in my clusters.

All that being said, from what I can tell, thin provisioning has come a long way since 3.0, and it seems like it could be stable. I'm still hesitant to depend on it, but that's because I have scar tissue.

Comment 33 Mark Chappell 2019-07-24 12:45:14 UTC
The remaining issue that Red Hat IT has seen around this one has been some of the APIs forcing thin provisioning when cloning disks. In my case it was specifically using Ansible to import images and clone the disks.

While the Web UI gives you the option to preallocate the storage when you clone, IIRC the APIs do not (I've switched teams, so I haven't tried for a while).

Comment 36 Paul Stauffer 2019-07-25 14:10:17 UTC
If clarification on use cases is being requested, in our case the issue is that we've had many VMs that were initially created from templates using the "Thin" storage allocation option instead of "Clone".  As a result we eventually ended up with large numbers of VMs whose disks were all using the same single backing file.  This came to represent a significant single-point-of-failure risk, because if anything happened to that one file, large numbers of VMs would be destroyed.  For this reason, we want to have a way of retroactively de-coupling the thinly-provisioned disks from the template's backing file, to turn them into standalone disks.  The motivation for our RFE (which got rolled into this one) wasn't explicitly about pre-allocation of space; it was just about de-coupling a thinly-provisioned templated VM from the template's disk.

Comment 37 Carlos 2020-05-25 20:00:37 UTC
Hi,

Is this feature already supported in RHEV 4.4?

Comment 38 Nir Soffer 2020-05-25 21:06:48 UTC
(In reply to Carlos from comment #37)
> Is this feature already supported in RHEV 4.4?

No, but we have more options to convert the disk format when cloning and
exporting disks and VMs, so it should be easier to change the format
manually.

We also have more infrastructure that will make it easier to provide
live storage format conversion in 4.5.

Comment 39 Tal Nisan 2020-06-09 15:43:07 UTC
Benny, we should have most of the functionality in the clone command; can you please check if anything else is needed here?

Comment 40 Nir Soffer 2020-08-09 23:41:38 UTC
Raising priority since this is required for incremental backup.

This is important for incremental backup. Users with preallocated disks
who want to use incremental backup will want to convert the disk to
qcow2 format to get incremental backup capability.

Currently the only way is to create a snapshot and continue to use the
snapshot forever. Converting the disk to qcow2 will improve performance
and reliability by allowing the use of preallocated qcow2.

Comment 41 Yuriy Khokhlov (Veeam) 2020-10-26 09:39:01 UTC
I agree with Nir (https://bugzilla.redhat.com/show_bug.cgi?id=977778#c40). The priority of this task should be raised.

We also think that support of the incremental backup for existing VMs is essential.

Comment 42 Nir Soffer 2021-02-01 13:47:32 UTC
Switching from any format to any format can reuse the live storage migration
flow like this:

- Create a temporary snapshot on the source disk

  Code exists

- Create the target disk chain

  Today we always replicate the original disk layout, and always use a different
  storage domain for the target.
  With new code:

  - Create one qcow2 layer for converting from raw to qcow2

    source: existing disk (raw) <- temporary snapshot (qcow2)
    destination: new disk (qcow2) <- temporary snapshot (qcow2)

  - Create one raw layer for converting from qcow2 to raw

    source: existing disk (qcow2) <- temporary snapshot (qcow2)
    destination: new disk (raw) <- temporary snapshot (qcow2)

- Start mirroring changes from the source temporary snapshot to the target
  temporary snapshot.
  Code exists

- Convert the source disk to the target disk using qemu-img convert
  Code exists

- When the mirror job is ready and converting the disk to the target has
  completed, switch to the target disk
  Code exists

- Delete the source disk chain from storage.
  Code exists

- Delete the temporary snapshot on the target disk
  Code exists

So this basically requires changing the engine to support a new configuration
for live storage migration, and maybe minimal changes on the host side,
in case code assumes that the storage domains are always different.

A different implementation would be using libvirt blockPull():
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockPull

For converting from raw to qcow2:

- Add a temporary snapshot
- Start a blockPull job
- When the job is done, the base volume should be removed from the chain
- Delete the raw volume from storage

This requires a lot of work, much like live merge. We may be able to
reuse code from live merge, and we understand the problem better now,
so we can avoid mistakes made in the past in this area.
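
A rough sketch of the blockPull-based flow, using the libvirt Python bindings
and assuming the temporary qcow2 snapshot already exists on top of the raw
volume; the domain name and disk target are placeholders:

import time
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("example-vm")   # hypothetical domain name
disk = "sda"                            # hypothetical disk target

# Pull the raw base volume's data into the active qcow2 overlay
# (bandwidth=0 means unlimited).
dom.blockPull(disk, 0, 0)

# Poll until the block job finishes; an empty result means no job is active.
while True:
    info = dom.blockJobInfo(disk, 0)
    if not info:
        break
    print("pulled %d of %d bytes" % (info["cur"], info["end"]))
    time.sleep(1)

# The overlay no longer references the raw base, so the base can be removed
# from the chain and deleted from storage (the engine-side steps above).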

For converting from qcow2 to raw we can:
- Add an empty raw layer below the qcow2 layer
  (not sure if libvirt/qemu support this kind of change now)
- Use the existing live merge code to push data from the qcow2 layer
  to the raw layer
- When the job is done, switch to the raw layer
- Delete the qcow2 layer.

The second option is already implemented on the host side, but on the engine
side this works only with snapshots, so more work is needed.
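
For the qcow2-to-raw direction, the host-side piece is essentially libvirt's
active block commit. A rough sketch with the Python bindings, assuming the
empty raw layer is already in place as the backing file of the active qcow2
layer, and using placeholder names:

import time
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("example-vm")   # hypothetical domain name
disk = "sda"                            # hypothetical disk target

# base=None, top=None use the defaults: merge the active layer into the
# deepest image in the chain (here, the raw base).
dom.blockCommit(disk, None, None, 0, libvirt.VIR_DOMAIN_BLOCK_COMMIT_ACTIVE)

# An active commit does not finish on its own; wait until all data has been
# copied, then pivot the VM onto the raw layer.
while True:
    info = dom.blockJobInfo(disk, 0)
    if info and info["cur"] == info["end"]:
        break
    time.sleep(1)

dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
# The qcow2 layer is now unused and can be deleted from storage.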

The advantages of live storage migration:
- The same flow is used for any format change (raw->qcow2, qcow2->raw)
- The code on the host mostly exists and has been well tested for many years
- The code on the engine mostly exists and has been well tested for many years
- Copying disks is less likely to affect the VM, since it is done in a separate process

The advantages of the second option (blockPull):
- Can be more efficient:
  - No need to create temporary snapshots
  - No need to mirror temporary snapshots
  - No need to delete temporary snapshots

Both options can also be reused for sparsifying disks, which is not effective
today for qcow2 disks on block storage.

Comment 43 Eyal Shenitzky 2021-04-08 06:01:52 UTC
This RFE's main focus is to allow converting the disk format from RAW to QCOW2.
It will provide the option to use the incremental backup feature for RAW disks
that are already in the system, without the need to create a snapshot.

As a first step, converting the disk format can be done for non-running VMs.

Comment 50 Arik 2022-02-28 08:57:12 UTC
Benny, please write a few words in the Doc Text and move to modified if no further change is needed (do we need to add it to Ansible?)

Comment 55 sshmulev 2022-04-03 07:43:55 UTC
Tested on all the supported storage types: NFS, GLUSTER, ISCSI, ISCSI_GW, FCP.
Converted between all possible disk property combinations:

File:
format    allocation       incremental backup
raw       preallocated     disabled
raw       thin (sparse)    disabled
cow       thin (sparse)    enabled
cow       preallocated     enabled

Block:
format    allocation       incremental backup
raw       preallocated     disabled
cow       preallocated     enabled
cow       thin (sparse)    enabled


Using the REST API request:
POST http://engine/ovirt-engine/api/disks/123/convert
<action>
    <disk>
        <sparse>true/false</sparse>
        <format>raw/cow</format>
        <backup>incremental/none</backup>
    </disk>
</action>
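
For reference, the same convert action can be driven from a short Python
script. A minimal sketch using the requests library, with a placeholder
engine URL, disk ID, credentials, and CA path:

import requests

ENGINE = "https://engine.example.com/ovirt-engine/api"   # hypothetical engine URL
DISK_ID = "123"                                          # placeholder disk ID

body = """
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>
"""

resp = requests.post(
    "%s/disks/%s/convert" % (ENGINE, DISK_ID),
    data=body,
    headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    auth=("admin@internal", "password"),   # hypothetical credentials
    verify="ca.pem",                       # engine CA certificate (hypothetical path)
)
resp.raise_for_status()
print(resp.text)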

Versions:
engine-Version 4.5.0-0.237.el8ev
vdsm-4.50.0.10-1.el8ev

Steps:
For each scenario, create the following disks (I used 1G for each disk):
1. On file SDs (NFS, GLUSTER)
Raw/Preallocated/Incremental disabled
Raw/Thin/Incremental disabled
Cow/Thin/Incremental Enabled
Cow/Preallocated/Incremental enabled
2. On block SDs (ISCSI, ISCSI GW, FCP)
Raw/Preallocated/Incremental disabled
Cow/Preallocated/Incremental enabled
Cow/Thin/Incremental enabled

____________________________________________________________________________________________
# Convert format type

(Block)
0. checksum before the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>

1. From Raw/Preallocated/Incremental disabled --> Cow/Preallocated/Incremental enabled
<action>
    <disk>
        <format>cow</format>
        <backup>incremental</backup>
    </disk>
</action>

2.From Cow/Preallocated/Incremental enabled --> Raw/Preallocated/Incremental disabled
<action>
    <disk>
        <format>raw</format>
        <backup>none</backup>
    </disk>
</action>

3. check data integrity after the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>


(File)
0. checksum before the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>

1. From Raw/Preallocated/Incremental disabled --> Cow/Preallocated/Incremental enabled
<action>
    <disk>
        <format>cow</format>
        <backup>incremental</backup>
    </disk>
</action>

2. From Raw/Thin/Incremental disabled --> Cow/Thin/Incremental Enabled
<action>
    <disk>
        <format>cow</format>
        <backup>incremental</backup>
    </disk>
</action>

3. From Cow/Thin/Incremental enabled --> Raw/Thin/Incremental disabled
<action>
    <disk>
        <format>raw</format>
        <backup>none</backup>
    </disk>
</action>

4. From Cow/Preallocated/Incremental enabled --> Raw/Preallocated/Incremental disabled
<action>
    <disk>
        <format>raw</format>
        <backup>none</backup>
    </disk>
</action>

5. check data integrity after the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>
____________________________________________________________________________________________

# Convert allocation policy

(Block)
0. checksum before the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>

1. From Cow/Preallocated/Incremental enabled --> Cow/Thin/Incremental enabled
<action>
    <disk>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>

2. From Cow/Thin/Incremental enabled --> Cow/Preallocated/Incremental enabled
<action>
    <disk>
        <sparse>false</sparse>
        <backup>incremental</backup>
    </disk>
</action>

3.check data integrity after the convert:
python3.6 checksum_disk.py -c engine <disk_ID>


(File)
1. From Raw/Preallocated/Incremental disabled --> Raw/Thin/Incremental disabled
<action>
    <disk>
        <sparse>true</sparse>
        <backup>none</backup>
    </disk>
</action>

2. From Raw/Thin/Incremental disabled --> Raw/Preallocated/Incremental disabled
<action>
    <disk>
        <sparse>false</sparse>
        <backup>none</backup>
    </disk>
</action>

3. From Cow/Thin/Incremental enabled --> Cow/Preallocated/Incremental enabled
<action>
    <disk>
        <sparse>false</sparse>
        <backup>incremental</backup>
    </disk>
</action>

4. From Cow/Preallocated/Incremental enabled --> Cow/Thin/Incremental enabled
<action>
    <disk>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>

5. check data integrity after the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>
____________________________________________________________________________________________

# Convert format + allocation policy (when disks are floating and also when they are attached to a non-running VM)

(Block)
0. checksum before the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>

1. From Raw/Preallocated/Incremental disabled --> Cow/Thin/Incremental enabled
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>

2. From Cow/Thin/Incremental enabled --> Raw/Preallocated/Incremental disabled
<action>
    <disk>
        <format>raw</format>
        <sparse>false</sparse>
        <backup>none</backup>
    </disk>
</action>


(File)
1. From Raw/Preallocated/Incremental disabled --> Cow/Thin/Incremental enabled
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>

2. From Raw/Thin/Incremental disabled --> Cow/Preallocated/Incremental enabled
<action>
    <disk>
        <format>cow</format>
        <sparse>false</sparse>
        <backup>incremental</backup>
    </disk>
</action>

3. From Cow/Thin/Incremental enabled --> Raw/Preallocated/Incremental disabled
<action>
    <disk>
        <format>raw</format>
        <sparse>false</sparse>
        <backup>none</backup>
    </disk>
</action>

4. From Cow/Preallocated/Incremental enabled --> Raw/Thin/Incremental disabled
<action>
    <disk>
        <format>raw</format>
        <sparse>true</sparse>
        <backup>none</backup>
    </disk>
</action>

5. check data integrity after the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>
____________________________________________________________________________________________

# Full and incremental backup after the disk (format+allocation policy) convert:

1. Create the following disks:
(Block)
Raw / preallocated
(File)
Raw / preallocated
Raw / (sparse)Thin

2. Create a VM from a template, attach all the created disks, and checksum the disks before the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>
3. Convert disks:

(Block)
a. From Raw/Preallocated/Incremental disabled --> Cow/Thin/Incremental enabled
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>


(File)
a. From Raw/Preallocated/Incremental disabled --> Cow/Thin/Incremental enabled
<action>
    <disk>
        <format>cow</format>
        <sparse>true</sparse>
        <backup>incremental</backup>
    </disk>
</action>

b. From Raw/Thin/Incremental disabled --> Cow/Preallocated/Incremental enabled
<action>
    <disk>
        <format>cow</format>
        <sparse>false</sparse>
        <backup>incremental</backup>
    </disk>
</action>

4. Checksum the disks after the convert and compare that it is identical to the checksum taken before the disk convert:
python3.6 checksum_disk.py -c engine <disk_ID>

5. Start Full backup:
python3.6 backup_vm.py -c engine full <VM_ID>

6. Write some data to the disks and checksum before incremental backup:
# dd if=/dev/urandom of=/dev/sda bs=4k status=progress
# python3.6 checksum_disk.py -c engine <disk_ID>

7. Start incremental backup:
python3.6 backup_vm.py -c engine incremental <VM_ID> --from-checkpoint-uuid <from checkpoint id>

8. Checksum and compare that it is identical to the one taken before the incremental backup:
python3.6 checksum_disk.py -c engine <disk_ID>

Comment 56 sshmulev 2022-04-03 08:01:14 UTC
Tested the convert disk functionality (see comment 55).
The only issue faced during the test was when converting a disk that resides on a NetApp storage server, see bug https://bugzilla.redhat.com/show_bug.cgi?id=2069670.

It usually reproduces on the same LUN ID, although when the affected LUN was checked, it had enough space and autogrow set to 200GB.
(This issue happened on ISCSI and FCP SDs.)
Still not moving the bug to verified, since it is blocked by bug 2069670.

Comment 57 sshmulev 2022-04-26 10:17:03 UTC
Verified successfully.

Versions:
rhv-4.5.0-7
ovirt-engine-4.5.0.4-0.1.el8ev.noarch
vdsm-4.50.0.13-1.el8ev.ppc64le

Verified on storage types: ISCSI, FCP, ISCSI-GW, NFS, Gluster (according to comment 55).

Comment 62 errata-xmlrpc 2022-05-26 16:22:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711