Bug 1901835

Summary: [RFE][CBT] Redefine VM checkpoint without using the VM domain XML
Product: [oVirt] ovirt-engine
Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.Storage
Assignee: Eyal Shenitzky <eshenitz>
Status: CLOSED CURRENTRELEASE
QA Contact: Ilan Zuckerman <izuckerm>
Severity: high
Priority: unspecified
Version: 4.4.4
CC: bugs, dfodor, eshames, nsoffer, sfishbai
Target Milestone: ovirt-4.4.5
Keywords: FutureFeature
Flags: pm-rhel: planning_ack?
pm-rhel: devel_ack?
pm-rhel: testing_ack?
Hardware: Unspecified
OS: Unspecified
Doc Type: Release Note
Doc Text:
Feature: Redefine VM backup checkpoints without using the VM domain XML.

Reason: Redefining VM backup checkpoints with the stored checkpoint XML complicates the backup flow:
1. When a backup is taken, the Engine must keep the created checkpoint XML in its database, and a recovery flow was added to handle the case where the XML is missing.
2. Keeping the XML in the Engine database takes a lot of space.
3. When a checkpoint is redefined, the XML must be passed to Libvirt, so a lot of data is transferred from the Engine to the host.
Removing the need for the checkpoint XML when redefining a checkpoint reduces the complexity of the flow and the space required for backup operations.

Result: Checkpoints are redefined without the checkpoint XML that was stored in the Engine database when the backup was taken, and the database column that held the XML has been removed. Instead, the checkpoint XML is composed on the host and sent to Libvirt.

Incremental backup is still a technology preview in both oVirt/RHV and Libvirt, so Libvirt does not provide a way to detect whether redefining a checkpoint without the domain XML is supported. As a result, the Engine cannot support both ways of redefining checkpoints, and this change breaks backward compatibility with previously taken backups: a full backup is required, even if the VM already has earlier backups. In addition, incremental backup requires the latest versions of both the Engine and VDSM (v4.40.50.3).
Last Closed: 2021-03-18 15:12:54 UTC
Type: Bug
oVirt Team: Storage
Bug Depends On: 1901830, 1904486
Bug Blocks: 1891470

Description Eyal Shenitzky 2020-11-26 08:01:55 UTC
Description of problem:

When redefining a VM checkpoint, Libvirt requires the VM domain XML.
Once bug 1901830 is resolved, the domain XML will no longer be needed when redefining a checkpoint, and the engine can stop persisting the checkpoint XML in its database.

The start-backup flow will no longer need to fetch the created checkpoint XML and persist it in the database, and we will be able to remove all the recovery flows that handle a missing checkpoint XML.
The engine only needs to keep the ID of the backup during which the checkpoint was created.

When redefining a checkpoint, the engine can send VDSM all the data needed to generate the checkpoint XML; VDSM will compose the XML and send it to Libvirt.

This will reduce the code in both the engine and VDSM, as well as the amount of data kept in the engine database. It will also reduce the number of errors that may occur during the backup process, since fewer calls are made between the engine, VDSM, and Libvirt.
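The redefinition flow described above can be sketched in Python (the language VDSM is written in). This is a minimal illustration, not VDSM's actual code: the `build_checkpoint_xml()` function and the `CheckpointDisk` type are hypothetical, and the exact fields the engine sends are assumed. The sketch composes a libvirt `<domaincheckpoint>` document from the checkpoint ID, its parent, its creation time, and the per-disk bitmap names, with no `<domain>` element. The resulting string would then be handed to libvirt, for example through libvirt-python's `virDomain.checkpointCreateXML()` with the `VIR_DOMAIN_CHECKPOINT_CREATE_REDEFINE` flag; that call is left out so the sketch runs without a libvirt connection.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckpointDisk:
    """A disk included in the checkpoint (hypothetical shape of the engine data)."""
    name: str    # target device name in the guest, e.g. "sda"
    bitmap: str  # name of the dirty bitmap on the disk image


def build_checkpoint_xml(checkpoint_id: str,
                         parent_id: Optional[str],
                         creation_time: int,
                         disks: List[CheckpointDisk]) -> str:
    """Compose a libvirt <domaincheckpoint> XML for redefinition (sketch).

    No <domain> element is added: this assumes a libvirt version in which
    redefining a checkpoint no longer requires the VM domain XML.
    """
    root = ET.Element("domaincheckpoint")
    ET.SubElement(root, "name").text = checkpoint_id
    if parent_id is not None:
        # Link to the parent so the checkpoint chain is rebuilt in order.
        parent = ET.SubElement(root, "parent")
        ET.SubElement(parent, "name").text = parent_id
    # Creation time in seconds since the epoch.
    ET.SubElement(root, "creationTime").text = str(creation_time)
    disks_el = ET.SubElement(root, "disks")
    for disk in disks:
        # Each disk is tracked by a named dirty bitmap.
        ET.SubElement(disks_el, "disk", name=disk.name,
                      checkpoint="bitmap", bitmap=disk.bitmap)
    return ET.tostring(root, encoding="unicode")
```

For example, `build_checkpoint_xml("cp-2", "cp-1", 1612345678, [CheckpointDisk("sda", "cp-2")])` yields a checkpoint named `cp-2` whose parent is `cp-1`, with one bitmap-backed disk `sda`. Because the XML is rebuilt from a few small fields like these, the engine only has to store identifiers rather than the full XML.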


Version-Release number of selected component (if applicable):
4.4.4_master

Comment 1 Shir Fishbain 2021-01-26 11:39:23 UTC
Hi Eyal,
Please provide info on how to verify it correctly.

Comment 2 Eyal Shenitzky 2021-01-26 12:14:22 UTC
(In reply to Shir Fishbain from comment #1)
> Hi Eyal,
> Please provide info on how to verify it correctly.

Can be verified by running the regular tiers in the automation.

Comment 3 Ilan Zuckerman 2021-02-08 07:14:15 UTC
Can this be verified by repeating these steps from the existing TC (TestCase27319)?

    1. Create a VM from a template and start it.
    2. Make a full backup of its disks X times (defined by config.CHECKPOINTS_AMOUNT).
        a. For each backup:
            - Check that a checkpoint is created.
            - Check that the total number of checkpoints matches the number of backups executed.
    3. Run the SDK example script for removing the root checkpoint X times (defined by config.CHECKPOINTS_AMOUNT).
        a. For each checkpoint removal:
            - Check that its ID matches the actual root checkpoint.
            - Check that the actual number of checkpoints on the VM matches the expected number.
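The invariants checked by these steps can be modeled in a few lines of Python. This is a hypothetical in-memory sketch, not the oVirt SDK or the actual test code: `Checkpoint`, `CheckpointChain`, `full_backup()`, `remove_root()`, and `run_test_case()` are illustrative names. The model assumes that each full backup adds a checkpoint whose parent is the previous one, so the checkpoints form a single chain whose root is the only checkpoint without a parent.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpoint:
    id: str
    parent_id: Optional[str]


class CheckpointChain:
    """Hypothetical in-memory model of a VM's checkpoint chain (not the oVirt SDK)."""

    def __init__(self) -> None:
        self.checkpoints: List[Checkpoint] = []  # ordered from root to leaf

    def full_backup(self) -> Checkpoint:
        # Each backup creates a checkpoint whose parent is the current leaf.
        parent = self.checkpoints[-1].id if self.checkpoints else None
        cp = Checkpoint(id=str(uuid.uuid4()), parent_id=parent)
        self.checkpoints.append(cp)
        return cp

    def root(self) -> Checkpoint:
        # The root is the only checkpoint without a parent.
        return next(cp for cp in self.checkpoints if cp.parent_id is None)

    def remove_root(self) -> str:
        removed = self.checkpoints.pop(0)
        if self.checkpoints:
            # The next checkpoint in the chain becomes the new root.
            self.checkpoints[0].parent_id = None
        return removed.id


def run_test_case(checkpoints_amount: int) -> None:
    """Mirror steps 2 and 3 of the test case against the model."""
    chain = CheckpointChain()
    for backups_done in range(1, checkpoints_amount + 1):
        cp = chain.full_backup()
        assert cp in chain.checkpoints                  # a checkpoint was created
        assert len(chain.checkpoints) == backups_done   # count matches backups
    for removed_so_far in range(1, checkpoints_amount + 1):
        expected_root = chain.root().id
        assert chain.remove_root() == expected_root     # removed ID is the root
        assert len(chain.checkpoints) == checkpoints_amount - removed_so_far


run_test_case(checkpoints_amount=3)
```

In the real test the checks would run against the engine through the SDK; the model only makes explicit what "root checkpoint" and "expected number" mean at each step.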

Comment 4 Eyal Shenitzky 2021-02-08 11:57:56 UTC
(In reply to Ilan Zuckerman from comment #3)
> Can this be verified by repeating these steps from the existing TC
> (TestCase27319)?
> 
>     1. Create a VM from a template and start it.
>     2. Make a full backup of its disks X times (defined by
> config.CHECKPOINTS_AMOUNT).
>         a. For each backup:
>             - Check that a checkpoint is created.
>             - Check that the total number of checkpoints matches the number
> of backups executed.
>     3. Run the SDK example script for removing the root checkpoint X times
> (defined by config.CHECKPOINTS_AMOUNT).
>         a. For each checkpoint removal:
>             - Check that its ID matches the actual root checkpoint.
>             - Check that the actual number of checkpoints on the VM matches
> the expected number.

Yes.
Also, please verify that all the automated tests pass and that there are no regressions.

Comment 5 RHEL Program Management 2021-02-09 07:17:35 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 6 Ilan Zuckerman 2021-02-09 11:51:19 UTC
Moving this to VERIFIED based on the tier1 and tier2 automated CBT test cases (tier3 doesn't exist), which were executed on the latest rhv-4.4.5-4.
tier2 was executed locally; tier1 is "RHV-4.4-tier1 #154" and can be found in Polarion.
No regressions or suspicious behavior were spotted. All test cases passed (except for the known issues BZ 1914636 and BZ 1849861).

Comment 7 Sandro Bonazzola 2021-03-18 15:12:54 UTC
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
