Bug 2053669
| Summary: | [RFE] Allow changing vm powerstate during backup operation without interrupting the backup | ||
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Yury.Panchenko |
| Component: | BLL.Storage | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Evelina Shames <eshames> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.4.9.5 | CC: | aefrat, ahadas, bugs, bzlotnik, jean-louis, nsoffer, Yury.Panchenko |
| Target Milestone: | ovirt-4.5.0 | Keywords: | FutureFeature, ZStream |
| Target Release: | 4.5.0 | Flags: | sbonazzo: ovirt-4.5+ eshames: testing_plan_complete+ pm-rhel: planning_ack? pm-rhel: devel_ack+ pm-rhel: testing_ack? |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovirt-engine-4.5.0 | Doc Type: | Enhancement |
| Doc Text: |
Feature:
Use a temporary snapshot during a backup to decouple the backup
operation from the VM.
Reason:
A backup can take a lot of time. Preventing changes in VM power
state or migration during a backup is a problem for users.
Result:
A VM can be started, stopped, or migrated during backup.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-04-20 06:33:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Yury.Panchenko
2022-02-11 17:46:04 UTC
Trying to extract functional requirements from comment 0.

1. Online backup - while a VM is in an online backup, the user should be able to power off the VM without interrupting the backup.
2. Offline backup - while a VM is in an offline backup, the user should be able to power on the VM without interrupting the backup.
3. Power off from within the guest during an online backup should not interrupt the backup.

Additional requirements not mentioned in comment 0:

4. Migration - while a VM is in an online backup, the system or the user should be able to migrate the VM to another host. An example use case is an HA VM that the system tries to keep available.
5. HA VM termination - when an HA VM loses its storage lease, sanlock will terminate the VM. If the VM was running a backup, the backup should not be interrupted.

Yuri, do you have anything to add to these requirements?

Hello Nir
> Yuri, do you have anything to add to these requirements?
Thank you. There isn't anything to add from me.
Most of the work is in engine, but to enable this we need a small API change in vdsm, allowing a snapshot to be created with a new bitmap.
https://github.com/oVirt/vdsm/pull/86

*** Bug 1994663 has been marked as a duplicate of this bug. ***

The only disadvantage that I see here is that we have a snapshot involved again, which causes IO to commit the snapshot at the end.
While using the scratch disk method, there was no commit at the end (just wipe the scratch disk), which could be an advantage over snapshots on disks with a lot of changes during the backup frame.

(In reply to Jean-Louis Dupond from comment #5)
> The only disadvantage that I see here is that we have snapshot involved
> again, which causes IO to commit the snapshot at the end.
> While using the scratch disk method, there was no commit at the end (just
> wipe the scratch disk), which could be an advantage over snapshots on disks
> with a lot of changes during the backup frame.

True, the new way introduces a possibly slow snapshot deletion at the end of the backup. But with this disadvantage we get a lot of advantages:

- Can start, stop, migrate, snapshot a VM during backup
- Can start backup in most VM states
- Have only one kind of backup
- Backup I/O does not affect guest I/O
- Guest I/O does not affect backup I/O
- No scratch disks, no pauses
- Simpler flow on engine side
- Does not interfere with user snapshots like the old snapshot based backup

We have a stress test for the new backup mode here:
https://gitlab.com/nirs/ovirt-stress/-/tree/master/backup

We did many runs in the last week, doing around 15,000 backups without any issue in the actual backup.

Engine API should allow the user to disable the snapshot based backup, using the previous snapshot-less way, with the risk of pausing VMs during backup if a scratch disk becomes full.

Benny, can you explain how the snapshot is disabled in the current API?
We have a config value that can be toggled:

    $ engine-config -s UseHybridBackup=false

It can be used to switch to the existing backup mechanism that does not use snapshots.

(In reply to Benny Zlotnik from comment #7)
> We have a config value that can be toggled:
>
> $ engine-config -s UseHybridBackup=false

This is good for globally disabling the feature by the system admin, but it does not give enough power to the backup application. I think we need a way to disable the mechanism per backup call.

We discussed an option like:

    POST /ovirt-engine/api/vms/vm-id/backups

    <backup>
        <from_checkpoint_id>checkpoint-id</from_checkpoint_id>
        <use_snapshot>true</use_snapshot>
        <disks>
            <disk id="disk-id" />
            ...
        </disks>
    </backup>

If the backup was started with the use_snapshot option, it will report the snapshot during the backup:

    GET /ovirt-engine/api/vms/vm-id/backups/backup-id

    <backup>
        <from_checkpoint_id>checkpoint-id</from_checkpoint_id>
        <use_snapshot>true</use_snapshot>
        <snapshot id="snapshot-id"/>
        <disks>
            <disk id="disk-id" />
            ...
        </disks>
    </backup>

Yuri, what do you think?

Hello Nir,
I think it's a good idea to have the possibility to change the backup type in the backup request.
But let's keep the new backup as the default, so the app doesn't have to pass any option to use it (option <use_snapshot> is always true if the app doesn't change it).
If some backup app would like to use the old method, it must use something like <use_snapshot>false</use_snapshot>
thanks

Verified on engine-4.5.0-0.237.el8ev

Can you please update doctext?

This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
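For backup application authors following the per-backup option discussed in the comments above, here is a minimal Python sketch of how the request body could be built and how the reported snapshot could be read back. It assumes the `<use_snapshot>` element exactly as proposed in this thread; whether the released API uses that name is not confirmed here. The helper names `build_backup_request` and `snapshot_id_from_response` are hypothetical, and the sketch only constructs and parses XML, it does not talk to an engine.

```python
import xml.etree.ElementTree as ET


def build_backup_request(disk_ids, from_checkpoint_id=None, use_snapshot=None):
    """Build an XML body for POST /ovirt-engine/api/vms/vm-id/backups.

    Hypothetical helper. <use_snapshot> is the element proposed in this
    bug's discussion; passing use_snapshot=None omits it so the engine
    default applies.
    """
    backup = ET.Element("backup")
    if from_checkpoint_id is not None:
        ET.SubElement(backup, "from_checkpoint_id").text = from_checkpoint_id
    if use_snapshot is not None:
        ET.SubElement(backup, "use_snapshot").text = "true" if use_snapshot else "false"
    disks = ET.SubElement(backup, "disks")
    for disk_id in disk_ids:
        ET.SubElement(disks, "disk", id=disk_id)
    return ET.tostring(backup, encoding="unicode")


def snapshot_id_from_response(xml_text):
    """Return the id of the <snapshot> element in a backup response, or None."""
    snapshot = ET.fromstring(xml_text).find("snapshot")
    return None if snapshot is None else snapshot.get("id")


if __name__ == "__main__":
    # An application that wants the old scratch-disk method opts out explicitly.
    print(build_backup_request(["disk-id"], "checkpoint-id", use_snapshot=False))
```

Leaving `use_snapshot` unset matches the suggestion in the thread that the new snapshot based backup stays the default and only applications wanting the old method send `<use_snapshot>false</use_snapshot>`.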