Created attachment 1732456 [details]
Logs showing backup flow when engine hangs 30 seconds in VM.stop_backup

Description of problem:

Calling VmBackup.finalize() blocks while waiting for vdsm to stop the backup, but does not fail if the vdsm call fails.

Finalizing a backup may take time in vdsm, for example if libvirt is blocked on a hung qemu monitor, or if the vdsm request is waiting in the jsonrpc queue.

Here is an example backup_vm run in which finalizing the backup hangs:

$ ./backup_vm.py -c engine-dev full --backup-dir /var/tmp/backups/raw 4dc3bb16-f8d1-4f59-9388-a93f68da7cf0
[ 0.0 ] Starting full backup for VM 4dc3bb16-f8d1-4f59-9388-a93f68da7cf0
[ 0.8 ] Waiting until backup 7b4df572-f664-4b2a-9d7e-b4be9b4ed667 is ready
[ 1.9 ] Creating image transfer for disk 126eea31-c5a2-4c01-a18d-9822b0c05c2a
[ 3.3 ] Image transfer a9c01706-dc77-417b-91c2-d1c8c53a5403 is ready
[ 73.17% ] 4.39 GiB, 12.02 seconds, 373.96 MiB/s
[ 15.3 ] Finalizing image transfer
[ 17.3 ] Finalizing backup
[ 47.4 ] Waiting until backup is finalized
...

In this case the finalize() call took 30 seconds, waiting for the vdsm response, which was blocked on libvirt for 30 seconds before timing out.

Users managing multiple backups do not want to wait for the response. They want to be able to finalize multiple backups (e.g. when backing up thousands of VMs during a backup window), and wait for backup completion separately.

It may be useful to provide an optional blocking interface that waits until a backup is finalized, but in this case the API must fail if the backup fails to finalize, so it cannot get stuck forever.

The blocking API creates another issue: calling finalize() does not change the state of the backup, which remains in the "ready" state. If finalizing the backup fails in vdsm, the backup is still considered "ready", and new transfers may be started.

It would be more useful if finalize was async, and the backup state changed to "finalizing" *before* doing anything in the backend, similar to image transfer.
If the backup fails to finalize, its state can be left as "finalizing", and no new transfers can be started for this backup.

Version-Release number of selected component (if applicable):
4.4.4.2_master

How reproducible:
Always

Steps to Reproduce:
1. Start backup
2. Stop backup

Actual results:
Engine tries to stop the backup before returning a response to the API caller.

Expected results:
Engine switches the state to "finalizing" and returns a response to the API caller. Then it tries to stop the backup.

Additional info:
Changing the backup state may break users expecting the current behavior, but since this feature is still tech preview we can still fix the API. Once backup is released as a fully supported API, we cannot make such API changes.
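
The expected behavior can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not engine code: the `Backup` class, the `backend_stop` callable (a stand-in for the vdsm call), the `wait_finalized()` helper, and the "succeeded" phase name are all hypothetical. Only the "ready" and "finalizing" states come from the description above.

```python
import threading


class Backup:
    """Toy model of an async finalize; not the engine implementation."""

    def __init__(self, backend_stop):
        self.phase = "ready"
        self._backend_stop = backend_stop  # hypothetical stand-in for the vdsm call
        self._done = threading.Event()
        self._error = None

    def start_transfer(self):
        # New transfers are allowed only while the backup is "ready".
        if self.phase != "ready":
            raise RuntimeError(f"cannot start transfer in phase {self.phase!r}")
        return "transfer started"

    def finalize(self):
        # Change the state *before* touching the backend, then return at once.
        if self.phase != "ready":
            raise RuntimeError(f"cannot finalize in phase {self.phase!r}")
        self.phase = "finalizing"
        threading.Thread(target=self._stop, daemon=True).start()

    def _stop(self):
        try:
            self._backend_stop()
        except Exception as e:
            # Stop failed: the phase stays "finalizing", so no new
            # transfer can be started for this backup.
            self._error = e
        else:
            self.phase = "succeeded"  # hypothetical final phase name
        self._done.set()

    def wait_finalized(self, timeout):
        # Optional blocking helper: it fails instead of waiting forever.
        if not self._done.wait(timeout):
            raise TimeoutError("backup was not finalized in time")
        if self._error is not None:
            raise RuntimeError(f"finalize failed: {self._error}")
```

With this shape, a caller can invoke `finalize()` on many backups in a loop and only afterwards call `wait_finalized()` (or poll `phase`) on each of them, so one slow host does not delay the others.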
Incremental backup is fully implemented, and we shouldn't make any further changes to the API, in order to preserve backward compatibility. Closing.
Reopening since the current behavior is wrong and causes too much trouble.

The current API is implemented in the wrong way, and we can fix it without affecting users of the API.

How stopping a backup should work:

1. The user calls finalize()
2. The system sets a "stopped" flag for the backup
3. The system wakes up the backup command if it is not running
4. The user gets a response
5. The backup command checks the stopped flag in all phases, and cleans up as needed depending on the current phase
6. The user polls the backup phase
7. The system marks the backup as finished when done

If the user invokes finalize() more than once, the system can safely ignore the request, since the stopped flag is already set.
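
The flow described above can be sketched as a small state machine. This is an illustration only: the `BackupCommand` class, its `step()` method, and the phase names are assumptions made for the example and are not taken from the engine code.

```python
class BackupCommand:
    """Toy model of the proposed stop flow; all names are illustrative."""

    # Normal progression when nobody asked to stop. A "ready" backup
    # stays "ready" until finalize() is called.
    NEXT = {
        "initializing": "starting",
        "starting": "ready",
        "ready": "ready",
        "finalizing": "finished",
        "finished": "finished",
    }

    def __init__(self):
        self.phase = "initializing"
        self.stopped = False  # the "stopped" flag from step 2

    def finalize(self):
        # Steps 1-4: set the flag and return immediately.
        # A repeated call finds the flag already set and is ignored.
        if self.stopped:
            return "ignored"
        self.stopped = True
        return "accepted"

    def step(self):
        # Step 5: every phase checks the flag before doing its work,
        # and the cleanup depends on how far the backup got.
        if self.stopped and self.phase in ("initializing", "starting", "ready"):
            if self.phase == "initializing":
                self.phase = "finished"    # nothing was started yet
            else:
                self.phase = "finalizing"  # the backup must be stopped first
        else:
            self.phase = self.NEXT[self.phase]
        return self.phase


cmd = BackupCommand()
cmd.step()      # initializing -> starting
cmd.step()      # starting -> ready
cmd.finalize()  # returns "accepted"
cmd.finalize()  # returns "ignored": the flag is already set
# Steps 6-7: the caller polls the phase until the backup is finished.
while cmd.step() != "finished":
    pass
```

Because `finalize()` only sets a flag, it never waits for the backend, and calling it before the backup is ready is handled by the same phase check as calling it afterwards.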
*** Bug 2039717 has been marked as a duplicate of this bug. ***
*** Bug 2037277 has been marked as a duplicate of this bug. ***
Hi Mark, Please provide the verification scenario for this bug.
Verification steps:

1. Start a full backup
2. Send backup.finalize request before backup reaches status "Ready"
3. Make sure that the backup finalizes gracefully
4. Start another backup
5. When 'Ready', start image transfer
6. Send backup.finalize request while image transfer is running. Make sure the request fails
7. Cancel transfer/wait for it to finish
8. Send backup.finalize request again, make sure backup is finalized successfully
(In reply to Mark Kemel from comment #7)
> Verification steps:
>
> 1. Start a full backup
> 2. Send backup.finalize request before backup reaches status "Ready"
> 3. Make sure that the backup finalizes gracefully
> 4. Start another backup
> 5. When 'Ready', start image transfer
> 6. Send backup.finalize request while image transfer is running. Make sure
> the request fails

<fault>
    <detail>[Cannot stop VM backup. There is an active image transfer for VM backup]</detail>
    <reason>Operation Failed</reason>
</fault>

> 7. Cancel transfer/wait for it to finish
> 8. Send backup.finalize request again, make sure backup is finalized
> successfully

<action>
    <status>complete</status>
</action>

The backup operation ended successfully.

Verified with the above steps on:
ovirt-engine-4.5.1.1-0.14.el8ev
vdsm-4.50.1.2-1.el8ev.x86_64
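
For anyone scripting this verification, the two responses quoted above can be told apart by their root element. The following is a minimal sketch using only the Python standard library; the `classify()` helper is hypothetical, and the sample bodies are copied from this comment rather than from a full API response.

```python
import xml.etree.ElementTree as ET


def classify(body):
    """Return (status, detail) for a finalize response body.

    Hypothetical helper: it only handles the two shapes quoted above.
    """
    root = ET.fromstring(body)
    if root.tag == "fault":
        # The request was rejected, e.g. because a transfer is still active.
        return ("failed", root.findtext("detail", default=""))
    if root.tag == "action":
        return (root.findtext("status", default="unknown"), "")
    raise ValueError(f"unexpected response element: {root.tag}")


fault_body = """
<fault>
    <detail>[Cannot stop VM backup. There is an active image transfer for VM backup]</detail>
    <reason>Operation Failed</reason>
</fault>
"""

action_body = """
<action>
    <status>complete</status>
</action>
"""

print(classify(fault_body))   # step 6: the request is expected to fail
print(classify(action_body))  # step 8: the request is expected to succeed
```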
This bugzilla is included in oVirt 4.5.1 release, published on June 22nd 2022. Since the problem described in this bug report should be resolved in oVirt 4.5.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.