Bug 1981297

Summary: [RFE] Add new backup phases and disable backup/image transfers DB instant cleanup
Product: Red Hat Enterprise Virtualization Manager
Reporter: Pavel Bar <pbar>
Component: ovirt-engine
Assignee: Pavel Bar <pbar>
Status: CLOSED ERRATA
QA Contact: Amit Sharir <asharir>
Severity: high
Priority: unspecified
Version: 4.4.0
CC: aefrat, bzlotnik, dfodor, eshenitz, mjurasek
Target Milestone: ovirt-4.4.8
Keywords: FutureFeature
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-09-08 14:12:04 UTC
Type: Bug
oVirt Team: Storage
Bug Depends On: 1980428

Description Pavel Bar 2021-07-12 10:26:54 UTC
Description of problem:
After a backup or image transfer operation finishes, all of its execution data disappears.
This means that the user cannot see the final execution state of the operation, even though that state was visible via the DB and the API while the backup or image transfer was still in progress.
For backups the situation is even worse: there is no indication of success or failure, and the last status the user may see is 'FINALIZING'.

How reproducible:
Simply run a backup or image transfer flow.

Steps to Reproduce:
1. Run a backup or image transfer.
2. Check the "vm_backups" and "vm_backup_disk_map" tables (for a backup), or the "image_transfers" table (for an image download, or for a full backup, which includes a "download" step). Also check the same entities via the REST API.

Actual results:
While the operation is in progress, the relevant data is visible there.
When the operation finishes, the relevant entry disappears from both the DB and the REST API.

Expected results:
The data should be kept for some time so that the user can see the operation result, and then be deleted automatically. At the same time, we do not want to over-pollute the database with stale data.

Additional info:
What should be implemented and then tested:
1. Add two new backup phases for the possible end states: "SUCCEEDED" and "FAILED".
2. Disable the instant cleanup of the "vm_backups", "vm_backup_disk_map" and "image_transfers" DB tables after the backup or image transfer operation is over, so that the user can retrieve the final status via the DB and the API.
3. Add a scheduled DB cleanup thread that periodically removes old backup and image transfer entries: by default it runs every 10 minutes and removes succeeded entries that are 15 minutes old and failed entries that are 30 minutes old.
Separate thresholds apply to backup and image transfer operations, plus one value for the cleanup thread rate; all 5 values are configurable:
DbEntitiesCleanupRateInMinutes 10
SucceededBackupCleanupTimeInMinutes 15
FailedBackupCleanupTimeInMinutes 30
SucceededImageTransferCleanupTimeInMinutes 15
FailedImageTransferCleanupTimeInMinutes 30
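The retention rule in item 3 can be sketched as follows. This is a minimal illustration, not the ovirt-engine implementation: it assumes an entry's age is measured from its last update time and that entries still in progress are never cleaned, and all class, field and function names are hypothetical. The defaults mirror the five configuration values listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CleanupConfig:
    # Defaults mirror the configurable values listed in the bug description.
    db_entities_cleanup_rate_in_minutes: int = 10
    succeeded_backup_cleanup_time_in_minutes: int = 15
    failed_backup_cleanup_time_in_minutes: int = 30
    succeeded_image_transfer_cleanup_time_in_minutes: int = 15
    failed_image_transfer_cleanup_time_in_minutes: int = 30


@dataclass(frozen=True)
class Entry:
    # Hypothetical record: kind is "backup" or "image_transfer"; outcome is
    # "succeeded", "failed", or None while the operation is still running.
    entity_id: str
    kind: str
    outcome: Optional[str]
    last_updated: datetime  # assumption: age is measured from this timestamp


def retention(entry: Entry, cfg: CleanupConfig) -> Optional[timedelta]:
    """How long a finished entry is kept; None means it is never cleaned."""
    minutes = {
        ("backup", "succeeded"): cfg.succeeded_backup_cleanup_time_in_minutes,
        ("backup", "failed"): cfg.failed_backup_cleanup_time_in_minutes,
        ("image_transfer", "succeeded"): cfg.succeeded_image_transfer_cleanup_time_in_minutes,
        ("image_transfer", "failed"): cfg.failed_image_transfer_cleanup_time_in_minutes,
    }.get((entry.kind, entry.outcome))
    return None if minutes is None else timedelta(minutes=minutes)


def entries_to_delete(entries: Iterable[Entry], now: datetime, cfg: CleanupConfig) -> List[str]:
    """One cleanup pass: ids of finished entries that reached their retention age."""
    doomed = []
    for entry in entries:
        keep_for = retention(entry, cfg)
        if keep_for is not None and now - entry.last_updated >= keep_for:
            doomed.append(entry.entity_id)
    return doomed


def worst_case_visibility_minutes(retention_minutes: int, cfg: CleanupConfig) -> int:
    """Upper bound on visibility: an entry may reach its retention age just
    after a pass and then wait up to one full period for the next pass."""
    return retention_minutes + cfg.db_entities_cleanup_rate_in_minutes
```

Under these assumptions, with the defaults a succeeded entry stays visible for at least 15 and at most about 25 minutes (15 + 10), and a failed one for 30 to about 40 minutes, because the thread only runs every 10 minutes.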

Comment 3 Amit Sharir 2021-07-26 13:39:06 UTC
Version: 
vdsm-4.40.80.2-1.el8ev.x86_64
ovirt-engine-4.4.8.1-0.9.el8ev.noarch


Verification steps:
I split the verification into two main flows: "succeeded" and "failed".


1. Created a VM with multiple disks via UI. 
2. Took multiple snapshots of the VM. 
3. Started a full backup (for double validation, I ran the full-backup scenario via both the API and the SDK).
3a. API call <{{engine}}vms/35ae3ad2-f4cb-4849-9308-83c012e840ae/backups>
3b. SDK script <python3 backup_vm.py -c engine full <vm-id>>
4. The full backup from step 3 created image transfer entries and a backup object.
5. Checked the phases in the "vm_backups", "vm_backup_disk_map" and "image_transfers" DB tables.
6. Checked the "image-transfer" and "backup" phases via the API (API calls used: https://<engine>/ovirt-engine/api/imagetransfers, https://<engine>/ovirt-engine/api/vms/<vm-uuid>/backups).
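Step 6 can be scripted. The sketch below only builds the two collection URLs used above and extracts the phase of each listed entity from an XML response body; fetching the response (authentication, CA certificate) is deliberately left out. It assumes the response lists one child element per entity, each carrying an `id` attribute and a `phase` child element; the function names and the engine host name are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def collection_url(engine_fqdn: str, vm_id: Optional[str] = None) -> str:
    """URL of the image transfer collection, or of one VM's backups if vm_id is given."""
    base = f"https://{engine_fqdn}/ovirt-engine/api"
    return f"{base}/vms/{vm_id}/backups" if vm_id else f"{base}/imagetransfers"


def phases_by_id(xml_text: str) -> Dict[str, str]:
    """Map each listed entity id to the text of its <phase> child (assumed layout)."""
    root = ET.fromstring(xml_text)
    result = {}
    for item in root:
        phase = item.findtext("phase")
        if item.get("id") and phase is not None:
            result[item.get("id")] = phase
    return result
```

Comparing the output of `phases_by_id` before and after the cleanup window gives the same information as steps 5 and 6 without manual inspection.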

Succeeded flow: 
1. Initially, the backup phase was "succeeded" in both the API and the DB.
2. Initially, the image-transfer phase was "9" in the DB and "finished_success" in the API.
3. After ~15 minutes, the backup and image-transfer objects vanished from both the DB and the API, as expected.
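The ~15 minute wait in the succeeded flow can be automated with a small polling helper. In this sketch `exists` is a hypothetical callable supplied by the tester (for example, one that looks up the backup or image transfer id in the REST API listing); the clock and sleep functions are injectable only so the helper can be tested without real waiting.

```python
import time
from typing import Callable, Optional


def wait_until_gone(
    exists: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `exists` until it returns False.

    Returns the elapsed seconds at which the entity was first seen missing,
    or None if it was still present when `timeout_s` expired.
    """
    start = clock()
    while True:
        if not exists():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout_s:
            return None
        # Never sleep past the deadline.
        sleep(min(interval_s, timeout_s - elapsed))
```

With a 10-minute cleanup rate and a 15-minute retention, a timeout of about 25 minutes should cover a succeeded entry, assuming the thread behaves as configured, e.g. `wait_until_gone(lambda: transfer_is_listed(transfer_id), timeout_s=25 * 60)`, where `transfer_is_listed` is hypothetical.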

Failed flow:
1. Since a failed phase is hard to reproduce in a normal user flow, I changed the table values directly with SQL updates.
2. For the vm_backups table I used: <update vm_backups set phase = 'Failed' where phase = 'Succeeded';>
3. For the image_transfers table I used: <update image_transfers set phase = 10 where phase = 9;>
4. Steps 2 and 3 changed the phase in both the DB and the API:
4a. For the backup, the API phase changed from "succeeded" to "failed".
4b. For the image transfer, the API phase changed from "finished_success" to "finished_failure".
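The failed-flow setup can be replayed against a throwaway database to check what the two UPDATE statements do. This sketch uses Python's built-in sqlite3 with minimal stand-in tables holding only an `id` and the `phase` column; the real engine database is PostgreSQL with a much wider schema. The phase-name mapping covers only the values seen in this verification and assumes backup phases appear in the API as the lowercased DB name.

```python
import sqlite3

# DB phase code -> API phase name, only for the codes used in this verification.
IMAGE_TRANSFER_API_PHASE = {9: "finished_success", 10: "finished_failure"}


def api_backup_phase(db_phase: str) -> str:
    # Assumption: the API shows the DB phase name in lower case.
    return db_phase.lower()


def force_failed(conn: sqlite3.Connection) -> None:
    """Apply the two UPDATE statements from the failed flow in one transaction."""
    with conn:
        conn.execute("update vm_backups set phase = 'Failed' where phase = 'Succeeded'")
        conn.execute("update image_transfers set phase = 10 where phase = 9")


# Minimal stand-in tables, seeded with one succeeded backup and one
# succeeded (phase 9) image transfer.
conn = sqlite3.connect(":memory:")
conn.execute("create table vm_backups (id text primary key, phase text)")
conn.execute("create table image_transfers (id text primary key, phase integer)")
conn.execute("insert into vm_backups values ('b1', 'Succeeded')")
conn.execute("insert into image_transfers values ('t1', 9)")
conn.commit()

force_failed(conn)
backup_phase = conn.execute("select phase from vm_backups").fetchone()[0]
transfer_phase = conn.execute("select phase from image_transfers").fetchone()[0]
print(api_backup_phase(backup_phase), IMAGE_TRANSFER_API_PHASE[transfer_phase])
```

Running it prints `failed finished_failure`, matching steps 4a and 4b.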

Verification conclusions:
The expected output matched the actual output.
The whole flow completed with no errors.
The backup and image-transfer entries vanished after ~15 minutes (succeeded) or ~30 minutes (failed), as expected for their phase.

Comment 9 errata-xmlrpc 2021-09-08 14:12:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3460