Bug 1983636 - Add "last_updated" column to the "vm_backups" DB table
Summary: Add "last_updated" column to the "vm_backups" DB table
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.4.8
Target Release: ---
Assignee: Pavel Bar
QA Contact: Amit Sharir
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-19 10:06 UTC by Pavel Bar
Modified: 2021-09-08 14:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-08 14:12:12 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:3460 0 None None None 2021-09-08 14:12:16 UTC
oVirt gerrit 115936 0 None None None 2021-08-12 10:00:51 UTC
oVirt gerrit 115944 0 master MERGED Backup: add new 'lastUpdated' date field 2021-08-02 06:18:23 UTC
oVirt gerrit 116010 0 master MERGED Backup: add a new "modification" date field 2021-08-03 07:56:17 UTC

Description Pavel Bar 2021-07-19 10:06:37 UTC
Description of problem:
After the RFE in https://bugzilla.redhat.com/show_bug.cgi?id=1981297 was implemented, the cleanup thread removes entries from the "vm_backups" and "vm_backup_disk_map" DB tables at the wrong time.
The only timestamp column in these tables was "_create_date" in the "vm_backups" DB table, and it was used to estimate when a backup entry can be removed.
That field records when the backup *started*, not when it *finished*.
As a result, a backup that took a long time could be cleaned up not 15 minutes (succeeded) or 30 minutes (failed) after it finished, but much sooner (within a few seconds in the worst case).
The reason is that by the time a long backup finishes, its "_create_date" has already aged, so the cleanup thread, which runs every 10 minutes, may consider the entry "old enough" and remove it sooner than desired.
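
The timing problem described above can be illustrated with a small Python sketch. This is not the ovirt-engine code; the function name and the example timestamps are hypothetical, and only the 15/30 minute retention periods come from this report.

```python
from datetime import datetime, timedelta

# Retention periods quoted in this report.
SUCCEEDED_RETENTION = timedelta(minutes=15)
FAILED_RETENTION = timedelta(minutes=30)


def eligible_by_create_date(create_date, succeeded, now):
    """Hypothetical pre-fix check: age is measured from when the backup started."""
    retention = SUCCEEDED_RETENTION if succeeded else FAILED_RETENTION
    return now - create_date >= retention


started = datetime(2021, 7, 19, 10, 0, 0)      # value of "_create_date"
finished = started + timedelta(minutes=40)     # a long backup
cleanup_run = finished + timedelta(seconds=5)  # cleanup thread runs right after

# The entry already looks "old enough" 5 seconds after the backup finished.
premature = eligible_by_create_date(started, succeeded=True, now=cleanup_run)
print(premature)
```

With these made-up timestamps the check reports the entry as removable only seconds after completion, which is the worst case mentioned in the description.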

Version-Release number of selected component (if applicable):
4.4.7

How reproducible:
Run a long backup. The easiest way to do so is to split the backup flow into a few steps:
1. Start a backup, e.g., run:
./backup_vm.py -c engine1 start 872bfe41-821f-45a1-972a-c8391b1bd026
2. Wait for a long time, e.g., 15 minutes.
The download step is not needed, but running it won't hurt.
3. Stop the backup, e.g., run:
./backup_vm.py -c engine1 stop 872bfe41-821f-45a1-972a-c8391b1bd026 1315112e-b971-49d0-afaa-ee96a68c81a6
After this step the backup entry is eligible for cleanup.

Steps to Reproduce:
1. Run a long backup that takes 15 minutes or more (see "How reproducible" above for a suggestion). If possible, run one backup that succeeds and one that fails.
Because the backup took more than 15 minutes (30 minutes for a failed backup), the cleanup thread removes the entry from the DB the next time it runs.
2. Wait and check when the entry is removed from the DB (this can also be seen via the REST API).

Actual results:
When the backup took a long time to complete, the backup DB entry is not kept in the DB for 15 minutes (succeeded backup) or 30 minutes (failed backup). It is removed sooner, depending on when the cleanup thread runs next.

Expected results:
Keep the backup DB entry for *at least* 15 minutes (succeeded backup) or 30 minutes (failed backup) after the backup finishes.

Additional info:
Solution: add a "last_updated" column to the "vm_backups" DB table (similar to the one that exists in the "image_transfers" DB table) and have the code update it the moment the backup finishes, whether it succeeds or fails.
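
A minimal Python sketch of the proposed approach, under stated assumptions: the `BackupEntry` class and the `finish` and `eligible_for_cleanup` helpers are hypothetical and do not reflect the actual engine code. The sketch also assumes that a backup that has not finished yet is never eligible for cleanup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SUCCEEDED_RETENTION = timedelta(minutes=15)
FAILED_RETENTION = timedelta(minutes=30)


@dataclass
class BackupEntry:
    """Hypothetical in-memory stand-in for a "vm_backups" row."""
    create_date: datetime
    update_date: datetime
    succeeded: Optional[bool] = None  # None while the backup is still running


def finish(entry, succeeded, now):
    """Record the outcome and refresh the update timestamp at completion."""
    entry.succeeded = succeeded
    entry.update_date = now


def eligible_for_cleanup(entry, now):
    """Measure age from the last update (completion), not from creation."""
    if entry.succeeded is None:
        return False  # assumption: running backups are not cleaned up
    retention = SUCCEEDED_RETENTION if entry.succeeded else FAILED_RETENTION
    return now - entry.update_date >= retention


start = datetime(2021, 7, 19, 10, 0, 0)
entry = BackupEntry(create_date=start, update_date=start)
finish(entry, succeeded=True, now=start + timedelta(minutes=40))

print(eligible_for_cleanup(entry, start + timedelta(minutes=40, seconds=5)))
print(eligible_for_cleanup(entry, start + timedelta(minutes=55)))
```

In this sketch the retention clock starts only when `finish` is called, so a 40-minute backup is kept for the full 15 minutes after completion.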

Comment 1 Pavel Bar 2021-08-10 10:55:20 UTC
In the final solution, the new column in the "vm_backups" DB table is named "_update_date", not "last_updated" as suggested in the bug description. In the REST API response, the date appears under the "modification_date" tag.
The rest of the bug details (reproduction steps, etc.) remain the same.
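
As a rough illustration of reading the new field from a REST API response, the Python sketch below parses a made-up XML body. Only the "modification_date" tag name comes from this comment; the overall response shape, the "creation_date" tag, and the timestamp values are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical response body; only "modification_date" is named in this report.
SAMPLE = """<backup id="1315112e-b971-49d0-afaa-ee96a68c81a6">
  <creation_date>2021-08-16T11:00:00+00:00</creation_date>
  <modification_date>2021-08-16T11:20:00+00:00</modification_date>
</backup>"""


def backup_dates(xml_text):
    """Return (created, modified) datetimes parsed from a backup element."""
    root = ET.fromstring(xml_text)
    created = datetime.fromisoformat(root.findtext("creation_date"))
    modified = datetime.fromisoformat(root.findtext("modification_date"))
    return created, modified


created, modified = backup_dates(SAMPLE)
print(modified - created)
```

Comparing the two timestamps this way shows how long the backup ran before its last update.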

Comment 4 Amit Sharir 2021-08-12 09:11:13 UTC
Moving this bug back to "post" since the relevant patch (core: add '_update_date' column to 'vm_backups' DB table) is still in "post" status.

Comment 5 Amit Sharir 2021-08-16 11:44:06 UTC
Version: 
ovirt-engine-4.4.8.4-0.7.el8ev.noarch
vdsm-4.40.80.5-1.el8ev.x86_64

Verification steps:
1. I followed the steps from the bug description (the successful backup flow).
2. After waiting ~20 minutes, I validated via the API and the DB that the backup still existed.
3. I stopped the backup and saw that "_update_date" (DB) and "modification_date" (API) were updated as expected and were consistent with each other.
4. 15 minutes after I stopped the backup, I could still see the backup entry in the DB/API (as expected).
5. ~22 minutes after I stopped the backup, the backup entry had disappeared from the DB/API (as expected).
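
The observed removal time can be sanity-checked with a short calculation. The Python sketch below assumes that the cleanup thread runs at a fixed 10-minute interval (as stated in the bug description) and removes an entry on its first pass after the 15-minute retention period has elapsed; the `removal_window` helper is hypothetical.

```python
from datetime import timedelta


def removal_window(retention, cleanup_interval):
    """Earliest and latest delay after completion at which a periodic
    cleanup pass would remove the entry."""
    return retention, retention + cleanup_interval


earliest, latest = removal_window(timedelta(minutes=15), timedelta(minutes=10))
observed = timedelta(minutes=22)  # approximate value from step 5

print(earliest, latest)
print(earliest <= observed <= latest)
```

Under these assumptions a succeeded backup should disappear between 15 and 25 minutes after it was stopped, so a removal at roughly 22 minutes is within the expected range.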


Verification conclusions:
The expected output matched the actual output.
The whole flow completed with no errors or unexpected log entries.
I also checked the DB and API responses throughout the procedure to validate their consistency and behavior (worked as expected).

Bug verified.

Comment 10 errata-xmlrpc 2021-09-08 14:12:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3460

