Description of problem:

After implementing RFE https://bugzilla.redhat.com/show_bug.cgi?id=1981297, there is an issue with the timing of the cleanup of the "vm_backups" and "vm_backup_disk_map" DB tables by the relevant cleanup thread.

The only timestamp column that existed in the DB tables above was "_create_date" in the "vm_backups" DB table, and it was used to estimate when a backup entry can be removed. That field records the date the backup *started*, not when it *finished*. As a result, a long-running backup could be cleaned up not after the intended 15-minute (succeeded) or 30-minute (failed) period counted from the moment the backup finished, but after a shorter period of time (even a few seconds in the worst case). The reason is that by the time a long backup finishes, the "_create_date" field has already aged, so the cleanup thread, which runs every 10 minutes, may consider the backup entry "old enough" and remove it sooner than desired.

Version-Release number of selected component (if applicable):
4.4.7

How reproducible:
Always, given a long enough backup. The easiest way to produce one is to split the backup flow into a few steps:
1. Start a backup, i.e., run:
   ./backup_vm.py -c engine1 start 872bfe41-821f-45a1-972a-c8391b1bd026
2. Wait for a long time, e.g., 15 minutes. The download step is not needed, but running it won't hurt.
3. Stop the backup, i.e., run:
   ./backup_vm.py -c engine1 stop 872bfe41-821f-45a1-972a-c8391b1bd026 1315112e-b971-49d0-afaa-ee96a68c81a6
After that step the backup entry is eligible for cleanup.

Steps to Reproduce:
1. Run a long backup that takes 15 minutes or more (see "How reproducible" above for a suggested flow), ideally one backup that succeeds and one that fails. Because the backup took longer than 15 minutes (30 minutes for a failed backup), the cleanup thread will remove the backup entry from the DB the next time it runs.
2. Wait and see when the entry is cleaned up from the DB (this can also be observed via the REST API).

Actual results:
The backup DB entry is not kept in the DB for 15 minutes (for a succeeded backup) or 30 minutes (for a failed backup) when the backup took a long time to complete. It is removed sooner, depending on when the cleanup thread next runs.

Expected results:
The backup DB entry is kept for *at least* 15 minutes (for a succeeded backup) or 30 minutes (for a failed backup) after the backup finishes.

Additional info:
Suggested solution: add a "last_updated" column to the "vm_backups" DB table (similar to the one in the "image_transfers" DB table) and update it in the code the moment the backup finishes (with either success or failure), so that cleanup eligibility is measured from the finish time, as sketched below.
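To make the timing problem concrete, here is a minimal Python sketch of the eligibility check (the real cleanup thread is Java code inside ovirt-engine; the function, constant and field names below are hypothetical and only illustrate the logic described above):

    from datetime import datetime, timedelta

    # Hypothetical names; this only illustrates the timing logic, not the
    # actual ovirt-engine implementation.
    SUCCEEDED_RETENTION = timedelta(minutes=15)
    FAILED_RETENTION = timedelta(minutes=30)

    def is_eligible_for_cleanup(backup):
        now = datetime.utcnow()
        retention = FAILED_RETENTION if backup["phase"] == "FAILED" else SUCCEEDED_RETENTION
        # Buggy behavior: age was measured from _create_date (when the backup
        # *started*), so a backup that ran longer than its retention window
        # was removable the moment it finished:
        #   return now - backup["_create_date"] >= retention
        # Suggested fix: measure the age from the finish timestamp instead
        # (the proposed "last_updated" column).
        return now - backup["last_updated"] >= retention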
In the final solution the "vm_backups" DB table's column is named "_update_date", not "last_updated" as suggested in the bug description. In the REST API response the date appears under the "modification_date" tag. The rest of the bug details (reproduction steps, etc.) are unchanged.
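For reference, the "modification_date" of a single backup can be read over the REST API. A minimal Python sketch, assuming the standard /ovirt-engine/api layout, admin@internal credentials and the engine CA bundle path (the engine URL, password and CA path are placeholders to adapt; the IDs are the ones from the reproduction commands above):

    import requests

    ENGINE = "https://engine1.example.com/ovirt-engine/api"  # placeholder
    VM_ID = "872bfe41-821f-45a1-972a-c8391b1bd026"
    BACKUP_ID = "1315112e-b971-49d0-afaa-ee96a68c81a6"

    resp = requests.get(
        f"{ENGINE}/vms/{VM_ID}/backups/{BACKUP_ID}",
        auth=("admin@internal", "password"),       # placeholder credentials
        headers={"Accept": "application/json"},
        verify="/etc/pki/ovirt-engine/ca.pem",     # placeholder CA path
    )
    resp.raise_for_status()
    print(resp.json().get("modification_date"))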
Moving this bug back to "post" since the relevant patch (core: add '_update_date' column to 'vm_backups' DB table) is still in "post" status.
Version:
ovirt-engine-4.4.8.4-0.7.el8ev.noarch
vdsm-4.40.80.5-1.el8ev.x86_64

Verification steps:
1. I used the steps mentioned in the bug summary (the successful-backup flow).
2. After waiting ~20 minutes I validated via the API and the DB that the backup still existed.
3. I stopped the backup and saw that "_update_date" (DB) / "modification_date" (API) was updated as expected and coherently.
4. 15 minutes after stopping the backup I was still able to see the backup entry in the DB/API (as expected).
5. ~22 minutes after stopping the backup, the backup entry disappeared from the DB/API (as expected).

Verification conclusions:
The actual output matched the expected output, and the whole flow completed with no errors or unexpected logs. I also checked the DB and API responses during the whole procedure to validate coherence and behavior (worked as expected; a polling helper like the sketch below can be used to time steps 4-5). Bug verified.
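For anyone repeating steps 4-5, a rough Python polling helper that reports how long after stopping the backup the entry disappears from the API (same placeholder endpoint, credentials and IDs as in the earlier sketch):

    import time
    import requests

    ENGINE = "https://engine1.example.com/ovirt-engine/api"  # placeholder
    VM_ID = "872bfe41-821f-45a1-972a-c8391b1bd026"
    BACKUP_ID = "1315112e-b971-49d0-afaa-ee96a68c81a6"

    start = time.monotonic()
    while True:
        resp = requests.get(
            f"{ENGINE}/vms/{VM_ID}/backups/{BACKUP_ID}",
            auth=("admin@internal", "password"),       # placeholder credentials
            headers={"Accept": "application/json"},
            verify="/etc/pki/ovirt-engine/ca.pem",     # placeholder CA path
        )
        minutes = (time.monotonic() - start) / 60
        if resp.status_code == 404:
            # Entry was cleaned up; with the fix this should happen no earlier
            # than 15 minutes after stopping (30 minutes for a failed backup).
            print(f"backup entry gone after ~{minutes:.0f} minutes")
            break
        print(f"backup entry still present after ~{minutes:.0f} minutes")
        time.sleep(60)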
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3460