Description of problem:

After implementing RFE https://bugzilla.redhat.com/show_bug.cgi?id=1981297, there is an issue with the timing of the cleanup of the "vm_backups" and "vm_backup_disk_map" DB tables by the relevant cleanup thread.

The only timestamp column that existed in the DB tables above was "_create_date" in the "vm_backups" DB table, and it was used to estimate when a backup entry can be removed. That field records the date the backup *started*, not when it *finished*. As a result, a long-running backup could be cleaned up not after the intended 15-minute (succeeded) or 30-minute (failed) period counted from the moment the backup finished, but after a shorter period of time (even a few seconds in the worst case). The reason is that by the time a long backup finishes, the "_create_date" field has already aged, so the cleanup thread, which runs every 10 minutes, may consider the backup entry "old enough" and remove it sooner than desired.

Version-Release number of selected component (if applicable):
4.4.7

How reproducible:
Always, given a long enough backup. The easiest way to produce one is to split the backup flow into a few steps:
1. Start a backup, i.e., run:
   ./backup_vm.py -c engine1 start 872bfe41-821f-45a1-972a-c8391b1bd026
2. Wait for a long time, e.g., 15 minutes. The download step is not needed, but running it won't hurt.
3. Stop the backup, i.e., run:
   ./backup_vm.py -c engine1 stop 872bfe41-821f-45a1-972a-c8391b1bd026 1315112e-b971-49d0-afaa-ee96a68c81a6
After that step the backup entry is eligible for cleanup.

Steps to Reproduce:
1. Run a long backup that takes 15 minutes or more (see "How reproducible" above for a suggested flow), ideally one backup that succeeds and one that fails. Because the backup took longer than 15 minutes (30 minutes for a failed backup), the cleanup thread will remove the backup entry from the DB the next time it runs.
2. Wait and see when the entry is cleaned up from the DB (this can also be observed via the REST API).

Actual results:
The backup DB entry is not kept in the DB for 15 minutes (for a succeeded backup) or 30 minutes (for a failed backup) when the backup took a long time to complete. It is removed sooner, depending on when the cleanup thread next runs.

Expected results:
The backup DB entry is kept for *at least* 15 minutes (for a succeeded backup) or 30 minutes (for a failed backup) after the backup finishes.

Additional info:
Suggested solution: add a "last_updated" column to the "vm_backups" DB table (similar to the one in the "image_transfers" DB table) and update it in the code the moment the backup finishes (with either success or failure), so that cleanup eligibility is measured from the finish time, as sketched below.
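To make the timing problem concrete, here is a minimal Python sketch of the eligibility check (the real cleanup thread is Java code inside ovirt-engine; the function, constant and field names below are hypothetical and only illustrate the logic described above):

    from datetime import datetime, timedelta

    # Hypothetical names; this only illustrates the timing logic, not the
    # actual ovirt-engine implementation.
    SUCCEEDED_RETENTION = timedelta(minutes=15)
    FAILED_RETENTION = timedelta(minutes=30)

    def is_eligible_for_cleanup(backup):
        now = datetime.utcnow()
        retention = FAILED_RETENTION if backup["phase"] == "FAILED" else SUCCEEDED_RETENTION
        # Buggy behavior: age was measured from _create_date (when the backup
        # *started*), so a backup that ran longer than its retention window
        # was removable the moment it finished:
        #   return now - backup["_create_date"] >= retention
        # Suggested fix: measure the age from the finish timestamp instead
        # (the proposed "last_updated" column).
        return now - backup["last_updated"] >= retention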
In the final solution the "vm_backups" DB table's column is named "_update_date", not "last_updated" as suggested in the bug description. In the REST API response the date appears under the "modification_date" tag. The rest of the bug details (reproduction steps, etc.) are unchanged.
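For reference, the "modification_date" of a single backup can be read over the REST API. A minimal Python sketch, assuming the standard /ovirt-engine/api layout, admin@internal credentials and the engine CA bundle path (the engine URL, password and CA path are placeholders to adapt; the IDs are the ones from the reproduction commands above):

    import requests

    ENGINE = "https://engine1.example.com/ovirt-engine/api"  # placeholder
    VM_ID = "872bfe41-821f-45a1-972a-c8391b1bd026"
    BACKUP_ID = "1315112e-b971-49d0-afaa-ee96a68c81a6"

    resp = requests.get(
        f"{ENGINE}/vms/{VM_ID}/backups/{BACKUP_ID}",
        auth=("admin@internal", "password"),       # placeholder credentials
        headers={"Accept": "application/json"},
        verify="/etc/pki/ovirt-engine/ca.pem",     # placeholder CA path
    )
    resp.raise_for_status()
    print(resp.json().get("modification_date"))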
Moving this bug back to "post" since the relevant patch (core: add '_update_date' column to 'vm_backups' DB table) is still in "post" status.
Version:
ovirt-engine-4.4.8.4-0.7.el8ev.noarch
vdsm-4.40.80.5-1.el8ev.x86_64

Verification steps:
1. I used the steps mentioned in the bug summary (the successful-backup flow).
2. After waiting ~20 minutes I validated via the API and the DB that the backup still existed.
3. I stopped the backup and saw that "_update_date" (DB) / "modification_date" (API) was updated as expected and coherently.
4. 15 minutes after stopping the backup I was still able to see the backup entry in the DB/API (as expected).
5. ~22 minutes after stopping the backup, the backup entry disappeared from the DB/API (as expected).

Verification conclusions:
The actual output matched the expected output, and the whole flow completed with no errors or unexpected logs. I also checked the DB and API responses during the whole procedure to validate coherence and behavior (worked as expected; a polling helper like the sketch below can be used to time steps 4-5). Bug verified.
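For anyone repeating steps 4-5, a rough Python polling helper that reports how long after stopping the backup the entry disappears from the API (same placeholder endpoint, credentials and IDs as in the earlier sketch):

    import time
    import requests

    ENGINE = "https://engine1.example.com/ovirt-engine/api"  # placeholder
    VM_ID = "872bfe41-821f-45a1-972a-c8391b1bd026"
    BACKUP_ID = "1315112e-b971-49d0-afaa-ee96a68c81a6"

    start = time.monotonic()
    while True:
        resp = requests.get(
            f"{ENGINE}/vms/{VM_ID}/backups/{BACKUP_ID}",
            auth=("admin@internal", "password"),       # placeholder credentials
            headers={"Accept": "application/json"},
            verify="/etc/pki/ovirt-engine/ca.pem",     # placeholder CA path
        )
        minutes = (time.monotonic() - start) / 60
        if resp.status_code == 404:
            # Entry was cleaned up; with the fix this should happen no earlier
            # than 15 minutes after stopping (30 minutes for a failed backup).
            print(f"backup entry gone after ~{minutes:.0f} minutes")
            break
        print(f"backup entry still present after ~{minutes:.0f} minutes")
        time.sleep(60)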
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3460