Bug 2230934

Summary: restore of offline incremental backups is broken
Product: Red Hat Satellite
Component: Satellite Maintain
Version: 6.13.3
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Evgeni Golov <egolov>
Assignee: Evgeni Golov <egolov>
QA Contact: Lukas Pramuk <lpramuk>
CC: ahumbe, ehelms, hyu, jpathan, mjia, momran, pcreech
Target Milestone: 6.14.0
Target Release: Unused
Keywords: PrioBumpGSS, Triaged, UserExperience
Hardware: Unspecified
OS: Unspecified
Fixed In Version: rubygem-foreman_maintain-1.3.4
Doc Type: If docs needed, set a value
Last Closed: 2023-11-08 14:20:16 UTC

Description Evgeni Golov 2023-08-10 09:37:09 UTC
Originally reported in https://community.theforeman.org/t/how-is-foreman-maintain-restore-i-supposed-to-work/34519

Since commit 9d5f5c9315e71d08451d51fb2d41dc0f735bdfc9 (https://github.com/theforeman/foreman_maintain/commit/9d5f5c9315e71d08451d51fb2d41dc0f735bdfc9), foreman-maintain performs a <code>reindexdb</code> after restoring a backup.
This seems to alter the DB in a way that incremental backups no longer apply cleanly,
and PostgreSQL is unable to operate after an incremental restore has been attempted:

<pre>
[postgres@centos8-stream-katello-4-9 ~]$ psql
psql: error: FATAL:  could not open file "base/13449/36756": No such file or directory
[postgres@centos8-stream-katello-4-9 ~]$ psql foreman
psql: error: FATAL:  could not open file "base/18254/33445": No such file or directory
[postgres@centos8-stream-katello-4-9 ~]$ psql candlepin
psql: error: FATAL:  could not open file "base/16385/35536": No such file or directory
</pre>

This is not an issue:
* if you use online backups (those always contain a full DB dump and don't require a REINDEX anyway)
* for non-incremental restores of offline backups

The REINDEX step was added to avoid issues with different versions of libc locales when changing operating system versions.
A possible short term fix would be to only execute the REINDEX when foreman-maintain detects that the backup was taken on a different OS than the restore is happening on.
A better solution would be to re-architect the restore process to first extract *all* incremental steps and only afterwards perform the necessary DB steps.
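
To illustrate the second option, here is a minimal sketch of the intended ordering: extract the full backup and every incremental first, then run the DB steps once at the end. The names `Backup` and `plan_restore` and the step strings are hypothetical and are not taken from foreman-maintain; the sketch only models the order of operations and extracts nothing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backup:
    """Hypothetical description of one backup directory."""
    path: str
    incremental: bool


def plan_restore(chain: List[Backup]) -> List[str]:
    """Return the restore steps in order: all extractions, then DB steps once."""
    if not chain or chain[0].incremental:
        raise ValueError("restore chain must start with a full backup")
    if any(not b.incremental for b in chain[1:]):
        raise ValueError("only the first backup in the chain may be a full backup")

    # Phase 1: lay down the data files of the full backup and all incrementals.
    steps = [f"extract {b.path}" for b in chain]
    # Phase 2: touch the database only after the on-disk state is complete.
    steps.append("reindex databases")
    steps.append("run database migrations")
    return steps


if __name__ == "__main__":
    for step in plan_restore([Backup("full", False), Backup("incr-1", True)]):
        print(step)
```

With this ordering no REINDEX runs between the full and the incremental extraction, so the incremental files are applied to the same on-disk layout they were taken against.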

The issue is present in the following foreman-maintain versions:
* 1.3.x
* 1.2.4+
* 1.1.10+
* 1.0.19+
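
For scripting a quick check against the list above, a rough sketch follows. The helper `is_affected` is hypothetical. It encodes the list as it stood at the time of this report, so it does not account for releases that later shipped a fix (see the "Fixed In Version" field), and it assumes a purely numeric `major.minor.patch` version string.

```python
# First affected patch level per release series, as listed in this report.
AFFECTED_FROM = {
    (1, 0): 19,
    (1, 1): 10,
    (1, 2): 4,
    (1, 3): 0,  # "1.3.x": every 1.3 release known at the time of the report
}


def is_affected(version: str) -> bool:
    """Return True if the version falls into one of the ranges listed above."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    first_bad = AFFECTED_FROM.get((major, minor))
    return first_bad is not None and patch >= first_bad
```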

Comment 1 Evgeni Golov 2023-08-10 09:37:13 UTC
Created from redmine issue https://projects.theforeman.org/issues/36668

Comment 2 Evgeni Golov 2023-08-10 09:37:14 UTC
Upstream bug assigned to egolov

Comment 3 Bryan Kearney 2023-08-11 12:03:19 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/36668 has been resolved.

Comment 4 Lukas Pramuk 2023-08-28 16:19:31 UTC
VERIFIED.

@Satellite 6.14.0 Snap13
rubygem-foreman_maintain-1.3.5-1.el8sat.noarch

by the following manual reproducer:

1) Populate some content (RHEL8 repo sync)

2) Create a full offline backup
# satellite-maintain backup offline -y /var/backup

3) Populate some more content (RHEL9 repo sync)

4) Create an incremental backup based on the full backup 
# satellite-maintain backup offline -y -i /var/backup/satellite-backup-2023-08-28-10-56-26 /var/backup

5) Restore the full backup
# satellite-maintain restore -y /var/backup/satellite-backup-2023-08-28-10-56-26

6) Try to restore the incremental backup on top of the full backup
# satellite-maintain restore -y /var/backup/satellite-backup-2023-08-28-11-09-43

REPRO:

Extract any existing tar files in backup: 
| Extracting pgsql data                                               [OK]      
--------------------------------------------------------------------------------
REINDEX databases: 
- Reindexing the databases                                            [FAIL]    
Failed executing runuser - postgres -c "reindexdb -a", exit status 1:
 reindexdb: error: could not connect to database template1: FATAL:  could not open file "base/1/46542": No such file or directory
--------------------------------------------------------------------------------
Scenario [Restore backup] failed.

vs.

FIX:

Extract any existing tar files in backup: 
- Extracting pgsql data                                               [OK]      
--------------------------------------------------------------------------------
Migrate pulpcore db: 
| Migrating pulpcore database                                         [OK]      
--------------------------------------------------------------------------------
Ensure Candlepin runs all migrations after restoring the database:    [OK]
--------------------------------------------------------------------------------

>>> The incremental restore finishes successfully because the DB reindex step now runs only when the OS version changes.
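
The "reindex only when the OS version changes" check can be sketched as below. This is not the foreman-maintain implementation. It assumes the backup recorded the contents of /etc/os-release at backup time (an assumption made for this sketch) and compares the `ID` and `VERSION_ID` fields with those of the host performing the restore. The function names are hypothetical.

```python
def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def needs_reindex(backup_os_release: str, current_os_release: str) -> bool:
    """Return True when the backup was taken on a different OS or OS version."""
    old = parse_os_release(backup_os_release)
    new = parse_os_release(current_os_release)
    return (old.get("ID"), old.get("VERSION_ID")) != (new.get("ID"), new.get("VERSION_ID"))
```

Comparing `VERSION_ID` is only a proxy for a changed libc locale. It errs towards reindexing whenever the two strings differ at all, including across minor releases.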

Comment 7 errata-xmlrpc 2023-11-08 14:20:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.14 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6818

Comment 8 Ron Lavi 2023-11-20 13:28:03 UTC
*** Bug 2250206 has been marked as a duplicate of this bug. ***

Comment 9 momran 2024-03-14 16:16:15 UTC
*** Bug 2269551 has been marked as a duplicate of this bug. ***