Bug 1393049
| Summary: | Backup/restore fails to run on an appliances in pglogical replication | |||
|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | luke couzens <lcouzens> | |
| Component: | Appliance | Assignee: | Nick Carboni <ncarboni> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | luke couzens <lcouzens> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 5.7.0 | CC: | abellott, cpelland, dajohnso, greartes, jhardy, lcouzens, ncarboni, obarenbo | |
| Target Milestone: | GA | Keywords: | TestOnly | |
| Target Release: | 5.8.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | black:replication | |||
| Fixed In Version: | 5.8.0.0 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1399673 (view as bug list) | Environment: | ||
| Last Closed: | 2017-06-12 17:25:33 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | CFME Core | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1399673 | |||
|
Description
luke couzens
2016-11-08 17:57:12 UTC
Luke, Can you list out exactly how you went about creating the backup and doing the restore? For example did you use pg_dump manually or did you use the rake task? Same with the restore, did you use the console? Also, it looks like some of this stuff might log to the appliance_console.log can you check for any additional output there? Thanks So you can finish up the restore if you drop the subscription on the global region. That is the extra connection that is preventing the restore from moving forward. We can make that a check we do during the restore process (drop the subscription or warn about it). Another issue we will run into is that replication won't survive the backup/restore process. This is because pg_dump doesn't handle replication slots which is what pglogical relies on to work. We would need to change the format of the backup file to include cfme specific metadata about replication slots. That would be similar to the changes made to the upgrade script for this issue https://bugzilla.redhat.com/show_bug.cgi?id=1391693 This PR will allow the restore to go through as long as we are either restoring to a global database or there are no subscriptions to the remote database. We get around the connection problem by dropping subscriptions if they exist in global databases before executing the restore. To restore a remote database, the subscription will have to be dropped manually before attempting the restore. https://github.com/ManageIQ/manageiq-gems-pending/pull/23 Unfortunately, this does not solve the issue of replication not working after the restore, but hopefully this we be enough for this blocker bug as fixing replication for the restore will be a much larger effort. New commit detected on ManageIQ/manageiq-gems-pending/master: https://github.com/ManageIQ/manageiq-gems-pending/commit/aca175864e5962fa804552840873d7918a2df2d8 commit aca175864e5962fa804552840873d7918a2df2d8 Author: Nick Carboni <ncarboni> AuthorDate: Fri Nov 18 17:10:30 2016 -0500 Commit: Nick Carboni <ncarboni> CommitDate: Fri Nov 18 17:10:30 2016 -0500 Don't test for connections before the restore The first part of the restore is to drop the database. This will fail if there are connections to the database. Failing before we call restore will also prevent us from fixing the situation by removing any pglogical subscriptions that may be present before dropping the database. Instead we can print an error after we fail to inform the user that the failure was because there were connections to the database. https://bugzilla.redhat.com/show_bug.cgi?id=1393049 lib/gems/pending/appliance_console.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) New commit detected on ManageIQ/manageiq-gems-pending/master: https://github.com/ManageIQ/manageiq-gems-pending/commit/d081653ba85a0752f102a7a9c18b5e85be803ea4 commit d081653ba85a0752f102a7a9c18b5e85be803ea4 Author: Nick Carboni <ncarboni> AuthorDate: Fri Nov 18 17:13:00 2016 -0500 Commit: Nick Carboni <ncarboni> CommitDate: Fri Nov 18 17:13:00 2016 -0500 Drop pglogical subscriptions before the restore pglogical subscriptions cause a postgresql worker process to hold open connections to the database. If we attempt to drop the database while we still have subscriptions, we will always fail. This commit removes any existing subscriptions before attempting a restore and ignores the failures in the case that pglogical is not being used. This allows the database restore to complete cleanly. https://bugzilla.redhat.com/show_bug.cgi?id=1393049 lib/gems/pending/util/postgres_admin.rb | 14 +++++++++++--- spec/util/postgres_admin_spec.rb | 7 +++++++ 2 files changed, 18 insertions(+), 3 deletions(-) Verified in 5.8.0.3 New commit detected on ManageIQ/manageiq-appliance_console/master: https://github.com/ManageIQ/manageiq-appliance_console/commit/117862145b48c1f0ad9b3d52bcce129186654329 commit 117862145b48c1f0ad9b3d52bcce129186654329 Author: Nick Carboni <ncarboni> AuthorDate: Fri Nov 18 22:13:00 2016 +0000 Commit: Nick LaMuro <nicklamuro> CommitDate: Tue Mar 16 19:25:16 2021 +0000 Drop pglogical subscriptions before the restore pglogical subscriptions cause a postgresql worker process to hold open connections to the database. If we attempt to drop the database while we still have subscriptions, we will always fail. This commit removes any existing subscriptions before attempting a restore and ignores the failures in the case that pglogical is not being used. This allows the database restore to complete cleanly. https://bugzilla.redhat.com/show_bug.cgi?id=1393049 (transferred from ManageIQ/manageiq-gems-pending@d081653ba85a0752f102a7a9c18b5e85be803ea4) lib/manageiq/appliance_console/postgres_admin.rb | 14 +- spec/postgres_admin_spec.rb | 7 + 2 files changed, 18 insertions(+), 3 deletions(-) |