Description of problem:Restoring back fails on appliances setup in replication using pglogical due to replication slots always accessing the db while trying to restore. Version-Release number of selected component (if applicable):5.7.0.9 and 5.6.2.2 How reproducible:100% Steps to Reproduce: 1.provision 2 appliances 2.configure internal database region99 3.configure second, fetching v2_key from first and setting it as region1 4.wait for evm 5.connect to webui or rr1 6.navigate to settings-configuration-settings-region-replication 7.select type 'Remote' hit save 8.connect to webui or rr99 9.navigate to settings-configuration-settings-region-replication 10.select type 'Global' 11.add subscription filling in details of rr1 12.click verify/save 13.make a backup of both appliances 14.stop evmserverd 15.restore backup to appliances Actual results:Restore fails using appliance_console due to there being connections to the database Expected results:restore completes successfully Additional info: dropdb: database removal failed: ERROR: database "vmdb_production" is used by a logical replication slot DETAIL: There is 1 slot, 1 of them active. You can run pg_restore without dropping the db but you get 1404 errors also evm will not start back up afterwards.
Luke, Can you list out exactly how you went about creating the backup and doing the restore? For example did you use pg_dump manually or did you use the rake task? Same with the restore, did you use the console? Also, it looks like some of this stuff might log to the appliance_console.log can you check for any additional output there? Thanks
So you can finish up the restore if you drop the subscription on the global region. That is the extra connection that is preventing the restore from moving forward. We can make that a check we do during the restore process (drop the subscription or warn about it). Another issue we will run into is that replication won't survive the backup/restore process. This is because pg_dump doesn't handle replication slots which is what pglogical relies on to work. We would need to change the format of the backup file to include cfme specific metadata about replication slots. That would be similar to the changes made to the upgrade script for this issue https://bugzilla.redhat.com/show_bug.cgi?id=1391693
This PR will allow the restore to go through as long as we are either restoring to a global database or there are no subscriptions to the remote database. We get around the connection problem by dropping subscriptions if they exist in global databases before executing the restore. To restore a remote database, the subscription will have to be dropped manually before attempting the restore. https://github.com/ManageIQ/manageiq-gems-pending/pull/23 Unfortunately, this does not solve the issue of replication not working after the restore, but hopefully this we be enough for this blocker bug as fixing replication for the restore will be a much larger effort.
New commit detected on ManageIQ/manageiq-gems-pending/master: https://github.com/ManageIQ/manageiq-gems-pending/commit/aca175864e5962fa804552840873d7918a2df2d8 commit aca175864e5962fa804552840873d7918a2df2d8 Author: Nick Carboni <ncarboni> AuthorDate: Fri Nov 18 17:10:30 2016 -0500 Commit: Nick Carboni <ncarboni> CommitDate: Fri Nov 18 17:10:30 2016 -0500 Don't test for connections before the restore The first part of the restore is to drop the database. This will fail if there are connections to the database. Failing before we call restore will also prevent us from fixing the situation by removing any pglogical subscriptions that may be present before dropping the database. Instead we can print an error after we fail to inform the user that the failure was because there were connections to the database. https://bugzilla.redhat.com/show_bug.cgi?id=1393049 lib/gems/pending/appliance_console.rb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
New commit detected on ManageIQ/manageiq-gems-pending/master: https://github.com/ManageIQ/manageiq-gems-pending/commit/d081653ba85a0752f102a7a9c18b5e85be803ea4 commit d081653ba85a0752f102a7a9c18b5e85be803ea4 Author: Nick Carboni <ncarboni> AuthorDate: Fri Nov 18 17:13:00 2016 -0500 Commit: Nick Carboni <ncarboni> CommitDate: Fri Nov 18 17:13:00 2016 -0500 Drop pglogical subscriptions before the restore pglogical subscriptions cause a postgresql worker process to hold open connections to the database. If we attempt to drop the database while we still have subscriptions, we will always fail. This commit removes any existing subscriptions before attempting a restore and ignores the failures in the case that pglogical is not being used. This allows the database restore to complete cleanly. https://bugzilla.redhat.com/show_bug.cgi?id=1393049 lib/gems/pending/util/postgres_admin.rb | 14 +++++++++++--- spec/util/postgres_admin_spec.rb | 7 +++++++ 2 files changed, 18 insertions(+), 3 deletions(-)
Verified in 5.8.0.3
New commit detected on ManageIQ/manageiq-appliance_console/master: https://github.com/ManageIQ/manageiq-appliance_console/commit/117862145b48c1f0ad9b3d52bcce129186654329 commit 117862145b48c1f0ad9b3d52bcce129186654329 Author: Nick Carboni <ncarboni> AuthorDate: Fri Nov 18 22:13:00 2016 +0000 Commit: Nick LaMuro <nicklamuro> CommitDate: Tue Mar 16 19:25:16 2021 +0000 Drop pglogical subscriptions before the restore pglogical subscriptions cause a postgresql worker process to hold open connections to the database. If we attempt to drop the database while we still have subscriptions, we will always fail. This commit removes any existing subscriptions before attempting a restore and ignores the failures in the case that pglogical is not being used. This allows the database restore to complete cleanly. https://bugzilla.redhat.com/show_bug.cgi?id=1393049 (transferred from ManageIQ/manageiq-gems-pending@d081653ba85a0752f102a7a9c18b5e85be803ea4) lib/manageiq/appliance_console/postgres_admin.rb | 14 +- spec/postgres_admin_spec.rb | 7 + 2 files changed, 18 insertions(+), 3 deletions(-)