Description of problem:
Right now, reintroducing a failed primary database node in an HA architecture is a painful, manual process. It is prone to issues and doesn't always work properly if WAL archiving isn't configured (https://bugzilla.redhat.com/show_bug.cgi?id=1406815).

Version-Release number of selected component (if applicable):
5.7.0.17

It would be good if we offered a separate console option (or enhanced the existing standby setup option) that would recreate the database on an appliance with a new base backup from the primary. This would amount to removing the contents of the data directory and then running through the same steps used to configure a standby node.

This will need a prominent warning stating that it is a destructive operation and that all data currently stored in the local database will be lost.

After seeing the issues with pg_rewind, I would rather see this as the "right" way to reintroduce a node.
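A minimal sketch of the two destructive steps described above, assuming PostgreSQL has already been stopped on the failed node. It empties the data directory while keeping the directory itself, then builds a `pg_basebackup` command to clone the current primary. The method names, host, user and path are hypothetical illustrations. This is not the code in `database_replication_standby.rb`.

```ruby
require "fileutils"

# Hypothetical helper: removes everything *inside* the PostgreSQL data
# directory but keeps the directory itself, so its ownership and permissions
# survive. DESTRUCTIVE: all data in the local database is lost.
# Assumes the PostgreSQL server on this node is already stopped.
def clear_data_directory(data_dir)
  Dir.children(data_dir).each do |entry|
    FileUtils.rm_rf(File.join(data_dir, entry))
  end
end

# Hypothetical helper: builds (but does not run) a pg_basebackup invocation
# that clones the current primary into the now-empty data directory.
#   -X stream : stream the WAL needed to make the copy consistent
#   -R        : write the recovery settings so the node starts as a standby
def base_backup_command(primary_host:, user:, data_dir:)
  ["pg_basebackup", "-h", primary_host, "-U", user,
   "-D", data_dir, "-X", "stream", "-R"]
end
```

Returning an argv array means it can be passed as `system(*cmd)` without going through a shell. After the backup finishes, the node would be started and registered as a standby in the same way as during a normal standby setup.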
https://github.com/ManageIQ/manageiq-gems-pending/pull/124
https://github.com/ManageIQ/manageiq-gems-pending/pull/126
New commit detected on ManageIQ/manageiq-gems-pending/master:

https://github.com/ManageIQ/manageiq-gems-pending/commit/63a179ea2b419007df07a0385989f8f20978ee8f

commit 63a179ea2b419007df07a0385989f8f20978ee8f
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Apr 19 17:43:17 2017 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Mon Apr 24 15:52:53 2017 -0400

    Offer to clear the data directory for new standby servers

    This will allow seamless reintegration of failed primary servers after a failover. When this happens the user will be given the option to clear the existing database and re-clone the new primary into this server and then continue to set up a standby as before.

    https://bugzilla.redhat.com/show_bug.cgi?id=1426718
    https://bugzilla.redhat.com/show_bug.cgi?id=1426769
    https://bugzilla.redhat.com/show_bug.cgi?id=1442911

 .../database_replication_standby.rb      |  20 +--
 .../database_replication_standby_spec.rb | 143 ++++++++++++++-------
 2 files changed, 112 insertions(+), 51 deletions(-)
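The commit message describes offering the user the option to clear an existing database before continuing with standby setup. A rough sketch of that decision might look like the following, assuming that "existing database" simply means a non-empty data directory. The method name, prompt wording and return symbols are hypothetical. The real prompt lives in `database_replication_standby.rb` and is not reproduced here.

```ruby
# Hypothetical sketch of the console decision before standby setup.
# Returns:
#   :setup        - data directory missing or empty, nothing to clear
#   :reinitialize - user explicitly confirmed wiping the local database
#   :abort        - user declined (or gave no answer)
# input/output are injectable so the prompt can be exercised without a TTY.
def standby_setup_action(data_dir, input: $stdin, output: $stdout)
  existing = Dir.exist?(data_dir) && !Dir.empty?(data_dir)
  return :setup unless existing

  output.puts "WARNING: #{data_dir} already contains a database."
  output.puts "Re-initializing will DELETE all data stored in the local database"
  output.puts "and re-clone it from the current primary."
  output.print "Clear the data directory and continue? (y/N): "

  # gets returns nil at EOF; treat that like an empty answer (the default, No)
  answer = input.gets.to_s.strip.downcase
  (answer == "y" || answer == "yes") ? :reinitialize : :abort
end
```

Defaulting to "No" means that pressing Enter, or closing the input, never destroys data. Only an explicit "y" or "yes" leads to the clear-and-reclone path.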
Changed the console option for standby setup to also allow re-initializing failed primary servers. This is much simpler and less error-prone than using pg_rewind and other manual CLI commands.
Verified in 5.9.0.2