Bug 1381346

Summary: Request for review of replication architecture using pglogical in an HA VMDB environment
Product: Red Hat CloudForms Management Engine Reporter: Thomas Hennessy <thenness>
Component: Replication Assignee: Gregg Tanzillo <gtanzill>
Status: CLOSED NOTABUG QA Contact: Dave Johnson <dajohnso>
Severity: high Docs Contact:
Priority: high    
Version: 5.6.0 CC: benglish, bthurber, cpelland, jdeubel, jhardy, jocarter, myoder, ncarboni, obarenbo, thenness
Target Milestone: GA   
Target Release: cfme-future   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-18 16:50:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core Target Upstream Version:
Attachments:
Description Flags
architecture document provided by customer none

Description Thomas Hennessy 2016-10-03 19:06:14 UTC
Created attachment 1206960
architecture document provided by customer

Description of problem: The customer requests that engineering review its replication architecture using pglogical in its proposed HA VMDB environment. The architecture document submitted by the customer is attached.

Comment 2 Thomas Hennessy 2016-10-03 19:10:06 UTC
Full Text of customer request follows:
======
Request that a complete review of an HA VMDB configuration is done with Engineering to bless the configuration.  Currently, Verizon has an HA VMDB setup in a Production-Like Environment (PLE) that closely follows the Reference Architecture (attached).  The biggest concern regarding the HA VMDB Reference Architecture is: Are we accounting for pglogical in 4.1 in our reference architecture?  Is there anything we are missing?  If so, what is missing?  Looking for engineering buy-off on the current configuration before deploying to Production.

Since this is not a traditional case, please ask what logs/configurations are needed to be provided, and we can provide as needed.
===========

Comment 3 Nick Carboni 2016-10-03 20:48:19 UTC
This will likely be related closely to https://www.pivotaltracker.com/story/show/127384493

Comment 4 Gregg Tanzillo 2016-10-10 19:55:11 UTC
There are two HA failover cases where pglogical could be affected: 1. when a regional database fails over, and 2. when the global database fails over.
For the second case, there should be no interruption to pglogical replication. The first case, however, would cause replication from the affected region to cease, as there are no mechanisms in the HA solution to allow pglogical replication to continue without interruption. In this case, manual remediation would be required.

In case 1, after a remote region fails over, replication for the remaining regions would continue and be unaffected. However, replication for the failed region would need to be reconfigured as follows:

- On the global region, the subscription would need to be removed.
- On the remote region, the replication type would need to be set from “none” to “remote”. This will create a new replication slot for the remote region in the new database.
- On the global region, a subscription would need to be added for the failed remote region.

This would initiate a full synchronization of all the remote data to the global database.

In case 2, after the global region fails over, replication should not be affected because the subscription information is contained in the global database and would fail over with it. No user intervention is required for this case.
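
As a rough illustration only, the two cases above can be written down as a small decision helper that a runbook or failover hook might consult. This is a sketch under stated assumptions, not part of the product or the HA solution: the names FailedOver and remediation_steps are hypothetical, and the step text simply restates the manual steps listed in this comment.

```python
from enum import Enum


class FailedOver(Enum):
    """Which database failed over (hypothetical names, for illustration)."""
    REMOTE_REGION = "remote region database"
    GLOBAL_REGION = "global region database"


def remediation_steps(failed_over: FailedOver) -> list[str]:
    """Return the manual pglogical steps described in this comment."""
    if failed_over is FailedOver.GLOBAL_REGION:
        # Case 2: the subscription information lives in the global database
        # and fails over with it, so no user intervention is expected.
        return []
    # Case 1: replication from the failed remote region stops and has to be
    # reconfigured by hand, in this order.
    return [
        "global: remove the subscription for the failed remote region",
        'remote: set the replication type from "none" to "remote" '
        "(creates a new replication slot in the new database)",
        "global: add a subscription for the failed remote region "
        "(initiates a full synchronization)",
    ]


if __name__ == "__main__":
    for case in FailedOver:
        steps = remediation_steps(case)
        print(f"{case.value} failover: {len(steps)} manual step(s)")
        for number, step in enumerate(steps, start=1):
            print(f"  {number}. {step}")
```

The order of the returned steps matters: the old subscription is removed before the remote region is re-enabled, and the new subscription is added last.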

We (engineering) would need to work with a field person who is an SME in the reference HA solution to identify options for building the manual steps detailed above into the solution. In addition, it would be necessary to have an environment where HA has been configured and is running with pglogical replication, so that a solution can be developed, tested and documented.

Comment 5 Thomas Hennessy 2016-10-18 15:40:36 UTC
Gregg,
The request/response was passed along to the Red Hat person who asked the question and made the BZ request on Oct 10.  As I write this reply it is now Oct 18, and if you have not heard from him, then I guess he only wants the question answered and chooses not to be a part of creating the answer.

Tom Hennessy

Comment 7 Thomas Hennessy 2017-03-10 03:28:53 UTC
Gregg,
I have no objection to the BZ being closed.  I am forwarding your question to Jared Deubel as he also has a case connected to this BZ.

Comment 8 Nick Carboni 2017-04-18 16:50:00 UTC
Closing this as the customer implementation has been verified and the general issue is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1391095