Bug 1560602 - Database Replication broken for current and new regions
Summary: Database Replication broken for current and new regions
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.8.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.10.0
Assignee: Nick Carboni
QA Contact: Tasos Papaioannou
URL:
Whiteboard:
Depends On:
Blocks: 1562791 1562792
Reported: 2018-03-26 14:40 UTC by Ryan Spagnola
Modified: 2021-06-10 15:31 UTC
CC List: 3 users

Fixed In Version: 5.10.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1562791 1562792
Environment:
Last Closed: 2019-02-11 14:04:08 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:



Description Ryan Spagnola 2018-03-26 14:40:04 UTC
Description of problem:
Database replication does not work after adding a new subscription. The connection validates successfully, but on saving the UI reports "error requesting data from server". The subscription is still listed in the UI, but postgresql.log shows conflict errors and nothing appears to be replicating. Additionally, the newly added subscription/region is not present in the miq_regions table.


Version-Release number of selected component (if applicable):
5.8.3.4

How reproducible:
Consistently 

Steps to Reproduce:
1. Add a remote region subscription to a global region in the UI.


Actual results:
The UI shows the region as added, but the database tables contain no record of the new region.

Expected results:
The region is added in both the UI and the database.

Additional info:

Comment 2 Nick Carboni 2018-03-28 15:50:43 UTC
This issue was caused by a custom postgresql.conf change made in the customer's remote region.

The change in question is altering random_page_cost from the default of 4 to 1.1.
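To spot this kind of change quickly, here is a minimal Python sketch that reads the text of a postgresql.conf and reports the effective random_page_cost, falling back to PostgreSQL's default of 4 when the setting is absent or commented out. It is only an illustration: the function names are hypothetical, and it ignores include directives and values set with ALTER SYSTEM (postgresql.auto.conf). On a running server, `SHOW random_page_cost;` reports the value actually in effect.

```python
import re

# PostgreSQL's built-in default for random_page_cost.
DEFAULT_RANDOM_PAGE_COST = 4.0

# Matches an uncommented "random_page_cost = 1.1" line, allowing an
# optional "=", optional single quotes, and a trailing "# comment".
_SETTING_RE = re.compile(
    r"^\s*random_page_cost\s*=?\s*'?([0-9]*\.?[0-9]+)'?\s*(?:#.*)?$"
)


def find_random_page_cost(conf_text: str) -> float:
    """Hypothetical helper: effective random_page_cost from postgresql.conf text.

    If the parameter appears more than once, the last entry wins, as it
    does in PostgreSQL. Include files and postgresql.auto.conf are NOT
    considered in this sketch.
    """
    value = DEFAULT_RANDOM_PAGE_COST
    for line in conf_text.splitlines():
        match = _SETTING_RE.match(line)
        if match:
            value = float(match.group(1))
    return value


def is_non_default(conf_text: str) -> bool:
    """True when the file overrides random_page_cost with a non-default value."""
    return find_random_page_cost(conf_text) != DEFAULT_RANDOM_PAGE_COST
```

Running it against each region's configuration before adding a subscription would flag an override such as the 1.1 value described here.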

I have confirmed that this change will consistently cause the reported issue in a new 5.8.2.3 deployment and that reverting this change in the remote region and re-adding the subscription will fix the problem.

As this is not an issue with the default configuration of the database, I would propose this not be a blocker bug.

Comment 3 Nick Carboni 2018-03-28 15:55:52 UTC
In particular, this manifested as the subscription being added with only one row in pglogical.local_sync_status for it: a row with no sync_relname value but with sync_status 'r' (ready).

This should only be the case when all tables in the replication set for that subscription were successfully synced.
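The state described above can be checked mechanically. The following Python sketch assumes the rows for one subscription have already been fetched from pglogical.local_sync_status as (sync_relname, sync_status) pairs; the function name and row shape are hypothetical. A lone subscription-level row with status 'r' is also what a genuinely complete sync looks like, so the function only flags the subscription for verification against the data. It does not declare the subscription broken.

```python
from typing import Iterable, Optional, Tuple

# Assumed row shape: (sync_relname, sync_status) for ONE subscription,
# already fetched from pglogical.local_sync_status. sync_relname is None
# for the subscription-level row.
SyncRow = Tuple[Optional[str], str]

READY = "r"  # sync_status value meaning "ready"


def needs_data_verification(rows: Iterable[SyncRow]) -> bool:
    """Hypothetical check for the pattern seen in this bug.

    Returns True when the only row is the subscription-level row
    (no sync_relname) and it reports ready. That state is legitimate
    only if every table in the replication set really was synced, so
    it should be confirmed against the data rather than trusted.
    """
    rows = list(rows)
    if len(rows) != 1:
        return False
    relname, status = rows[0]
    return relname is None and status == READY
```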

This will also manifest as conflict errors in the global region's PostgreSQL logs (typically on the miq_servers table), because the row that the replication apply process is trying to update was not replicated during the initial sync.

The replication backlog will not grow, even though changes are not being replicated successfully.
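Because the backlog is not a reliable signal in this situation, one option is to compare what exists in the remote region with what arrived in the global region. This Python sketch assumes you have already collected the primary-key ids of each table (for example miq_servers or miq_regions) from both databases, and that the ids are directly comparable between the two; the function name and input shape are hypothetical. A table it reports has rows missing from the global copy, which is consistent with the incomplete initial sync described above (though rows created very recently may simply not have been applied yet).

```python
from typing import Dict, Iterable, Set


def missing_rows(
    remote: Dict[str, Iterable[int]],
    global_copy: Dict[str, Iterable[int]],
) -> Dict[str, Set[int]]:
    """Hypothetical helper: for each table, return the ids that exist in
    the remote region but not in the global region's replicated copy.

    Assumes both inputs were collected from the same primary-key column
    and that ids are directly comparable across the two databases.
    """
    gaps: Dict[str, Set[int]] = {}
    for table, ids in remote.items():
        # A table absent from global_copy is treated as empty there.
        missing = set(ids) - set(global_copy.get(table, ()))
        if missing:
            gaps[table] = missing
    return gaps


# Example with made-up ids: miq_servers row 2 never reached the global region.
remote_ids = {"miq_servers": [1, 2], "miq_regions": [1]}
global_ids = {"miq_servers": [1], "miq_regions": [1]}
print(missing_rows(remote_ids, global_ids))  # {'miq_servers': {2}}
```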

Comment 4 Nick Carboni 2018-04-02 13:59:39 UTC
This issue is not reproducible in 5.9.

Between 5.8 and 5.9 we moved from pglogical version 1.2.1 to version 2.1.0; I suspect this upgrade fixed this issue.

Moving this to POST and will discuss backporting in the cloned BZ.

Comment 7 Tasos Papaioannou 2018-07-10 20:04:27 UTC
Verified on 5.10.0.3.

