Bug 1560602
Summary: | Database Replication broken for current and new regions | |||
---|---|---|---|---|
Product: | Red Hat CloudForms Management Engine | Reporter: | Ryan Spagnola <rspagnol> | |
Component: | Appliance | Assignee: | Nick Carboni <ncarboni> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tasos Papaioannou <tpapaioa> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 5.8.0 | CC: | abellott, cpelland, obarenbo | |
Target Milestone: | GA | Keywords: | TestOnly, ZStream | |
Target Release: | 5.10.0 | |||
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | 5.10.0.0 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1562791 1562792 | Environment: | |
Last Closed: | 2019-02-11 14:04:08 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1562791, 1562792 |
Description
Ryan Spagnola
2018-03-26 14:40:04 UTC
This issue was caused by a custom postgresql.conf change made in the customer's remote region. The change in question alters random_page_cost from the default of 4 to 1.1. I have confirmed that this change consistently causes the reported issue in a new 5.8.2.3 deployment, and that reverting the change in the remote region and re-adding the subscription fixes the problem. As this is not an issue with the default configuration of the database, I would propose this not be a blocker bug.

In particular, this manifested as follows: after the subscription was added, the only row in pglogical.local_sync_status for that subscription had no sync_relname value, yet its sync_status showed 'r' (for ready). This should only be the case when all tables in the replication set for that subscription were successfully synced.

This will also manifest as conflict errors in the postgres logs on the global region (typically on the miq_servers table), because the row that the replication apply process is trying to update was never replicated in the initial sync. The backlog will not increase even though changes are not being replicated successfully.

This issue is not reproducible in 5.9. Between 5.8 and 5.9 we moved from pglogical version 1.2.1 to version 2.1.0; I suspect this upgrade fixed the issue.

Moving this to POST and will discuss backporting in the cloned BZ.

Verified on 5.10.0.3.
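For anyone triaging a similar report, the sync-status symptom described above can be checked with a small script. The following is only a sketch, not part of the fix: the function name `suspicious_subscriptions` is hypothetical, and the `sync_subid` column used to group rows per subscription is an assumption that does not come from this report (only `sync_relname` and `sync_status` are mentioned here). A flagged subscription is a candidate only, since the same state is expected once every table really has been synced; confirm by checking whether the tables were actually copied.

```python
from collections import defaultdict

# Query sketch to run against the subscribing database.
# ASSUMPTION: sync_subid identifies the subscription; it is not named in the report.
SYNC_STATUS_QUERY = """
SELECT sync_subid, sync_relname, sync_status
FROM pglogical.local_sync_status
"""


def suspicious_subscriptions(rows):
    """Return ids of subscriptions whose only sync row has no
    sync_relname but still reports sync_status 'r' (ready).

    `rows` is an iterable of (sync_subid, sync_relname, sync_status)
    tuples, e.g. the result of running SYNC_STATUS_QUERY.
    """
    by_sub = defaultdict(list)
    for sub_id, relname, status in rows:
        by_sub[sub_id].append((relname, status))

    flagged = []
    for sub_id, entries in by_sub.items():
        if len(entries) != 1:
            continue  # per-table rows exist, so this is not the reported pattern
        relname, status = entries[0]
        if relname is None and status == "r":
            flagged.append(sub_id)
    return sorted(flagged)


if __name__ == "__main__":
    # Illustrative rows only; these are not data from the customer environment.
    sample = [
        (1, None, "r"),           # lone subscription-level row marked ready
        (2, None, "r"),
        (2, "miq_servers", "r"),  # subscription 2 also has a per-table row
        (3, None, "i"),           # lone row, but not in the ready state
    ]
    print(suspicious_subscriptions(sample))
```

The function works on plain tuples so it can be fed from any database driver; the rows can be fetched with whichever client is already available on the appliance.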