Bug 1387420 - PG Logical Error: replication set mapping -1931769956:25998 not found (PG::InternalError)
Summary: PG Logical Error: replication set mapping -1931769956:25998 not found (PG::In...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.6.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.8.0
Assignee: Nick Carboni
QA Contact: Alex Newman
URL:
Whiteboard: replication
Depends On:
Blocks: 1391713 1404830
TreeView+ depends on / blocked
 
Reported: 2016-10-20 20:21 UTC by Adam Grare
Modified: 2018-01-19 21:20 UTC (History)
5 users (show)

Fixed In Version: 5.8.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1391713 1404830 (view as bug list)
Environment:
Last Closed: 2017-06-12 16:29:36 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Adam Grare 2016-10-20 20:21:31 UTC
Description of problem:
Errors from pg_logical on worker initialization when syncing config:
/var/www/miq/vmdb/lib/extensions/ar_adapter/ar_pglogical/pglogical_raw.rb:301:in `async_exec': ERROR:  replication set mapping -1931769956:25998 not found (PG::InternalError)
        from /var/www/miq/vmdb/lib/extensions/ar_adapter/ar_pglogical/pglogical_raw.rb:301:in `typed_exec'
        from /var/www/miq/vmdb/lib/extensions/ar_adapter/ar_pglogical/pglogical_raw.rb:281:in `replication_set_remove_table'
        from /var/www/miq/vmdb/lib/miq_pglogical.rb:79:in `block in refresh_excludes'
        from /var/www/miq/vmdb/lib/miq_pglogical.rb:78:in `each' 
        from /var/www/miq/vmdb/lib/miq_pglogical.rb:78:in `refresh_excludes'
        from /var/www/miq/vmdb/lib/vmdb/config/activator.rb:50:in `workers'
        from /var/www/miq/vmdb/lib/vmdb/config/activator.rb:24:in `block in activate'
        from /var/www/miq/vmdb/lib/vmdb/config/activator.rb:20:in `each_key'
        from /var/www/miq/vmdb/lib/vmdb/config/activator.rb:20:in `activate'
        from /var/www/miq/vmdb/lib/vmdb/settings.rb:42:in `activate'
        from /var/www/miq/vmdb/lib/vmdb/settings.rb:20:in `reload!'
        from /var/www/miq/vmdb/app/models/miq_worker/runner.rb:283:in `sync_config'
        from /var/www/miq/vmdb/app/models/miq_worker/runner.rb:63:in `worker_initialization'
        from /var/www/miq/vmdb/app/models/miq_worker/runner.rb:51:in `initialize'
        from /var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `new'
        from /var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
        from /var/www/miq/vmdb/app/models/miq_worker.rb:347:in `block in start' 


Version-Release number of selected component (if applicable):
5.6.1.2

Comment 4 Nick Carboni 2016-10-28 18:35:51 UTC
This looks like it is being caused by a race condition in MiqPglogical#refresh_excludes

I was able to reproduce this by:
- Creating two appliances pointing to the same database
- Set replication type to "remote" on either appliance
- Remove a table from the excludes on one server (I used log_files)
- Wait for the config to reload on the workers (it may not happen every time)

By taking advantage of a separate issue which causes the replication set to "thrash" when two servers have different excludes each worker that reloads the config will likely have to change the excludes.

It is possible for one worker to read the current replication set, have a different worker remove the table and then have the table be missing when the first one also tries to remove it from the set.

I will look into process synchronization methods to solve this.

Comment 5 Nick Carboni 2016-10-28 18:44:51 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1389821 - Issue referenced in comment 4

Comment 7 CFME Bot 2016-11-01 17:41:08 UTC
New commit detected on ManageIQ/manageiq/euwe:
https://github.com/ManageIQ/manageiq/commit/b89c5ab912994cc26b8aee49546b6b6bd6ef52ee

commit b89c5ab912994cc26b8aee49546b6b6bd6ef52ee
Author:     Jason Frey <fryguy9>
AuthorDate: Mon Oct 31 16:36:51 2016 -0400
Commit:     Oleg Barenboim <chessbyte>
CommitDate: Tue Nov 1 13:37:50 2016 -0400

    Merge pull request #12280 from carbonin/add_logging_to_refresh_excludes
    
    Add logging when the replication set is altered
    (cherry picked from commit 49cfe5798b581595dc91d4d00a9038000c2fd24a)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1387420
    https://bugzilla.redhat.com/show_bug.cgi?id=1386686

 lib/miq_pglogical.rb | 2 ++
 1 file changed, 2 insertions(+)

Comment 8 CFME Bot 2016-11-03 14:31:41 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/320058db5c63a03ebb11e354fb34b983a41ac657

commit 320058db5c63a03ebb11e354fb34b983a41ac657
Author:     Nick Carboni <ncarboni>
AuthorDate: Mon Oct 31 17:02:26 2016 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Mon Oct 31 18:02:23 2016 -0400

    Wrap the contents of MiqPglogical#refresh_excludes in a lock
    
    This will prevent multiple processes from executing this logic
    at the same time which will prevent us from adding a table that
    is already included in the set or removing a table that is not in
    the set.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1387420

 lib/miq_pglogical.rb | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

Comment 10 CFME Bot 2016-12-14 03:21:29 UTC
New commit detected on ManageIQ/manageiq/darga:
https://github.com/ManageIQ/manageiq/commit/7b4b3d961f18a847b9c395e65c0f2575ad108c1a

commit 7b4b3d961f18a847b9c395e65c0f2575ad108c1a
Author:     Nick Carboni <ncarboni>
AuthorDate: Mon Oct 31 17:02:26 2016 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Tue Nov 8 14:57:08 2016 -0500

    Wrap the contents of MiqPglogical#refresh_excludes in a lock
    
    This will prevent multiple processes from executing this logic
    at the same time which will prevent us from adding a table that
    is already included in the set or removing a table that is not in
    the set.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1387420

 lib/miq_pglogical.rb | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)


Note You need to log in before you can comment on or make changes to this bug.