Description of problem:
After dropping a pglogical node, the server process acquires many shared access locks which prevent subsequent attempts to create replication slots (subscriptions).

Version-Release number of selected component (if applicable):
master-201605122000

How reproducible:
Always

Steps to Reproduce:
1. Deploy two appliances (region 0, and region 99)
2. Set the region 0 appliance to replication type "remote", then "none", then "remote" (create the pglogical node, remove it, recreate it)
3. Create a subscription to the region 0 appliance on the region 99 appliance

Actual results:
The initial sync never occurs.

Expected results:
Table data is synced from region 0 to region 99

Additional info:
What is happening is that the node drop code on the region 0 database acquires shared locks for the replication slots and never releases them. The subscription create process then blocks trying to take an exclusive lock to create the replication slot for the new subscription.

There is no good workaround that I have found so far. At the very least we would need to restart the postgres service on the regional database to get the locks released, but even that would leave the global process in a bad state.

This is a bug in pglogical for which I have submitted a patch here https://github.com/2ndQuadrant/postgres/pull/3

Unfortunately PRs don't seem to be particularly welcome. Because we are already running a "custom" build of pglogical to be compatible with SCL postgresql (i.e. we are not dependent on their source), we could make the patch and just rebuild. I've done this locally on upstream appliances. The issue is we don't have a good way to track these patches or review changes.
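
For anyone trying to picture the blocking described under "Additional info", here is a minimal toy model in Python. It is only a sketch of shared versus exclusive lock semantics and is not PostgreSQL's lock manager or pglogical code. The names SharedExclusiveLock, drop_node, and create_subscription are hypothetical stand-ins chosen for illustration.

```python
import threading


class SharedExclusiveLock:
    """Toy shared/exclusive lock (hypothetical; not PostgreSQL's lock manager)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared_holders = 0
        self._exclusive_held = False

    def acquire_shared(self):
        # Shared holders can coexist; they only wait for an exclusive holder.
        with self._cond:
            while self._exclusive_held:
                self._cond.wait()
            self._shared_holders += 1

    def release_shared(self):
        with self._cond:
            self._shared_holders -= 1
            self._cond.notify_all()

    def acquire_exclusive(self, timeout):
        """Return True if the lock was acquired within `timeout` seconds."""
        with self._cond:
            acquired = self._cond.wait_for(
                lambda: self._shared_holders == 0 and not self._exclusive_held,
                timeout=timeout,
            )
            if acquired:
                self._exclusive_held = True
            return acquired

    def release_exclusive(self):
        with self._cond:
            self._exclusive_held = False
            self._cond.notify_all()


def drop_node(lock, release):
    # Stand-in for the node drop path. With release=False it mimics the
    # reported behaviour: the shared lock is taken and never given back.
    lock.acquire_shared()
    if release:
        lock.release_shared()


def create_subscription(lock, timeout=0.1):
    # Stand-in for replication slot creation, which needs an exclusive lock.
    # The timeout exists only so the demo returns instead of hanging.
    acquired = lock.acquire_exclusive(timeout)
    if acquired:
        lock.release_exclusive()
    return acquired


if __name__ == "__main__":
    leaky = SharedExclusiveLock()
    drop_node(leaky, release=False)
    print("slot creation after leaky drop:", create_subscription(leaky))

    clean = SharedExclusiveLock()
    drop_node(clean, release=True)
    print("slot creation after clean drop:", create_subscription(clean))
```

In the real system there is no timeout on the subscription side, which is why the initial sync simply never starts rather than failing with an error.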
Moved the PR to the pglogical repo rather than the postgres fork. https://github.com/2ndQuadrant/pglogical/pull/3
The changes were made in pglogical (https://github.com/2ndQuadrant/pglogical/commit/85052cb6e76f8a5caf2c9189729ecbc99485ef00). We will either update the version we are using once those changes get into a release, or rebuild our rpm with just that patch included if a new pglogical release does not happen in time.
The most recent release of pglogical (1.1.1) contains some changes that would require more extensive refactoring, which is out of scope for fixing this issue. Because of this, we are going to include a patch in our pglogical build to solve this particular issue. This is the commit that adds the patch and updates the spec file to include this fix: http://pkgs.devel.redhat.com/cgit/rpms/postgresql-pglogical/commit/?h=cfme-rh-postgresql94-5.6-rhel-7
Also built a new package for the upstream appliances here https://copr.fedorainfracloud.org/coprs/ncarboni/pglogical-SCL/build/288682/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1348