Description of problem:
I configured and turned on replication. The replication worker is recycled frequently and dumps an error into the log file.

Version-Release number of selected component (if applicable):
5.6.0.8

How reproducible:
Both times I configured and turned on replication for two different providers

Steps to Reproduce:
1. Configure replication
2. Turn on Database Synchronization role
3. Witness ReplicationWorker recycling and view logs for error

Actual results:
Worker recycles frequently

Expected results:
Worker to stay alive

Additional info:
I was trying to see if this worker exceeds its memory threshold but ran into this issue instead. On the replication master I can see my Inventory, so at least some tables appear to be replicated. The end user may therefore never know about this issue unless they observe the worker recycling or the errors in the log file. This is using the older replication method (RubyRep) rather than the newer pglogical replication.

Relevant log lines:
[----] E, [2016-05-29T20:50:46.366304 #28078:c57998] ERROR -- : rubyrep: unknown OID 0: failed to recognize type of 'change_table'. It will be treated as String.
[----] E, [2016-05-29T20:50:46.366356 #28078:c57998] ERROR -- : rubyrep: unknown OID 0: failed to recognize type of 'id'. It will be treated as String.
Was this database migrated from 5.5 or was it a fresh deploy on version 5.6?
Looks like the worker was hitting the memory threshold and being shut down by the monitor.
To build on what Nick added: in an idle memory test this worker rides just under its threshold at roughly 190 MiB PSS. Under workload I am seeing the worker grow to roughly 270 MiB PSS before worker management kicks in and recycles it.

# smem -c 'pid rss pss command' | grep "[Rr]eplication" -i
1725 355040 277714 MIQ: MiqReplicationWorker id: 84
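For anyone checking the arithmetic, here is a minimal sketch that converts the smem row above to MiB and compares it with the threshold. It assumes smem is reporting in KiB (its default when no unit flag is given) and that the "200 megabytes" mentioned in the later commit message means 200 MiB. The function names are hypothetical, and this is not how the ManageIQ worker monitor itself measures memory.

```python
KIB_PER_MIB = 1024


def parse_smem_line(line):
    """Split one row of `smem -c 'pid rss pss command'` into typed fields.

    The command column may contain spaces, so only the first three
    whitespace-separated fields are treated as numbers.
    """
    pid, rss_kib, pss_kib, command = line.split(None, 3)
    return {
        "pid": int(pid),
        "rss_kib": int(rss_kib),
        "pss_kib": int(pss_kib),
        "command": command.strip(),
    }


def exceeds_threshold(pss_kib, threshold_mib):
    """Return True when a PSS value in KiB is above a threshold in MiB."""
    return pss_kib / KIB_PER_MIB > threshold_mib


# Row copied from the smem output in this comment.
row = parse_smem_line("1725 355040 277714 MIQ: MiqReplicationWorker id: 84")
pss_mib = row["pss_kib"] / KIB_PER_MIB

# Assumed threshold: 200 MiB, taken from the commit message in this thread.
print(f"{row['command']}: PSS {pss_mib:.1f} MiB")
print("over 200 MiB threshold:", exceeds_threshold(row["pss_kib"], 200))
```

Under these assumptions the loaded worker sits at about 271 MiB PSS, well past 200 MiB, while the idle figure of roughly 190 MiB stays just under it.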
So for this, are we looking to increase the default memory threshold?
https://github.com/ManageIQ/manageiq/pull/9087
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/77481589f714d410e497b7914c91e1cf3cc43285

commit 77481589f714d410e497b7914c91e1cf3cc43285
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Jun 1 11:44:47 2016 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Jun 1 11:44:47 2016 -0400

    Increase replication worker's memory threshold

    Previously it was 200 megabytes which was causing the monitor
    to bring the worker down during normal operation

    https://bugzilla.redhat.com/show_bug.cgi?id=1341291

 config/settings.yml | 1 +
 1 file changed, 1 insertion(+)
https://github.com/ManageIQ/manageiq/pull/9133
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1348