Bug 1341291 - 5.6 ReplicationWorker Recycling due to rubyrep: unknown OID 0: failed to recognize type of 'change_table'. It will be treated as String.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Replication
Version: 5.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.6.0
Assignee: Nick Carboni
QA Contact: Alex Newman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-31 17:36 UTC by Alex Krzos
Modified: 2019-12-16 05:52 UTC (History)

Fixed In Version: 5.6.0.10
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-29 16:07:04 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:1348 | 0 | normal | SHIPPED_LIVE | CFME 5.6.0 bug fixes and enhancement update | 2016-06-29 18:50:04 UTC

Description Alex Krzos 2016-05-31 17:36:41 UTC
Description of problem:
I configured and turned on replication. The replication worker is recycled frequently and dumps an error into the log file.

Version-Release number of selected component (if applicable):
5.6.0.8

How reproducible:
Reproduced both times I configured and turned on replication (two different providers).

Steps to Reproduce:
1. Configure replication
2. Turn on Database Synchronization role
3. Witness ReplicationWorker recycling and view logs for error

Actual results:
Worker recycles frequently

Expected results:
Worker to stay alive

Additional info:
I was trying to see whether this worker exceeds its memory threshold but ran into this issue instead. On the replication master I can see my inventory, so at least some tables appear to be replicated. An end user may therefore never notice this issue unless they observe the worker recycling or the errors in the log file.

This is using the older replication method (RubyRep) rather than the newer pglogical replication.

Relevant log lines:

[----] E, [2016-05-29T20:50:46.366304 #28078:c57998] ERROR -- : rubyrep: unknown OID 0: failed to recognize type of 'change_table'. It will be treated as String.
[----] E, [2016-05-29T20:50:46.366356 #28078:c57998] ERROR -- : rubyrep: unknown OID 0: failed to recognize type of 'id'. It will be treated as String.
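To check how often these errors occur and which worker process emits them, the log can be scanned with a small script. The following is a minimal sketch in Python, assuming the log lines have exactly the shape quoted above; the pattern and the function name `unknown_oid_columns` are illustrative and not part of ManageIQ or rubyrep.

```python
import re

# Pattern for log lines shaped like the two quoted above.
# The group names are illustrative, not taken from ManageIQ.
LOG_LINE = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2}T[\d:.]+) #(?P<pid>\d+):(?P<tid>\w+)\]\s+"
    r"(?P<level>\w+) -- : rubyrep: unknown OID (?P<oid>\d+): "
    r"failed to recognize type of '(?P<column>[^']+)'"
)


def unknown_oid_columns(lines):
    """Return (pid, column) pairs for every 'unknown OID' line found."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            hits.append((int(match.group("pid")), match.group("column")))
    return hits


sample = [
    "[----] E, [2016-05-29T20:50:46.366304 #28078:c57998] ERROR -- : rubyrep: "
    "unknown OID 0: failed to recognize type of 'change_table'. "
    "It will be treated as String.",
    "[----] E, [2016-05-29T20:50:46.366356 #28078:c57998] ERROR -- : rubyrep: "
    "unknown OID 0: failed to recognize type of 'id'. "
    "It will be treated as String.",
]
print(unknown_oid_columns(sample))
```

Comparing the PIDs returned here with the PIDs of recycled workers would show whether the errors line up with the recycling or are unrelated noise.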

Comment 4 Nick Carboni 2016-05-31 19:00:06 UTC
Was this database migrated from 5.5 or was it a fresh deploy on version 5.6?

Comment 6 Nick Carboni 2016-05-31 20:08:13 UTC
Looks like the worker was hitting the memory threshold and being shut down by the monitor.

Comment 7 Alex Krzos 2016-05-31 20:14:41 UTC
To build on what Nick added: in the idle memory test this worker rides just under its threshold at ~190 MiB PSS.

Under workload I am seeing the worker grow to ~270 MiB PSS before worker management kicks in and recycles the worker.

# smem -c 'pid rss pss  command' | grep "[Rr]eplication" -i
 1725   355040   277714 MIQ: MiqReplicationWorker id: 84
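The smem figure above can be checked against the threshold with a little arithmetic. The sketch below is in Python and rests on three assumptions: that smem reports sizes in KiB (its default when no unit flag is given), that the threshold is 200 MiB (the bug only says "200 megabytes"), and that a simple PSS comparison stands in for whatever the worker monitor actually measures. The helper names are hypothetical.

```python
KIB_PER_MIB = 1024


def parse_smem_line(line):
    """Split one 'pid rss pss command' row of smem output into fields."""
    pid, rss_kib, pss_kib, command = line.split(None, 3)
    return {
        "pid": int(pid),
        "rss_kib": int(rss_kib),
        "pss_kib": int(pss_kib),
        "command": command,
    }


def exceeds_threshold(pss_kib, threshold_mib):
    """True when PSS, converted from KiB to MiB, is above the threshold."""
    return pss_kib / KIB_PER_MIB > threshold_mib


row = parse_smem_line(" 1725   355040   277714 MIQ: MiqReplicationWorker id: 84")
pss_mib = row["pss_kib"] / KIB_PER_MIB

# Assumed threshold of 200 MiB; the real monitor may measure memory differently.
print(round(pss_mib, 1), exceeds_threshold(row["pss_kib"], 200))
```

Under these assumptions the worker sits at roughly 271 MiB PSS, well above 200, which is consistent with the monitor recycling it under workload.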

Comment 8 Nick Carboni 2016-06-01 13:06:28 UTC
So for this, are we looking to increase the default memory threshold?

Comment 9 Nick Carboni 2016-06-01 16:03:06 UTC
https://github.com/ManageIQ/manageiq/pull/9087

Comment 10 CFME Bot 2016-06-01 19:20:58 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/77481589f714d410e497b7914c91e1cf3cc43285

commit 77481589f714d410e497b7914c91e1cf3cc43285
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Jun 1 11:44:47 2016 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Jun 1 11:44:47 2016 -0400

    Increase replication worker's memory threshold
    
    Previously it was 200 megabytes which was causing the monitor
    to bring the worker down during normal operation
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1341291

 config/settings.yml | 1 +
 1 file changed, 1 insertion(+)

Comment 14 errata-xmlrpc 2016-06-29 16:07:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348

