Bug 1341291 - 5.6 ReplicationWorker Recycling due to rubyrep: unknown OID 0: failed to recognize type of 'change_table'. It will be treated as String.
Status: CLOSED ERRATA
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Replication
Version: 5.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.6.0
Assigned To: Nick Carboni
QA Contact: Alex Newman
Depends On:
Blocks:
Reported: 2016-05-31 13:36 EDT by Alex Krzos
Modified: 2018-01-19 16:25 EST
CC List: 8 users

See Also:
Fixed In Version: 5.6.0.10
Last Closed: 2016-06-29 12:07:04 EDT
Type: Bug


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2016:1348
Priority: normal
Status: SHIPPED_LIVE
Summary: CFME 5.6.0 bug fixes and enhancement update
Last Updated: 2016-06-29 14:50:04 EDT

Description Alex Krzos 2016-05-31 13:36:41 EDT
Description of problem:
After I configured and turned on replication, the replication worker was recycled frequently and wrote an error to the log file each time.

Version-Release number of selected component (if applicable):
5.6.0.8

How reproducible:
Both times I configured and turned on replication, each time with a different provider.

Steps to Reproduce:
1. Configure replication
2. Turn on Database Synchronization role
3. Witness ReplicationWorker recycling and view logs for error

Actual results:
Worker recycles frequently

Expected results:
Worker to stay alive

Additional info:
I was trying to see whether this worker exceeds its memory threshold but ran into this issue instead. On the replication master I can see my inventory, so at least some tables appear to be replicated. An end user may therefore never notice this issue unless they observe the worker recycling or the errors in the log file.

This is using the older replication method (RubyRep) rather than the newer pglogical replication.

Relevant log lines:

[----] E, [2016-05-29T20:50:46.366304 #28078:c57998] ERROR -- : rubyrep: unknown OID 0: failed to recognize type of 'change_table'. It will be treated as String.
[----] E, [2016-05-29T20:50:46.366356 #28078:c57998] ERROR -- : rubyrep: unknown OID 0: failed to recognize type of 'id'. It will be treated as String.
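When triaging a longer log, it can help to pull out just the columns that rubyrep failed to type. The following is a minimal Python sketch, not part of CFME or rubyrep. The regular expression is inferred only from the two log lines above, and the function name unknown_oid_columns is hypothetical.

```python
import re

# Pattern inferred from the two log lines quoted in this report (an assumption,
# not a documented rubyrep or CFME log format).
UNKNOWN_OID = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+) #(?P<pid>\d+):\w+\]\s+ERROR -- : "
    r"rubyrep: unknown OID (?P<oid>\d+): "
    r"failed to recognize type of '(?P<column>[^']+)'"
)


def unknown_oid_columns(lines):
    """Return (timestamp, pid, column) for each rubyrep unknown-OID error line."""
    hits = []
    for line in lines:
        match = UNKNOWN_OID.search(line)
        if match:
            hits.append((match["ts"], int(match["pid"]), match["column"]))
    return hits


# The two lines from the report, used as sample input.
sample = [
    "[----] E, [2016-05-29T20:50:46.366304 #28078:c57998] ERROR -- : rubyrep: "
    "unknown OID 0: failed to recognize type of 'change_table'. "
    "It will be treated as String.",
    "[----] E, [2016-05-29T20:50:46.366356 #28078:c57998] ERROR -- : rubyrep: "
    "unknown OID 0: failed to recognize type of 'id'. "
    "It will be treated as String.",
]
hits = unknown_oid_columns(sample)
for ts, pid, column in hits:
    print(ts, pid, column)
```

Grouping the hits by pid would show whether the errors come from a single worker process or from each recycled worker in turn.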
Comment 4 Nick Carboni 2016-05-31 15:00:06 EDT
Was this database migrated from 5.5 or was it a fresh deploy on version 5.6?
Comment 6 Nick Carboni 2016-05-31 16:08:13 EDT
It looks like the worker was hitting its memory threshold and being shut down by the monitor.
Comment 7 Alex Krzos 2016-05-31 16:14:41 EDT
To build on what Nick added: in the idle memory test this worker sits just under its threshold, at roughly 190 MiB PSS.

Under workload I see the worker grow to roughly 270 MiB PSS before worker management kicks in and recycles it:

# smem -c 'pid rss pss  command' | grep "[Rr]eplication" -i
 1725   355040   277714 MIQ: MiqReplicationWorker id: 84
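To relate the smem row above to the threshold, here is a minimal Python sketch that converts the PSS column to MiB and compares it with a limit. It assumes smem reports kibibytes (its default when no unit option is given), that the monitor compares PSS against the threshold, and that the "200 megabytes" in the commit message in this report means 200 MiB. The helper names are hypothetical.

```python
def parse_smem_line(line):
    """Split one `smem -c 'pid rss pss command'` row into typed fields."""
    # The command column contains spaces, so split only on the first three gaps.
    pid, rss_kib, pss_kib, command = line.split(None, 3)
    return {
        "pid": int(pid),
        "rss_kib": int(rss_kib),
        "pss_kib": int(pss_kib),
        "command": command,
    }


def exceeds_threshold(pss_kib, threshold_mib):
    """True when PSS (in KiB) is above the threshold (in MiB)."""
    return pss_kib / 1024 > threshold_mib


# Row copied from the smem output above.
row = parse_smem_line(" 1725   355040   277714 MIQ: MiqReplicationWorker id: 84")
pss_mib = row["pss_kib"] / 1024

# Assumption: the old default threshold was 200 MiB.
OLD_THRESHOLD_MIB = 200
over = exceeds_threshold(row["pss_kib"], OLD_THRESHOLD_MIB)
print(f"pid {row['pid']}: PSS {pss_mib:.1f} MiB, over threshold: {over}")
```

Under these assumptions the worker's PSS under load is well above the old limit, which is consistent with the monitor recycling it.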
Comment 8 Nick Carboni 2016-06-01 09:06:28 EDT
So for this, are we looking to increase the default memory threshold?
Comment 9 Nick Carboni 2016-06-01 12:03:06 EDT
https://github.com/ManageIQ/manageiq/pull/9087
Comment 10 CFME Bot 2016-06-01 15:20:58 EDT
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/77481589f714d410e497b7914c91e1cf3cc43285

commit 77481589f714d410e497b7914c91e1cf3cc43285
Author:     Nick Carboni <ncarboni@redhat.com>
AuthorDate: Wed Jun 1 11:44:47 2016 -0400
Commit:     Nick Carboni <ncarboni@redhat.com>
CommitDate: Wed Jun 1 11:44:47 2016 -0400

    Increase replication worker's memory threshold
    
    Previously it was 200 megabytes which was causing the monitor
    to bring the worker down during normal operation
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1341291

 config/settings.yml | 1 +
 1 file changed, 1 insertion(+)
Comment 14 errata-xmlrpc 2016-06-29 12:07:04 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348
