1341291 – 5.6 ReplicationWorker Recycling due to rubyrep: unknown OID 0: failed to recognize type of 'change_table'. It will be treated as String.

Bug 1341291 - 5.6 ReplicationWorker Recycling due to rubyrep: unknown OID 0: failed to recognize type of 'change_table'. It will be treated as String.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Replication
Sub Component:
Version:	5.6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.6.0
Assignee:	Nick Carboni
QA Contact:	Alex Newman
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-05-31 17:36 UTC by Alex Krzos
Modified:	2019-12-16 05:52 UTC
CC List:	8 users
Fixed In Version:	5.6.0.10
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-06-29 16:07:04 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1348	0	normal	SHIPPED_LIVE	CFME 5.6.0 bug fixes and enhancement update	2016-06-29 18:50:04 UTC

Description Alex Krzos 2016-05-31 17:36:41 UTC

Description of problem:
After I configured and turned on replication, the replication worker is recycled frequently and writes an error to the log file.

Version-Release number of selected component (if applicable):
5.6.0.8

How reproducible:
It happened both times I configured and turned on replication, for two different providers.

Steps to Reproduce:
1. Configure replication
2. Turn on Database Synchronization role
3. Witness ReplicationWorker recycling and view logs for error

Actual results:
Worker recycles frequently

Expected results:
Worker to stay alive

Additional info:
I was trying to see whether this worker exceeds its memory threshold but ran into this issue instead. On the replication master I can see my Inventory, so at least some tables appear to be replicated. An end user may therefore never notice this issue unless they observe the worker recycling or the errors in the log file.

This is using the older replication method (RubyRep) rather than the newer pglogical replication.

Relevant log lines:

[----] E, [2016-05-29T20:50:46.366304 #28078:c57998] ERROR -- : rubyrep: unknown OID 0: failed to recognize type of 'change_table'. It will be treated as String.
[----] E, [2016-05-29T20:50:46.366356 #28078:c57998] ERROR -- : rubyrep: unknown OID 0: failed to recognize type of 'id'. It will be treated as String.

Comment 4 Nick Carboni 2016-05-31 19:00:06 UTC

Was this database migrated from 5.5 or was it a fresh deploy on version 5.6?

Comment 6 Nick Carboni 2016-05-31 20:08:13 UTC

Looks like the worker was hitting the memory threshold and being shut down by the monitor.

Comment 7 Alex Krzos 2016-05-31 20:14:41 UTC

To build on what Nick added: in an idle memory test, this worker sits just under its threshold at roughly 190 MiB PSS.

Under workload I am seeing the worker grow to roughly 270 MiB PSS before worker management kicks in and recycles the worker.

# smem -c 'pid rss pss  command' | grep "[Rr]eplication" -i
 1725   355040   277714 MIQ: MiqReplicationWorker id: 84
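
The smem figures above are easier to compare against the threshold once converted to MiB. A minimal sketch in Python, assuming smem's default unit (KiB) and the column order given by the -c option; parse_smem_line is a hypothetical helper, not part of smem or CFME:

```python
# Convert the KiB columns of one smem output row to MiB.
# Assumes the column order pid, rss, pss, command used in the command above.

def parse_smem_line(line):
    """Split one smem row into pid, RSS and PSS (in MiB), and command."""
    pid, rss_kib, pss_kib, command = line.split(None, 3)
    return {
        "pid": int(pid),
        "rss_mib": int(rss_kib) / 1024,
        "pss_mib": int(pss_kib) / 1024,
        "command": command.strip(),
    }

row = parse_smem_line(" 1725   355040   277714 MIQ: MiqReplicationWorker id: 84")
print(f"pid {row['pid']}: RSS {row['rss_mib']:.1f} MiB, PSS {row['pss_mib']:.1f} MiB")
# prints: pid 1725: RSS 346.7 MiB, PSS 271.2 MiB
```

The PSS of about 271 MiB matches the "roughly 270" figure quoted for the worker under workload.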

Comment 8 Nick Carboni 2016-06-01 13:06:28 UTC

So for this, are we looking to increase the default memory threshold?

Comment 9 Nick Carboni 2016-06-01 16:03:06 UTC

https://github.com/ManageIQ/manageiq/pull/9087

Comment 10 CFME Bot 2016-06-01 19:20:58 UTC

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/77481589f714d410e497b7914c91e1cf3cc43285

commit 77481589f714d410e497b7914c91e1cf3cc43285
Author:     Nick Carboni <ncarboni>
AuthorDate: Wed Jun 1 11:44:47 2016 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Jun 1 11:44:47 2016 -0400

    Increase replication worker's memory threshold
    
    Previously it was 200 megabytes which was causing the monitor
    to bring the worker down during normal operation
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1341291

 config/settings.yml | 1 +
 1 file changed, 1 insertion(+)
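
To illustrate why the previous threshold led to recycling, here is a rough sketch of the kind of comparison a memory monitor makes. It is not the ManageIQ implementation: exceeds_threshold and the 300 MiB value are hypothetical, "200 megabytes" is assumed to mean 200 MiB, and the value actually added to config/settings.yml is not shown in this report.

```python
MIB = 1024 * 1024

def exceeds_threshold(pss_bytes, threshold_bytes):
    """Return True when a worker's PSS is above its memory threshold."""
    return pss_bytes > threshold_bytes

OLD_THRESHOLD = 200 * MIB           # previous value, per the commit message
HYPOTHETICAL_THRESHOLD = 300 * MIB  # illustrative only, not the committed value

idle_pss = 190 * MIB                # approximate idle figure from comment 7
loaded_pss = 277714 * 1024          # smem PSS reading from comment 7, KiB to bytes

for label, pss in (("idle", idle_pss), ("under load", loaded_pss)):
    old = exceeds_threshold(pss, OLD_THRESHOLD)
    new = exceeds_threshold(pss, HYPOTHETICAL_THRESHOLD)
    print(f"{label}: over old threshold={old}, over hypothetical threshold={new}")
```

With these numbers the idle worker stays under both limits, while the loaded worker exceeds only the old 200 MiB limit, which is consistent with the recycling seen during normal operation.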

Comment 12 CFME Bot 2016-06-15 12:48:23 UTC

https://github.com/ManageIQ/manageiq/pull/9133

Comment 14 errata-xmlrpc 2016-06-29 16:07:04 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348
