Bug 1324991 - Replication doesn't work after 5.4.x > 5.5.3.2 upgrade
Summary: Replication doesn't work after 5.4.x > 5.5.3.2 upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Replication
Version: 5.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: GA
Target Release: 5.5.4
Assignee: Nick Carboni
QA Contact: luke couzens
URL:
Whiteboard: upgrade:replication
Depends On: 1323951
Blocks: 1335968
Reported: 2016-04-07 19:04 UTC by Chris Pelland
Modified: 2019-10-10 11:48 UTC

Fixed In Version: 5.5.4.0
Doc Type: Bug Fix
Doc Text:
Previously, after upgrading CloudForms Management Engine from 5.4.x to 5.5.3.2, replication worked for only a few transactions and then stopped and could not continue. The cause was that the ems_events table was renamed to event_streams in 4.0, and the migration that performed the rename did not remove the old trigger for ems_events. As a result, the old trigger stayed attached to the table after it became event_streams, and the correct trigger for the new table name was also added, as intended. Every change to event_streams therefore created pending change records under both the old and the new table names. The pending change entries for the removed ems_events table caused the replication run to raise an internal error while trying to get table information. The error was not handled properly, which prevented a timeout from being reset. This produced the timeout error seen in the output and prevented the process from replicating any changes. With this release, a script that removes out-of-date rubyrep triggers has been added, and the issue is now resolved.
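The failure mode described in the Doc Text can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the Trigger struct, the helper names, the trigger names, and the change_key format are hypothetical and are not taken from rubyrep or CFME. It shows one row change on event_streams producing two pending change rows, one of which names a table that no longer exists.

```ruby
# Hypothetical in-memory model of the problem; this is not rubyrep code.
Trigger = Struct.new(:name, :logs_table)

# Triggers attached to event_streams after the upgrade. The first one
# survived the rename and still records the old table name (assumed names).
TRIGGERS_ON_EVENT_STREAMS = [
  Trigger.new("rr_ems_events", "ems_events"),
  Trigger.new("rr_event_streams", "event_streams")
].freeze

EXISTING_TABLES = ["event_streams"].freeze

# Each trigger fires once per row change and queues one pending change row.
# The change_key format here is illustrative only.
def pending_changes_for(row_id, triggers)
  triggers.map { |t| { change_table: t.logs_table, change_key: "id|#{row_id}" } }
end

# Pending rows that name a table which no longer exists cannot be resolved
# when the replication run looks up table information.
def unresolvable(pending, existing_tables)
  pending.reject { |p| existing_tables.include?(p[:change_table]) }
end

pending = pending_changes_for(42, TRIGGERS_ON_EVENT_STREAMS)
pending.each { |p| puts "#{p[:change_table]} #{p[:change_key]}" }
bad = unresolvable(pending, EXISTING_TABLES).map { |p| p[:change_table] }
puts "unresolvable: #{bad.join(', ')}"
```

In this model the fix corresponds to removing the first entry from the trigger list, so that only rows naming event_streams are queued.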
Clone Of: 1323951
Clones: 1335968
Environment:
Last Closed: 2016-05-31 13:42:16 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2016:1101
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: CFME 5.5.4 bug fixes and enhancement update
Last Updated: 2016-05-31 17:40:10 UTC

Comment 1 Nick Carboni 2016-04-08 12:28:50 UTC
This is still in progress

Comment 5 CFME Bot 2016-04-15 12:06:34 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=c478672dc298597d93411f00e1f849d949f3ba48

commit c478672dc298597d93411f00e1f849d949f3ba48
Merge: 0fdc9a3 12d5a75
Author:     Gregg Tanzillo <gtanzill>
AuthorDate: Fri Apr 15 08:06:07 2016 -0400
Commit:     Gregg Tanzillo <gtanzill>
CommitDate: Fri Apr 15 08:06:07 2016 -0400

    Merge branch 'backport_replication_trigger_script' into '5.5.z'
    
    Add a script to remove out of date rubyrep triggers
    
    When a table which is being replicated is renamed the rubyrep
    trigger should be dropped and recreated to ensure only rows
    referencing the new table are inserted into rr_pending_changes.
    
    If this is not done properly, triggers referencing the old table
    name will remain functional on the renamed table. This has been
    seen to cause the replicate process to time out when it happens with
    high-churn tables.
    
    Clean cherry-pick
    
    Upstream PR: https://github.com/ManageIQ/manageiq/pull/7834
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1324991
    
    @gtanzill @jrafanie
    
    See merge request !893

 tools/purge_duplicate_rubyrep_triggers.rb | 57 +++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

Comment 6 CFME Bot 2016-04-15 12:06:39 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=12d5a75f89c3780b32ccec4c3e18a89ddf5ee700

commit 12d5a75f89c3780b32ccec4c3e18a89ddf5ee700
Author:     Nick Carboni <ncarboni>
AuthorDate: Fri Apr 8 15:18:13 2016 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Wed Apr 13 08:24:13 2016 -0400

    Add a script to remove out of date rubyrep triggers
    
    When a table which is being replicated is renamed the rubyrep
    trigger should be dropped and recreated to ensure only rows
    referencing the new table are inserted into rr_pending_changes.
    
    If this is not done properly, triggers referencing the old table
    name will remain functional on the renamed table. This has been
    seen to cause the replicate process to time out when it happens with
    high-churn tables.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1324991

 tools/purge_duplicate_rubyrep_triggers.rb | 57 +++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 tools/purge_duplicate_rubyrep_triggers.rb
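
The check described in the commit message can be sketched roughly as follows. This is not the contents of tools/purge_duplicate_rubyrep_triggers.rb, which is not shown in this report. The sketch assumes that rubyrep triggers are named rr_<table>, that the input is a list of (trigger name, table name) pairs such as could be read from PostgreSQL's information_schema.triggers, and the method names are hypothetical. It only builds DROP TRIGGER statements and does not execute them or clean up any trigger functions.

```ruby
# Illustrative sketch only; assumes rubyrep names its triggers "rr_<table>".
PREFIX = "rr_"

# rows: [trigger_name, table_name] pairs, for example as read from
# information_schema.triggers (trigger_name, event_object_table).
# That view can list a trigger once per event, hence the uniq.
def stale_triggers(rows)
  rows.uniq.select do |trigger, table|
    trigger.start_with?(PREFIX) && trigger != "#{PREFIX}#{table}"
  end
end

# Build, but do not run, one DROP TRIGGER statement per stale trigger.
def drop_statements(rows)
  stale_triggers(rows).map do |trigger, table|
    %(DROP TRIGGER IF EXISTS "#{trigger}" ON "#{table}";)
  end
end

rows = [
  ["rr_ems_events", "event_streams"],    # left over from before the rename
  ["rr_event_streams", "event_streams"], # correct trigger for the new name
  ["rr_vms", "vms"]
]
puts drop_statements(rows)
```

Only the trigger whose name does not match the table it is attached to is selected, so the correct trigger for the renamed table is left in place.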

Comment 12 luke couzens 2016-05-10 07:44:25 UTC
Gregg,

Both appliances are now up and running; however, the replication backlog is still growing.

Comment 13 Nick Carboni 2016-05-11 16:46:01 UTC
I was able to get the test environment up and running, and after one day there is still no backlog.

I ran the following commands:
`systemctl stop evmserverd`
`bin/rake evm:dbsync:reset`
`systemctl start evmserverd`

After reviewing the 5.4 -> 5.5 migration doc (https://access.redhat.com/articles/2076193) I would suggest a few changes to be sure that there is no backlog when the database is exported from the 5.4 machine.

In the section "Steps Performed on the 5.4 Appliances" under the heading "The remaining steps should be completed only on the VMDB Appliances.":

Step 3 should read "Press Alt+F3 to begin a terminal session, and log in with a user that has root credentials or similar."

Then we should add two additional steps:
4. Shut down the evmserverd service (`systemctl stop evmserverd`)
5. Ensure all pending changes have been replicated (`bin/rake evm:dbsync:replicate_backlog`)

Then continue the guide with:
"Find the size of the VMDB by using the following command. Then, ensure that your volume has enough space for the backup file."

What do you think, Gregg?

Comment 14 Gregg Tanzillo 2016-05-11 18:32:20 UTC
Thanks for the info, Nick. I'm in agreement with updating the migration doc with those additional steps.

Comment 15 Nick Carboni 2016-05-12 12:49:49 UTC
I made the edit to the docs. I think it needs to be reviewed and published by Marianne. When that is done, I think we can consider this fixed.

Comment 16 Marianne Feifer 2016-05-13 14:55:57 UTC
If the migration doc needs updating, please open a second ticket under the documentation component and assign it to me. It sounds like things are fixed on the development side and only the updates to the migration guide are needed.

Comment 19 luke couzens 2016-05-19 16:26:20 UTC
Verified

Comment 20 luke couzens 2016-05-19 16:27:18 UTC
5.5.4.0

Comment 22 errata-xmlrpc 2016-05-31 13:42:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1101

