| Summary: | Replication doesn't work after 5.4.x > 5.5.3.2 upgrade | | |
|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Chris Pelland <cpelland> |
| Component: | Replication | Assignee: | Nick Carboni <ncarboni> |
| Status: | CLOSED ERRATA | QA Contact: | luke couzens <lcouzens> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 5.5.0 | CC: | cpelland, gtanzill, jhardy, jkrocil, jprause, lcouzens, mfeifer, ncarboni, obarenbo, ssainkar |
| Target Milestone: | GA | Keywords: | ZStream |
| Target Release: | 5.5.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | upgrade:replication | | |
| Fixed In Version: | 5.5.4.0 | Doc Type: | Bug Fix |

Doc Text:

Previously, after upgrading CloudForms Management Engine from 5.4.x to 5.5.3.2, replication processed only a few transactions, then stopped and could not continue.

This happened because the ems_events table was renamed to event_streams in 4.0, and the migration that performed the rename did not remove the old trigger for ems_events. That trigger therefore stayed attached to the renamed event_streams table. The correct trigger for the new table name was also added, as expected. As a result, every change to event_streams created pending change records under both the old (ems_events) and the new (event_streams) table names.

The pending change entries for the removed ems_events table caused the replication run to raise an internal error while fetching table information. Because this error was not handled properly, a timeout was never reset. That produced the timeout error seen in the output and stopped the process from replicating any changes.

With this release, a script that removes out-of-date rubyrep triggers has been added, and the issue is resolved.

| Story Points: | --- | | |
|---|---|---|---|
| Clone Of: | 1323951 | | |
| Clones: | 1335968 | Environment: | |
| Last Closed: | 2016-05-31 13:42:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 1323951 | | |
| Bug Blocks: | 1335968 | | |
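
The failure chain in the Doc Text can be modeled in a few lines of Ruby. This is a hypothetical sketch, not CloudForms or rubyrep code: the `PendingChange` and `Trigger` structs, the `rr_` trigger names, the `LIVE_TABLES` list and the `Replicator` class are all assumptions made for illustration. It shows why one write to event_streams yields two pending-change rows, how the rows for the removed ems_events table can be spotted, and how placing a timeout reset in an `ensure` clause keeps it from being skipped when a lookup error propagates.

```ruby
# Hypothetical model of the failure described in the Doc Text; not CloudForms
# or rubyrep code. Struct shapes and "rr_" trigger names are assumptions.

# One row in rr_pending_changes: which table changed and which key.
PendingChange = Struct.new(:change_table, :change_key)

# A trigger records the table name it was created for, which after a
# rename is no longer the table it is attached to.
Trigger = Struct.new(:name, :recorded_table)

# After the rename, event_streams carries the stale ems_events trigger
# as well as the correct event_streams trigger.
TRIGGERS_ON_EVENT_STREAMS = [
  Trigger.new("rr_ems_events", "ems_events"),
  Trigger.new("rr_event_streams", "event_streams")
].freeze

# Tables that still exist after the upgrade (illustrative subset).
LIVE_TABLES = %w[event_streams vms hosts].freeze

# Simulate one write to event_streams: every attached trigger fires,
# so one change produces one pending-change row per trigger.
def fire_triggers(triggers, key)
  triggers.map { |t| PendingChange.new(t.recorded_table, key) }
end

# Pending changes that name a table which no longer exists.
def stale_changes(changes, live_tables)
  changes.reject { |c| live_tables.include?(c.change_table) }
end

# Hypothetical replicator: looking up an unknown table raises, but the
# timeout reset lives in `ensure`, so it runs even when the error propagates.
class Replicator
  attr_reader :timeout_resets

  def initialize(live_tables)
    @live_tables = live_tables
    @timeout_resets = 0
  end

  def replicate(change)
    unless @live_tables.include?(change.change_table)
      raise ArgumentError, "no table information for #{change.change_table}"
    end
    :replicated
  ensure
    @timeout_resets += 1 # stands in for resetting the replication timeout
  end
end

changes = fire_triggers(TRIGGERS_ON_EVENT_STREAMS, 42)
p changes.map(&:change_table)                             # prints ["ems_events", "event_streams"]
p stale_changes(changes, LIVE_TABLES).map(&:change_table) # prints ["ems_events"]

replicator = Replicator.new(LIVE_TABLES)
changes.each do |change|
  begin
    replicator.replicate(change)
  rescue ArgumentError => e
    puts "error: #{e.message}" # prints "error: no table information for ems_events"
  end
end
p replicator.timeout_resets # prints 2
```

In the real bug the reset was skipped; the sketch only illustrates the handling pattern. The actual fix in this report removed the stale trigger rather than changing error handling.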
Comment 1
Nick Carboni
2016-04-08 12:28:50 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=c478672dc298597d93411f00e1f849d949f3ba48

commit c478672dc298597d93411f00e1f849d949f3ba48
Merge: 0fdc9a3 12d5a75
Author: Gregg Tanzillo <gtanzill>
AuthorDate: Fri Apr 15 08:06:07 2016 -0400
Commit: Gregg Tanzillo <gtanzill>
CommitDate: Fri Apr 15 08:06:07 2016 -0400

Merge branch 'backport_replication_trigger_script' into '5.5.z'

Add a script to remove out of date rubyrep triggers

When a table which is being replicated is renamed the rubyrep trigger should be dropped and recreated to ensure only rows referencing the new table are inserted into rr_pending_changes. If this is not done properly, triggers referencing the old table name will remain functional on the renamed table. This has been seen to cause the replicate process to time out when it happens with high-churn tables.

Clean cherry-pick

Upstream PR: https://github.com/ManageIQ/manageiq/pull/7834

https://bugzilla.redhat.com/show_bug.cgi?id=1324991

@gtanzill @jrafanie

See merge request !893

tools/purge_duplicate_rubyrep_triggers.rb | 57 +++++++++++++++++++++++++++++++
1 file changed, 57 insertions(+)

New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=12d5a75f89c3780b32ccec4c3e18a89ddf5ee700

commit 12d5a75f89c3780b32ccec4c3e18a89ddf5ee700
Author: Nick Carboni <ncarboni>
AuthorDate: Fri Apr 8 15:18:13 2016 -0400
Commit: Nick Carboni <ncarboni>
CommitDate: Wed Apr 13 08:24:13 2016 -0400

Add a script to remove out of date rubyrep triggers

When a table which is being replicated is renamed the rubyrep trigger should be dropped and recreated to ensure only rows referencing the new table are inserted into rr_pending_changes. If this is not done properly, triggers referencing the old table name will remain functional on the renamed table. This has been seen to cause the replicate process to time out when it happens with high-churn tables.
https://bugzilla.redhat.com/show_bug.cgi?id=1324991

tools/purge_duplicate_rubyrep_triggers.rb | 57 +++++++++++++++++++++++++++++++
1 file changed, 57 insertions(+)
create mode 100644 tools/purge_duplicate_rubyrep_triggers.rb

Gregg,

Both appliances are now up and running; however, the replication backlog is still growing.

I was able to get the test environment up and running, and after one day there is still no backlog. I ran the following commands:

- `systemctl stop evmserverd`
- `bin/rake evm:dbsync:reset`
- `systemctl start evmserverd`

After reviewing the 5.4 -> 5.5 migration doc (https://access.redhat.com/articles/2076193), I would suggest a few changes to make sure there is no backlog when the database is exported from the 5.4 machine.

In the section "Steps Performed on the 5.4 Appliances", under the heading "The remaining steps should be completed only on the VMDB Appliances.", step 3 should read: "Press Alt+F3 to begin a terminal session, and log in with a user that has root credentials or similar."

Then we should add two additional steps before starting:

4. Shut down the evmserver process (`systemctl stop evmserverd`).
5. Ensure all pending changes have been replicated (`bin/rake evm:dbsync:replicate_backlog`).

Then continue the guide with: "Find the size of the VMDB by using the following command. Then, ensure that your volume has enough space for the backup file."

What do you think, Gregg?

Thanks for the info, Nick. I agree with updating the migration doc with those additional steps.

I made the edit to the docs. I think it needs to be reviewed and published by Marianne. When that is done, I think we can consider this fixed.

If the migration doc needs updating, please open a second ticket under the documentation component and assign it to me. It sounds like things are fixed on the development side; we just need the updates to the migration guide.
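
The commit message for 12d5a75 describes the fix: a rubyrep trigger left over from the old table name must be removed so that only rows for the new name reach rr_pending_changes. The Ruby below is a hypothetical sketch of that idea, not the contents of tools/purge_duplicate_rubyrep_triggers.rb. It assumes rubyrep triggers are named `rr_<table>` and that the input rows come from PostgreSQL's `information_schema.triggers` view (its `trigger_name` and `event_object_table` columns); `stale_rubyrep_triggers`, `drop_statements` and `TRIGGER_PREFIX` are illustrative names, and `String#delete_prefix` needs Ruby 2.5 or later.

```ruby
# Hypothetical sketch of the idea behind tools/purge_duplicate_rubyrep_triggers.rb;
# not that script's code. Assumes rubyrep triggers are named "rr_<table>".

TRIGGER_PREFIX = "rr_" # assumed rubyrep trigger-name prefix

# rows: [trigger_name, event_object_table] pairs, as returned by
#   SELECT trigger_name, event_object_table FROM information_schema.triggers;
# That view lists one row per triggering event (INSERT/UPDATE/DELETE),
# so duplicates are collapsed with #uniq first.
def stale_rubyrep_triggers(rows)
  rows.uniq.select do |trigger_name, table|
    trigger_name.start_with?(TRIGGER_PREFIX) &&
      trigger_name.delete_prefix(TRIGGER_PREFIX) != table
  end
end

# Build PostgreSQL DROP TRIGGER statements. Quoting is naive and assumes
# identifiers contain no double quotes.
def drop_statements(stale)
  stale.map { |trigger_name, table| %(DROP TRIGGER IF EXISTS "#{trigger_name}" ON "#{table}";) }
end

rows = [
  ["rr_ems_events", "event_streams"],    # stale trigger left by the rename
  ["rr_ems_events", "event_streams"],    # same trigger, listed for another event
  ["rr_event_streams", "event_streams"], # correct trigger for the new name
  ["rr_vms", "vms"],
  ["audit_vms", "vms"]                   # not a rubyrep trigger; ignored
]

puts drop_statements(stale_rubyrep_triggers(rows))
# prints: DROP TRIGGER IF EXISTS "rr_ems_events" ON "event_streams";
```

Dropping only the stale trigger is enough in this scenario because, per the Doc Text, the correct event_streams trigger was already added; the commit message's general advice for a renamed table is to drop and recreate the trigger.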
Verified 5.5.4.0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1101