Bug 1361218 - RubyRep fails to start after 5.5 -> 5.6 migration
Summary: RubyRep fails to start after 5.5 -> 5.6 migration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.7.0
Assignee: Nick Carboni
QA Contact: luke couzens
URL:
Whiteboard: black:upgrade:migration:replication
Depends On:
Blocks: 1361610
 
Reported: 2016-07-28 14:11 UTC by luke couzens
Modified: 2017-01-12 04:53 UTC
CC List: 7 users

Fixed In Version: 5.7.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1361610
Environment:
Last Closed: 2017-01-11 20:12:36 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:



Description luke couzens 2016-07-28 14:11:04 UTC
Description of problem: RubyRep is stuck in a restart loop after migrating from 5.5.5.4 to 5.6.1.0. The same issue seems to be present for both the standard migration and the in-place upgrade.


Version-Release number of selected component (if applicable): 5.6.1.0


How reproducible: 100%


Steps to Reproduce:
1. Provision 2x 5.5 appliances
2. Configure the 1st db with region 99 (r99)
3. Configure the 2nd db with region 0 (r0)
4. Log in to the webui of the r0 appliance
5. Set up the replication worker (Configure -> Configuration -> Workers)
6. Point it at the r99 appliance
7. Enable db synchronization (Configure -> Configuration -> Server)
8. Test replication by adding a provider and checking that it also shows up in r99
9. Follow the migration docs to upgrade to 5.6 [0]

Actual results: rubyrep fails to start


Expected results: replication starts correctly


Additional info:
[0] https://access.redhat.com/articles/2297391 - in-place upgrade


evm.log
http://pastebin.test.redhat.com/396957

IPs for standard migration:
rr99 - 10.16.6.208
rr0 - 10.16.6.85

IPs for in-place upgrade:
rr99 - 10.8.199.223
rr0 - 10.16.6.131

Comment 2 Nick Carboni 2016-07-28 14:19:55 UTC
The issue is a unique constraint error on the cloud_subnets_network_ports table. It was caused by a region-agnostic data migration: when run on a global region, it created join table rows for data belonging to a remote region, but assigned those rows ids from the global region's sequence.
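
For context, here is a minimal Ruby sketch of how region-encoded ids come into play. This is illustrative only; the 10**12 id offset per region is standard ManageIQ behaviour and not specific to this bug.

  REGION_FACTOR = 1_000_000_000_000

  def id_range_for_region(region_number)
    (region_number * REGION_FACTOR)..((region_number + 1) * REGION_FACTOR - 1)
  end

  id_range_for_region(0)   # => 0..999_999_999_999                      (remote region r0)
  id_range_for_region(99)  # => 99_000_000_000_000..99_999_999_999_999  (global region r99)

  # The migration, run on the global (r99) database, built join rows for r0's
  # replicated data but gave them ids in the r99 range. When r0 ran the same
  # migration and rubyrep tried to push r0's own copies of those rows, the
  # insert on the global side failed with the unique constraint error.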

This was introduced in https://github.com/ManageIQ/manageiq/pull/7237 

Unfortunately this issue also appears to be present in 5.6.0.

We can document a fix to apply after the migration (see the sketch after the list below):
On the global region:
  - DELETE from cloud_subnets_network_ports;
On each remote region:
  - Stop the replication worker
  - bin/rake evm:dbsync:local_uninstall cloud_subnets_network_ports
  - Start the replication worker
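
As an illustration only (not the official remediation text), the DELETE on the global region could equally be run from the Rails console rather than psql; it is the same statement as in the list above:

  # On the GLOBAL region appliance, from the application root:
  #   bin/rails console
  ActiveRecord::Base.connection.execute("DELETE FROM cloud_subnets_network_ports")

The remote-region steps (stop the replication worker, run the rake task, start the worker) are unchanged.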

I'm also currently working on a fix for the migration itself.

Comment 4 CFME Bot 2016-07-29 13:30:49 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/6c0ab8b32f8f621474588bfe94b62fc25692bc0d

commit 6c0ab8b32f8f621474588bfe94b62fc25692bc0d
Author:     Nick Carboni <ncarboni>
AuthorDate: Thu Jul 28 12:39:15 2016 -0400
Commit:     Nick Carboni <ncarboni>
CommitDate: Thu Jul 28 12:41:57 2016 -0400

    Only migrate rows in the current region
    
    This was causing an issue with replication when rows which actually
    belonged to a region were migrated.
    
    The new rows in the join table got an id in the global region.
    
    This caused replication to fail with a unique constraint error
    when trying to replicate the new rows in the regional database.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1361218

 ...oud_subnet_id_to_network_ports_cloud_subnets.rb |  6 +-
 ...ubnet_id_to_network_ports_cloud_subnets_spec.rb | 71 ++++++++++++++++++++++
 2 files changed, 74 insertions(+), 3 deletions(-)
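
For anyone reading along without opening the diff, the shape of the fix is roughly the following. This is a loose sketch, not the actual commit; current_region_number stands in for however the migration determines the local region, and NetworkPort stands in for the migration's own handle on that table.

  REGION_FACTOR = 1_000_000_000_000

  def region_id_range(region_number)
    (region_number * REGION_FACTOR)..((region_number + 1) * REGION_FACTOR - 1)
  end

  # Before: every network_ports row was walked, including rows replicated in
  # from remote regions, and join rows were created for all of them with
  # locally generated (global-region) ids.
  #
  # After: only rows whose id falls inside the current region's id range are
  # touched, so replicated remote-region data is left alone.
  NetworkPort.where(:id => region_id_range(current_region_number)).find_each do |port|
    # ... build the cloud_subnets_network_ports row for this port ...
  end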

Comment 7 luke couzens 2016-09-16 18:37:01 UTC
If we are expected to support a 5.5 -> 5.7 in-place upgrade, then we are currently blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1376888

Comment 8 luke couzens 2016-11-02 17:45:33 UTC
Verified in 5.7.0.7

