Bug 1020152 - [origin_broker_108]Gear endpoints are not being migrated properly by datastore migration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Rajat Chopra
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-10-17 07:21 UTC by Jianwei Hou
Modified: 2015-05-15 00:22 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-24 03:24:44 UTC
Target Upstream Version:
Embargoed:


Attachments
mongodump before migration on devenv-stage_486 (32.03 KB, application/gzip)
2013-10-17 07:21 UTC, Jianwei Hou
mongodump after compatible datastore migration (32.24 KB, application/gzip)
2013-10-17 07:22 UTC, Jianwei Hou
2013-10-24 mongodump after migration, 3 failed gears (38.91 KB, application/gzip)
2013-10-25 05:39 UTC, Jianwei Hou

Description Jianwei Hou 2013-10-17 07:21:59 UTC
Created attachment 813210 [details]
mongodump before migration on devenv-stage_486

Description of problem:
Run rhc-admin-migrate-datastore to migrate application endpoints. After the migration, check the port_interfaces of a scalable application: the protocols for some cartridges are not correctly updated.


Version-Release number of selected component (if applicable):
On devenv_3907

How reproducible:
Always

Steps to Reproduce:
1. Create scalable applications with all possible cartridges on a devenv-stage_486 AMI (this AMI does not have the 'protocols' and 'type' fields before migration)
2. Upgrade the server to the latest version (devenv_3907)
3. Clear the broker cache, then restart the broker and ruby193-mcollective
4. Do datastore migration
rhc-admin-migrate-datastore --compatible --version 2.0.35
5. Use mongo shell to verify the protocols and types endpoints are migrated properly
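The verification in step 5 can be sketched in Ruby. This is a hypothetical helper, not part of the broker code: it takes endpoint hashes shaped like the port_interfaces entries shown in comment 3 (field names 'protocols' and 'type' come from that sample) and reports which ones the migration failed to populate.

```ruby
# Hypothetical check: given endpoint hashes as found under a gear's
# port_interfaces in the broker database, return the ones that are
# still missing the migrated 'protocols' and 'type' fields.
def unmigrated_endpoints(endpoints)
  endpoints.reject do |ep|
    ep.key?("protocols") && !ep["protocols"].to_a.empty? &&
      ep.key?("type") && !ep["type"].to_a.empty?
  end
end

endpoints = [
  { "cartridge_name" => "haproxy-1.4",
    "protocols" => ["http", "ws"], "type" => ["load_balancer"] },
  { "cartridge_name" => "mysql-5.1" }  # fields never added by migration
]
unmigrated_endpoints(endpoints).each { |ep| puts ep["cartridge_name"] }
# prints "mysql-5.1"
```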

Actual results:
1. For the db cartridges, mysql, postgresql and mongodb all have 'protocols' updated to 'tcp'
2. On the devenv-stage_486 AMI, the group_instances of a scalable application do not contain the 'haproxy-1.4' cartridge, so after migration the apps still have no haproxy endpoints in their group_instances

Expected results:
1. The db cartridges' 'protocols' values should be their db names, as defined by their manifests
2. The haproxy-1.4 endpoints should be added to the app's group_instances

Additional info:
I will attach before and after versions of mongodump

Comment 1 Jianwei Hou 2013-10-17 07:22:32 UTC
Created attachment 813211 [details]
mongodump after compatible datastore migration

Comment 2 Rajat Chopra 2013-10-23 00:36:02 UTC
Fixed with https://github.com/openshift/li/pull/2026
The haproxy of older apps needs to get its ports exposed.

Regarding the database gears: check inside the db gear - the mysql manifest in the gear should reflect the appropriate protocols. If it does not, the upgrade process needs to be run with the :ignore_cartridge_version flag; with that, the manifest should get updated nicely. The reason may be that the cartridge version had not been bumped at the time of testing.

Comment 3 Jianwei Hou 2013-10-24 07:01:07 UTC
Verified on devenv_3937; the haproxy endpoints are migrated successfully. Sample:

{
    "_id" : ObjectId("5268b7c295b32b21cb000023"),
    "cartridge_name" : "haproxy-1.4",
    "external_port" : "38206",
    "internal_address" : "127.2.5.130",
    "internal_port" : 8080,
    "protocols" : [
        "http",
        "ws"
    ],
    "type" : [
        "load_balancer"
    ],
    "mappings" : [
        {
            "frontend" : "",
            "backend" : ""
        },
        {
            "frontend" : "/health",
            "backend" : "/configuration/health"
        }
    ]
},

Comment 4 Jianwei Hou 2013-10-24 10:45:30 UTC
Sorry - when testing this script again, if the exit_code is non-zero a stack trace pops up and the program aborts.

[root@ip-10-151-19-45 ~]# rhc-admin-migrate-datastore --compatible --version 2.0.35
/usr/bin/rhc-admin-migrate-datastore:52:in `block in expose_haproxy_port': undefined method `_id' for "5268e8c719b7dea4bd0002a5":String (NoMethodError)
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.16.3/app/models/remote_job.rb:125:in `call'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.16.3/app/models/remote_job.rb:125:in `block (2 levels) in get_parallel_run_results'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.16.3/app/models/remote_job.rb:124:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.16.3/app/models/remote_job.rb:124:in `block in get_parallel_run_results'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.16.3/app/models/remote_job.rb:123:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.16.3/app/models/remote_job.rb:123:in `get_parallel_run_results'
	from /usr/bin/rhc-admin-migrate-datastore:48:in `expose_haproxy_port'
	from /usr/bin/rhc-admin-migrate-datastore:21:in `migrate'
	from /usr/bin/rhc-admin-migrate-datastore:208:in `<main>'


The code block in /usr/bin/rhc-admin-migrate-datastore:
  RemoteJob.get_parallel_run_results(handle) { |tag, gear, stdout, exit_code|
    if exit_code!=0
      puts "Error in executing parallel job for gear #{gear._id.to_s}"
    else
      exposed += 1
    end
  }
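The trace shows that get_parallel_run_results yields the gear's id as a plain String here, not an object responding to _id, so `gear._id` raises NoMethodError. The actual fix is in the linked PR; a defensive sketch of the block (with a minimal stand-in for RemoteJob, just enough to exercise it - the real class lives in openshift-origin-controller) might look like:

```ruby
# Minimal stand-in for the broker's RemoteJob.get_parallel_run_results,
# illustrative only: it yields one failing result whose gear id arrives
# as a String, and one succeeding result.
module RemoteJob
  RESULTS = [
    ["tag", "5268e8c719b7dea4bd0002a5", "", 1],  # id is a String here
    ["tag", "5268e8c719b7dea4bd0002a6", "", 0],
  ]
  def self.get_parallel_run_results(_handle)
    RESULTS.each { |r| yield(*r) }
  end
end

# Defensive version of the block from comment 4: resolve the gear id
# whether 'gear' is a String or an object responding to _id.
exposed = 0
RemoteJob.get_parallel_run_results(nil) do |tag, gear, stdout, exit_code|
  if exit_code != 0
    gear_id = gear.respond_to?(:_id) ? gear._id.to_s : gear.to_s
    puts "Error in executing parallel job for gear #{gear_id}"
  else
    exposed += 1
  end
end
```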

Comment 5 Rajat Chopra 2013-10-24 23:03:59 UTC
Fixed with https://github.com/openshift/li/pull/2039

Comment 6 Jianwei Hou 2013-10-25 05:39:03 UTC
Tested on devenv_3942; this bug is fixed. However, 3 gears failed the mongo migration. After investigation I found they were 1 jbossas gear and 2 jbosseap gears. I could not figure out why they failed, so I've attached the mongodump - can you please help confirm whether this result is acceptable? Thanks

[root@ip-10-139-13-20 ~]# rhc-admin-migrate-datastore --compatible --version 2.0.35
Starting migration: compatible
Error in executing parallel job for gear 5269e21b4b4e3f9e380002a5
Error in executing parallel job for gear 5269e3064b4e3f9e38000317
Error in executing parallel job for gear 5269e7ab4b4e3f9e380005e7
Done migrating 12/15 applications.
Time to get all gears from nodes: 4.926s
Total gears found on the nodes: 47

Comment 7 Jianwei Hou 2013-10-25 05:39:45 UTC
Created attachment 816007 [details]
2013-10-24 mongodump after migration, 3 failed gears

Comment 8 Rajat Chopra 2013-10-25 14:27:38 UTC
Could you attach the broker log and mcollective logs also?

Comment 9 Rajat Chopra 2013-10-25 22:38:21 UTC
Found the issue. JBoss gears throw this error because the number of ports already exposed is 5, and 5 is the limit, so the haproxy expose-port call fails there.

That should be considered expected and we will take care of these ones as a separate task.
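The diagnosis above can be sketched as a pre-check. This is illustrative, not the broker's actual code: the limit of 5 comes from comment 9, but the endpoint-counting logic and field name are assumptions based on the sample in comment 3.

```ruby
# Sketch of the guard implied by comment 9: skip the haproxy
# expose-port call when a gear has already hit the external-port
# limit. PORT_LIMIT = 5 is taken from the comment; counting
# 'external_port' entries is an assumption, not the real implementation.
PORT_LIMIT = 5

def can_expose_port?(endpoints, limit = PORT_LIMIT)
  endpoints.count { |ep| ep["external_port"] } < limit
end

# A JBoss gear with all 5 external ports already exposed:
jboss_gear_endpoints = Array.new(5) { |i| { "external_port" => (35_531 + i).to_s } }
puts can_expose_port?(jboss_gear_endpoints)
# prints "false"
```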

Comment 10 Meng Bo 2013-10-28 07:47:39 UTC
Checked on devenv-stage_528 upgraded to devenv_3953; the only affected gears are scalable jbossas and jbosseap ones. Per comment #9, this is the expected result.

We will move the bug to verified.

@rchopra

Will the scaled jboss issue be fixed in sprint 35? And do we need to open a new bug to track it?

Comment 11 Rajat Chopra 2015-01-05 04:12:20 UTC
Closing out on the 'needinfo' here. The jboss issue was fixed subsequently - the master branch already has the fixed code.

