Bug 1034967

Summary: Git push will fail if upgrading a scalable app with multiple web gears when the child gear's FQDN only has the first 10 characters of the UUID
Product: OpenShift Container Platform
Reporter: Andy Goldstein <agoldste>
Component: Cluster Version Operator
Assignee: Luke Meyer <lmeyer>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 2.0.0
CC: bleanhar, jdetiber, jialiu, libra-onpremise-devel, lmeyer
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: rubygem-openshift-origin-node-1.17.5.4-1 openshift-origin-broker-util-1.17.6.1-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-17 16:20:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andy Goldstein 2013-11-26 19:01:22 UTC
Description of problem: If a scalable app with child web gears was created on OSE 1.1, when a child gear's FQDN contained only the first 10 characters of the gear's UUID, users won't be able to git push to it after upgrading to OSE 2.0.


Version-Release number of selected component (if applicable):


How reproducible: 100%


Steps to Reproduce:
1. Create an OSE 1.1 scalable application
2. Scale to at least 2 web gears
3. Verify in the haproxy-status page that the child gear is listed as gear-xxxxxxxxxx-mydomain (where the 10 x's represent the first 10 characters of the gear's UUID, instead of the full UUID)
4. Upgrade to 2.0
5. Try to git push to the application
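
The check in step 3 can be done mechanically. A minimal sketch, assuming the haproxy-status page lists child gears under names of the form gear-<id>-<namespace> as described above; the function name and the sample gear names are hypothetical:

```python
import re

# Assumed name format from step 3: gear-<id>-<namespace>, where <id> is
# either the first 10 hex characters of the gear UUID (OSE 1.1 style) or
# the full UUID.
GEAR_NAME = re.compile(r"^gear-(?P<id>[0-9a-f]+)-(?P<namespace>.+)$")


def uses_short_uuid(gear_name):
    """Return True if the gear name carries only a 10-character UUID prefix."""
    match = GEAR_NAME.match(gear_name)
    if match is None:
        raise ValueError("unrecognized gear name: %r" % gear_name)
    return len(match.group("id")) == 10
```

For example, uses_short_uuid("gear-abcd1234ab-mydomain") is true, which marks the app as one that would hit this bug on upgrade.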

Actual results: Something along the lines of this:

remote: Distribution failed for the following gears:
remote: abcd1234ab (rsync: connection unexpectedly closed (0 bytes received so far) [sender])



Expected results: git push succeeds


Additional info:
In OSE 1.1, the child gear's FQDN was of the format $uuid10-$namespace.$domain, where $uuid10 is the first 10 characters of the gear's UUID. The upgrade code that converts haproxy/conf/gear-registry.db to gear-registry/gear-registry.json uses the data from gear-registry.db to determine the child gear UUID. But because the data in gear-registry.db doesn't contain the full UUID, the new gear-registry.json ends up with incorrect data. That data is used to determine the SSH URLs of the child gears so files can be synchronized to them, but this fails because the UUID is not a full UUID.
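
The failure mode described above can be illustrated with a small sketch. It assumes that a full gear UUID is 32 hexadecimal characters, that a child gear's SSH URL has the form <uuid>@<fqdn>, and that the full UUIDs of the app's gears can be obtained from some authoritative source. The function names and sample values are hypothetical and are not taken from the actual upgrade code:

```python
FULL_UUID_LENGTH = 32  # assumption: gear UUIDs are 32 hex characters


def split_child_fqdn(fqdn):
    """Split '<gear-id>-<namespace>.<domain>' into (gear_id, namespace, domain)."""
    host, _, domain = fqdn.partition(".")
    gear_id, _, namespace = host.partition("-")
    return gear_id, namespace, domain


def resolve_uuid(gear_id, known_uuids):
    """Expand a possibly truncated gear id to the one full UUID it prefixes."""
    if len(gear_id) == FULL_UUID_LENGTH:
        return gear_id
    matches = [u for u in known_uuids if u.startswith(gear_id)]
    if len(matches) != 1:
        raise LookupError("cannot resolve gear id %r unambiguously" % gear_id)
    return matches[0]


def ssh_url(fqdn, known_uuids):
    """Build the SSH target for a child gear, always using the full UUID."""
    gear_id, _, _ = split_child_fqdn(fqdn)
    return "%s@%s" % (resolve_uuid(gear_id, known_uuids), fqdn)
```

Taking the 10-character prefix from the FQDN as if it were the UUID would produce an SSH user that does not exist on the node, which is consistent with the rsync connection being closed. Resolving the prefix against the known full UUIDs avoids that.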

Comment 3 Luke Meyer 2014-01-03 19:37:14 UTC
agoldste points out that running "oo-admin-ctl-app update-cluster" on the app will fix the gear registry after the fact. Of course, that would require pulling in this commit, which is maybe a good idea anyway:

commit f2dab305a6d0a1ffc0b818e697b37586643eea62
Author: Andy Goldstein <andy.goldstein>
Date:   Wed Dec 4 09:13:47 2013 -0500
    Add update-cluster to oo-admin-ctl-app

My first impression was wrong: this seems to be confined to the one part of the upgrade that updates the gear registry, so maybe that part is fixable.

Comment 4 Luke Meyer 2014-01-06 22:51:27 UTC
In addition to the above commit, I fixed the gear_upgrade_extension (node package) that runs during the upgrade so that it actually gets the gear registry right in the first place:

https://github.com/openshift/enterprise-server/pull/185
(which is #noupstream as this will never be used for Online again)

Changes will be in rubygem-openshift-origin-node and openshift-origin-broker-util.

Comment 5 Luke Meyer 2014-01-07 13:41:25 UTC
Packages built.

Comment 7 Johnny Liu 2014-01-09 14:57:09 UTC
Followed http://etherpad.corp.redhat.com/ose-2-0-upgrade-2014-01-07, replacing the puddle with 2.0.z/2014-01-08.1. Verification: PASS.


Created a scalable app in 1.1 (python and ruby-1.8 cartridges tested), upgraded it to 1.2 and then to 2.0; git push succeeds.

$ git commit -a -mx; git push
[master 2bca319] x
 1 file changed, 1 insertion(+), 1 deletion(-)

Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 290 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Stopping Ruby cartridge
remote: [Thu Jan 09 07:44:18 2014] [warn] PassEnv variable SHELL was undefined
remote: [Thu Jan 09 07:44:18 2014] [warn] PassEnv variable USER was undefined
remote: [Thu Jan 09 07:44:18 2014] [warn] PassEnv variable LOGNAME was undefined
remote: Waiting for stop to finish
remote: Waiting for stop to finish
remote: Syncing git content to other proxy gears
remote: Saving away previously bundled RubyGems
remote: Building git ref 'master', commit 2bca319
remote: Building Ruby cartridge
remote: Restoring previously bundled RubyGems (note: you can commit .openshift/markers/force_clean_build at the root of your repo to force a clean bundle)
remote: Bundling RubyGems based on Gemfile/Gemfile.lock to repo/vendor/bundle with 'bundle install --deployment'
remote: Using mysql (2.9.1) 
remote: Using rack (1.5.2) 
remote: Using rack-protection (1.5.0) 
remote: Using tilt (1.4.1) 
remote: Using sinatra (1.4.3) 
remote: Using bundler (1.0.21) 
remote: Your bundle is complete! It was installed into ./vendor/bundle
remote: Preparing build for deployment
remote: Deployment id is 37fb1440
remote: Distributing deployment to child gears
remote: Activating deployment
remote: HAProxy already running
remote: HAProxy instance is started
remote: Starting Ruby cartridge
remote: Result: success
remote: Distribution status: success
remote: Activation status: success
remote: Deployment completed with status: success
To ssh://4fc17fce51e74a3f813f5ebfdf8deb50.com/~/git/ruby18scal.git/
   d50ef78..2bca319  master -> master