Description of problem:
When a migration failure is encountered, the test report does not highlight it, and the summary tells the user there were 0 failures.

Version-Release number of selected component (if applicable):
1.2/2013-06-13.1

How reproducible:
Always

Steps to Reproduce:
1. Set up a 1.1 env using the latest ose-1.1.z puddle.
2. Create two jbosseap apps.
3. Follow http://etherpad.corp.redhat.com/OSE-1-2-upgrade-notes to do upgrade testing.
4. Run "ose-upgrade gears".

Actual results:
The jbosseap gear fails to start due to BZ#972311.
<--snip-->
Starting gear with uuid 'bc9bb8d519a74eeeb0781c2f851bcd69' on node 'node.ose11test.com '
Start gear failed with an exception: Failed to execute: 'control start' for /var/lib/openshift/bc9bb8d519a74eeeb0781c2f851bcd69/jbosseap
Marking step start_gear complete
Validating gear bc9bb8d519a74eeeb0781c2f851bcd69 post-migration
Pre-migration state: started
Post-migration response code: 503
<--snip-->

But in the end, the test summary tells the user there were 0 failures.
#####################################################
Summary:
# of users: 1
# of gears: 8
# of failures: 0
Gear counts per thread: [8]
Additional timings:
migrate_on_node_measured_from_broker=309.769s
redeploy_httpd_proxy=0.0s
restart=0.0s
total_migrate_gear_measured_from_broker=309.845s
Time gathering users: 0.042s
Time gathering active gears: 20.29s
Total execution time: 334.055s
#####################################################

As for the migrate error itself, it would be better to use coloured text to highlight it.

Expected results:
The ose-upgrade tool should detect failures and use coloured text to highlight them.

Additional info:
This is technically not considered a migration failure: as far as we can tell the migration itself went well, and it was the gear restart that failed. That can happen for a variety of reasons, and we don't want to assume it's due to the migration. However, we agree it would be a good idea to let the user know that the gear failed to start after the migration. Right now they would have to use an outside tool to detect that (e.g. "service openshift-gears status", or checking the HTTP return code on all apps) or just wait until users yelled about something; not a good experience. So we would like to introduce a return code for the migration that is reserved just for this problem, and have the migration script tell the user which migrations had this problem and where to look for the log for each. That's a feature request, though, and it's not likely to make it into this release. We can consider it for 1.2.1 or a later release. It's also not clear whether we will ever have a migration quite like this again.
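To make the proposal above concrete, here is a rough sketch of how a wrapper could tell "migration failed" apart from "migration fine, gear did not come back up". It is not how ose-upgrade works today. The exit code values and the names find_start_failures and exit_code are made up for illustration, and it assumes the log keeps the "Validating gear ...", "Pre-migration state: ..." and "Post-migration response code: ..." lines on separate lines, as in the snippet from the report.

```python
import re

# Hypothetical exit codes; ose-upgrade does not define these today.
EXIT_OK = 0
EXIT_MIGRATION_FAILED = 1
EXIT_START_FAILED = 2  # reserved: migration went through, gear did not restart

# Patterns for the validation lines shown in the report's log snippet.
GEAR_RE = re.compile(r"Validating gear (\w+) post-migration")
PRE_RE = re.compile(r"Pre-migration state: (\w+)")
POST_RE = re.compile(r"Post-migration response code: (\d+)")


def find_start_failures(log_text):
    """Return UUIDs of gears that were 'started' before the migration
    but answer with an HTTP error (>= 400) after it."""
    failures = []
    gear = state = None
    for line in log_text.splitlines():
        line = line.strip()
        m = GEAR_RE.search(line)
        if m:
            gear, state = m.group(1), None
            continue
        m = PRE_RE.search(line)
        if m:
            state = m.group(1)
            continue
        m = POST_RE.search(line)
        if m and gear is not None:
            if state == "started" and int(m.group(1)) >= 400:
                failures.append(gear)
            gear = state = None
    return failures


def exit_code(migration_failures, start_failures):
    """Pick the exit code; a real migration failure takes precedence."""
    if migration_failures:
        return EXIT_MIGRATION_FAILED
    if start_failures:
        return EXIT_START_FAILED
    return EXIT_OK
```

With something like this, the summary could keep "# of failures: 0" for the migration itself and add a separate count of gears that failed to start, with the UUID of each so the user knows which gear logs to check.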
Jason to create a story for this in Trello