Bug 997769 - oo-admin-upgrade will be broken when trying to rerun for the failed gears from previous upgrade
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Hardware: Unspecified, OS: Unspecified
Priority: medium, Severity: medium
Assigned To: Dan Mace
libra bugs
Depends On:
Blocks: 991543
Reported: 2013-08-16 03:58 EDT by Jianwei Hou
Modified: 2015-05-14 19:26 EDT
Doc Type: Bug Fix
Last Closed: 2013-08-29 08:51:47 EDT
Type: Bug
Attachments: None

Description Jianwei Hou 2013-08-16 03:58:11 EDT
Description of problem:
After oo-admin-upgrade has been run and some gears have failed to upgrade, running the command again fails with an error.
The upgrade cannot continue while /tmp/oo-upgrade/node_queue exists. However, if /tmp/oo-upgrade/node_queue is deleted, oo-admin-upgrade upgrades all gears again instead of retrying only the failed ones.

Version-Release number of selected component (if applicable):
On devenv_3660

How reproducible:

Steps to Reproduce:
1. Prepare data, upgrade the instance to the latest build, and upgrade the gears with oo-admin-upgrade. Some gears fail to upgrade.
oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version
2. Run the upgrade program again:
oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version

Actual results:
[root@ip-10-151-21-209 oo-upgrade]# oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version
Upgrader started with options: {:version=>"2.0.32", :ignore_cartridge_version=>true, :target_server_identity=>"ip-10-151-21-209", :upgrade_position=>1, :num_upgraders=>1, :max_threads=>12, :gear_whitelist=>[]}
Building new upgrade queues and cluster metadata
Node queue file already exists at /tmp/oo-upgrade/node_queue
/usr/sbin/oo-admin-upgrade:381:in `create_upgrade_queues'
/usr/sbin/oo-admin-upgrade:251:in `upgrade'
/usr/sbin/oo-admin-upgrade:999:in `block in upgrade_node'
/usr/sbin/oo-admin-upgrade:928:in `with_upgrader'
/usr/sbin/oo-admin-upgrade:988:in `upgrade_node'
/opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/task.rb:27:in `run'
/opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/invocation.rb:120:in `invoke_task'
/opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor.rb:275:in `dispatch'
/opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/base.rb:425:in `start'
/usr/sbin/oo-admin-upgrade:1004:in `<main>'
/usr/sbin/oo-admin-upgrade:381:in `create_upgrade_queues': Node queue file already exists at /tmp/oo-upgrade/node_queue (RuntimeError)
	from /usr/sbin/oo-admin-upgrade:251:in `upgrade'
	from /usr/sbin/oo-admin-upgrade:999:in `block in upgrade_node'
	from /usr/sbin/oo-admin-upgrade:928:in `with_upgrader'
	from /usr/sbin/oo-admin-upgrade:988:in `upgrade_node'
	from /opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/task.rb:27:in `run'
	from /opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/invocation.rb:120:in `invoke_task'
	from /opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor.rb:275:in `dispatch'
	from /opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/base.rb:425:in `start'
	from /usr/sbin/oo-admin-upgrade:1004:in `<main>'

Expected results:
The program should pick up the failed gears and re-run the upgrade against them.

Additional info:
Comment 1 Dan Mace 2013-08-16 14:28:45 EDT

You should never manually manipulate/delete the files in /tmp/oo-upgrade with the new oo-admin-upgrade tool; if you need to start over from scratch, use `oo-admin-upgrade archive` which will archive the contents of /tmp/oo-upgrade to /tmp/oo-upgrade/archive_{timestamp}.

That said, I need a little more information for this test case. Can you try the following:

1. Start over (`oo-admin-upgrade archive`)
2. Run your first upgrade (where errors are expected)
3. Make a tarball of /tmp/oo-upgrade (e.g. upgrade-step-1.tar.gz)
4. Run your second upgrade (that you see failing)
5. Make another tarball of /tmp/oo-upgrade (e.g. upgrade-step-2.tar.gz)

Then attach both tarballs and the stdout of both oo-admin-upgrade runs to this issue so I can inspect the before and after state of the data files.
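
The capture steps above could be scripted roughly as follows. This is only a sketch: the helper names `snapshot_upgrade_dir` and `run_and_capture`, the `UPGRADE_DIR` variable, and the log file names are illustrative assumptions and not part of oo-admin-upgrade. It also saves stderr along with stdout, since the stack trace may be written there.

```shell
#!/bin/sh
# Sketch only: helpers for capturing the before/after state of the upgrade data.
# UPGRADE_DIR and all file names below are assumptions made for illustration.
UPGRADE_DIR="${UPGRADE_DIR:-/tmp/oo-upgrade}"

# snapshot_upgrade_dir LABEL
# Writes LABEL.tar.gz in the current directory, containing UPGRADE_DIR.
snapshot_upgrade_dir() {
    label="$1"
    parent=$(dirname "$UPGRADE_DIR")
    name=$(basename "$UPGRADE_DIR")
    tar -czf "${label}.tar.gz" -C "$parent" "$name"
}

# run_and_capture LOGFILE COMMAND [ARGS...]
# Runs COMMAND and saves both stdout and stderr to LOGFILE.
run_and_capture() {
    logfile="$1"
    shift
    "$@" > "$logfile" 2>&1
}

# Intended sequence (left commented out so that nothing runs on load):
#   oo-admin-upgrade archive
#   run_and_capture upgrade-run-1.log oo-admin-upgrade upgrade-node \
#       --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version
#   snapshot_upgrade_dir upgrade-step-1
#   run_and_capture upgrade-run-2.log oo-admin-upgrade upgrade-node \
#       --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version
#   snapshot_upgrade_dir upgrade-step-2
```

Taking the snapshot immediately after each run keeps the two tarballs comparable, because nothing else touches the directory in between.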

Comment 2 Dan Mace 2013-08-16 14:29:20 EDT
Comment #1 was addressed to Hou, sorry!
Comment 3 Jianwei Hou 2013-08-19 07:17:59 EDT
I could not reproduce this on this attempt. When the first migration run left some gears with errors, running it a second time made the program process only the failed gears.

If there is ever any problem again, I'll be sure to attach all logs, thanks!
Comment 4 Jianwei Hou 2013-08-20 03:48:43 EDT
Moving to verified according to comment 3.
