Description of problem:
When oo-admin-upgrade is run and some gears fail to upgrade, re-running the command is broken. The upgrade cannot continue while /tmp/oo-upgrade/node_queue exists; however, if /tmp/oo-upgrade/node_queue is deleted, oo-admin-upgrade upgrades all gears again instead of retrying only the failed ones (see the sketch under Additional info below).

Version-Release number of selected component (if applicable):
On devenv_3660

How reproducible:
Always

Steps to Reproduce:
1. Prepare data, upgrade the instance to the latest build, and upgrade the gears with oo-admin-upgrade; some gears fail to upgrade:
   oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version
2. Run the upgrade again:
   oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version

Actual results:
[root@ip-10-151-21-209 oo-upgrade]# oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version
Upgrader started with options: {:version=>"2.0.32", :ignore_cartridge_version=>true, :target_server_identity=>"ip-10-151-21-209", :upgrade_position=>1, :num_upgraders=>1, :max_threads=>12, :gear_whitelist=>[]}
Building new upgrade queues and cluster metadata
Node queue file already exists at /tmp/oo-upgrade/node_queue
/usr/sbin/oo-admin-upgrade:381:in `create_upgrade_queues'
/usr/sbin/oo-admin-upgrade:251:in `upgrade'
/usr/sbin/oo-admin-upgrade:999:in `block in upgrade_node'
/usr/sbin/oo-admin-upgrade:928:in `with_upgrader'
/usr/sbin/oo-admin-upgrade:988:in `upgrade_node'
/opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/task.rb:27:in `run'
/opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/invocation.rb:120:in `invoke_task'
/opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor.rb:275:in `dispatch'
/opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/base.rb:425:in `start'
/usr/sbin/oo-admin-upgrade:1004:in `<main>'
/usr/sbin/oo-admin-upgrade:381:in `create_upgrade_queues': Node queue file already exists at /tmp/oo-upgrade/node_queue (RuntimeError)
	from /usr/sbin/oo-admin-upgrade:251:in `upgrade'
	from /usr/sbin/oo-admin-upgrade:999:in `block in upgrade_node'
	from /usr/sbin/oo-admin-upgrade:928:in `with_upgrader'
	from /usr/sbin/oo-admin-upgrade:988:in `upgrade_node'
	from /opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/task.rb:27:in `run'
	from /opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/invocation.rb:120:in `invoke_task'
	from /opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor.rb:275:in `dispatch'
	from /opt/rh/ruby193/root/usr/share/gems/gems/thor-0.15.4/lib/thor/base.rb:425:in `start'
	from /usr/sbin/oo-admin-upgrade:1004:in `<main>'

Expected results:
The program should pick up the failed gears and re-run the upgrade against only those.

Additional info:
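For reference, the manual workaround described above, which triggers a full re-upgrade of all gears rather than a retry of only the failed ones, amounts to the following. The node, version, and flags are copied from the steps above; the exact rm invocation is an assumption about how the queue file was removed:

   # Deleting the node queue by hand lets the command run again,
   # but it then rebuilds the queues and upgrades every gear:
   rm /tmp/oo-upgrade/node_queue
   oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version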
Meng,

You should never manually manipulate or delete the files in /tmp/oo-upgrade with the new oo-admin-upgrade tool; if you need to start over from scratch, use `oo-admin-upgrade archive`, which archives the contents of /tmp/oo-upgrade to /tmp/oo-upgrade/archive_{timestamp}.

That said, I need a little more information for this test case. Can you try the following (see the command sketch below):

1. Start over (`oo-admin-upgrade archive`)
2. Run your first upgrade (where errors are expected)
3. Make a tarball of /tmp/oo-upgrade (e.g. upgrade-step-1.tar.gz)
4. Run your second upgrade (the one you see failing)
5. Make another tarball of /tmp/oo-upgrade (e.g. upgrade-step-2.tar.gz)

Then attach both tarballs and the stdout of both oo-admin-upgrade runs to this issue so I can inspect the before and after state of the data files. Thanks.
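A minimal sketch of the requested steps, reusing the node/version flags from the original report and the example tarball names above; the upgrade-run-*.out capture files are hypothetical names for the stdout the comment asks for:

   # 1. Archive the previous run's state instead of deleting it by hand
   oo-admin-upgrade archive
   # 2. First upgrade run (errors expected); capture stdout for the attachment
   oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version | tee upgrade-run-1.out
   # 3. Snapshot the state files after the first run
   tar -czf upgrade-step-1.tar.gz -C /tmp oo-upgrade
   # 4. Second upgrade run (the failing re-run); capture stdout again
   oo-admin-upgrade upgrade-node --upgrade-node ip-10-151-21-209 --version 2.0.32 --ignore-cartridge-version | tee upgrade-run-2.out
   # 5. Snapshot the state files after the second run
   tar -czf upgrade-step-2.tar.gz -C /tmp oo-upgrade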
Comment #1 was addressed to Hou, sorry!
This is not reproducible this time: when the first migration run left some gears with errors, running the upgrade a second time processed only the failed gears. If the problem ever comes up again, I'll be sure to attach all the logs. Thanks!
Moving to verified per comment 3.