Bug 1001855 - oo-admin-upgrade isn't upgrading all active gears first and then starting on idle gears...
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Dan Mace
libra bugs
Reported: 2013-08-27 21:41 EDT by Thomas Wiest
Modified: 2015-05-14 19:27 EDT

Doc Type: Bug Fix
Last Closed: 2013-09-19 12:48:22 EDT
Type: Bug

Attachments: None
Description Thomas Wiest 2013-08-27 21:41:46 EDT
Description of problem:
We're running oo-admin-upgrade in PROD, and it's upgrading idle gears before all active gears have completed.

28 hosts on ex-srv1 haven't even started upgrading their active gears (nor their idle gears).

What appears to be happening is that the oo-admin-upgrade queues were scheduled to run the active gears, then the inactive gears, of the same host before moving on to the next host.

This, of course, means that the next host's active gears won't be scheduled until after the current host's inactive gears are finished, which is not what we want.
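The broken ordering can be illustrated with a small sketch (hypothetical names and data, not the actual oo-admin-upgrade code): queuing each host's inactive gears immediately after its active ones interleaves the two phases across hosts.

```ruby
# Illustrative sketch only (hypothetical names, not the real
# oo-admin-upgrade code). Queuing active then inactive gears
# per host interleaves the two phases across hosts.
hosts = {
  "node1" => { active: ["a1", "a2"], inactive: ["i1"] },
  "node2" => { active: ["a3"],       inactive: ["i2"] },
}

buggy_order = hosts.flat_map do |_node, gears|
  gears[:active] + gears[:inactive]  # next host's active gears wait on this host's inactive
end
# => ["a1", "a2", "i1", "a3", "i2"]  -- "i1" runs before "a3"
```

Note how node2's active gear ("a3") is stuck behind node1's inactive gear ("i1"), which is exactly the behavior described above.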

To be clear, last release this worked properly. The recent refactoring broke this.

Version-Release number of selected component (if applicable):

How reproducible:
Very reproducible in PROD.

Steps to Reproduce:
1. unknown

Actual results:
Active gears aren't being upgraded until all gears on the initial set of hosts are done.

Expected results:
All active gears should be upgraded first; only then should the upgrade move on to idle gears.
Comment 1 Thomas Wiest 2013-09-06 16:44:12 EDT
Abhishek, I'm not sure why you think this is a bug for the node team. It's a bug in the gear upgrade scheduling of oo-admin-upgrade.

Maybe I didn't explain it clearly, let me try again.

oo-admin-upgrade has a maximum number of nodes that it'll upgrade at a time. I think that number is 8. It used to be called "THREADS" but I can't find that now in the script.

So let's say we're upgrading 12 nodes: oo-admin-upgrade would chunk that into two groups; the first group would be the first 8 nodes, and the second group would be the final 4 nodes.

What's happening is that oo-admin-upgrade is upgrading the active gears on the first 8 nodes (which is correct), but then it's upgrading the inactive gears on the first 8 nodes _before_ it upgrades the active gears on the last 4 nodes. This is incorrect.

All active gears on all nodes should be upgraded _before_ inactive gears.
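A minimal sketch of the expected ordering (hypothetical names and data, not the shipped fix): complete the active phase across every chunk of nodes before queuing any inactive gear.

```ruby
# Hypothetical sketch of the desired two-phase schedule, assuming a
# max_threads-style chunk size; not the actual oo-admin-upgrade code.
MAX_THREADS = 2

hosts = {
  "node1" => { active: ["a1"], inactive: ["i1"] },
  "node2" => { active: ["a2"], inactive: ["i2"] },
  "node3" => { active: ["a3"], inactive: ["i3"] },
}

schedule = []
[:active, :inactive].each do |phase|            # active phase fully precedes inactive
  hosts.keys.each_slice(MAX_THREADS) do |chunk| # at most MAX_THREADS nodes per group
    chunk.each { |node| schedule.concat(hosts[node][phase]) }
  end
end
# => ["a1", "a2", "a3", "i1", "i2", "i3"]
```

With the phase loop on the outside, no inactive gear anywhere can be scheduled until the active gears on all nodes, across all chunks, are done.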

This is how oo-admin-upgrade worked before its recent refactor.

Also, this problem is very bad in PROD where we have a lot of gears and nodes.

Moving back to broker.
Comment 2 Dan Mace 2013-09-06 16:53:24 EDT

It's a bug for the runtime team as we own the oo-admin-upgrade script. The code itself is currently just poorly located (in a place which implies it's owned by the broker). I understand and acknowledge the bug, and will work on getting it fixed.

Comment 3 Thomas Wiest 2013-09-09 09:41:25 EDT
Oh, I see, sorry for the confusion. :)
Comment 4 Dan Mace 2013-09-10 18:35:30 EDT

Please test multi-node setups including failures and re-runs with the same parameters to verify that they are corrected the second time around. If you have any questions about constructing scenarios, please get in touch directly. Thanks!
Comment 5 openshift-github-bot 2013-09-10 21:21:39 EDT
Commit pushed to master at https://github.com/openshift/origin-server

Bug 1001855: Process all active gears before inactive
Comment 6 Meng Bo 2013-09-11 04:25:52 EDT
Checked on devenv_3772 with multi-node and about 120 gears on it.

Upgrade with --max-threads=1:

# oo-admin-upgrade upgrade-node --version 2.0.33 --ignore-cartridge-version --max-threads=1
Upgrader started with options: {:version=>"2.0.33", :ignore_cartridge_version=>true, :target_server_identity=>nil, :upgrade_position=>1, :num_upgraders=>1, :max_threads=>1, :gear_whitelist=>[]}
Building new upgrade queues and cluster metadata
Getting all active gears...
Getting all logins...
Writing 34 entries to gear queue for node ip-10-184-29-92 at /tmp/oo-upgrade/gear_queue_ip-10-184-29-92
Writing 21 entries to gear queue for node ip-10-184-29-92 at /tmp/oo-upgrade/gear_queue_ip-10-184-29-92
Writing 45 entries to gear queue for node ip-10-164-113-135 at /tmp/oo-upgrade/gear_queue_ip-10-164-113-135
Writing 20 entries to gear queue for node ip-10-164-113-135 at /tmp/oo-upgrade/gear_queue_ip-10-164-113-135

Tailing the upgrade log under /tmp/oo-upgrade shows that the inactive gears start upgrading only after the active ones are finished on all nodes.
