| Summary: | oo-admin-upgrade isn't upgrading all active gears first and then starting on idle gears... | | |
|---|---|---|---|
| Product: | OpenShift Online | Reporter: | Thomas Wiest <twiest> |
| Component: | Containers | Assignee: | Dan Mace <dmace> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 2.x | CC: | bmeng, dmace |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-09-19 16:48:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description

Thomas Wiest
2013-08-28 01:41:46 UTC

Abhishek, I'm not sure why you think this is a bug for the node team; it's a bug in the gear upgrade scheduling of oo-admin-upgrade. Maybe I didn't explain it clearly, so let me try again.

oo-admin-upgrade has a maximum number of nodes that it will upgrade at a time. I think that number is 8; it used to be called "THREADS", but I can't find that in the script now. So, say we're upgrading 12 nodes: oo-admin-upgrade would chunk that into two groups, the first group being the first 8 nodes and the second group being the final 4 nodes.

What's happening is that oo-admin-upgrade upgrades the active gears on the first 8 nodes (which is correct), but then upgrades the inactive gears on those first 8 nodes _before_ it upgrades the active gears on the last 4 nodes. This is incorrect: all active gears on all nodes should be upgraded _before_ any inactive gears. This is how oo-admin-upgrade worked before its recent refactor. This problem is also very bad in PROD, where we have a lot of gears and nodes. Moving back to broker.

Thomas, it's a bug for the runtime team, as we own the oo-admin-upgrade script. The code itself is just poorly located at the moment (in a place which implies it's owned by the broker). I understand and acknowledge the bug, and will work on getting it fixed. Thanks!

Oh, I see, sorry for the confusion. :)

https://github.com/openshift/origin-server/pull/3610

Please test multi-node setups, including failures and re-runs with the same parameters, to verify that they are corrected the second time around. If you have any questions about constructing scenarios, please get in touch directly. Thanks!

Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/89019725ca61479cc13a7247b21a9b8cb989aa12
Bug 1001855: Process all active gears before inactive

Checked on devenv_3772 with multi-node and about 120 gears on it.
Upgrade with --max-threads=1:
# oo-admin-upgrade upgrade-node --version 2.0.33 --ignore-cartridge-version --max-threads=1
Upgrader started with options: {:version=>"2.0.33", :ignore_cartridge_version=>true, :target_server_identity=>nil, :upgrade_position=>1, :num_upgraders=>1, :max_threads=>1, :gear_whitelist=>[]}
Building new upgrade queues and cluster metadata
Getting all active gears...
Getting all logins...
Writing 34 entries to gear queue for node ip-10-184-29-92 at /tmp/oo-upgrade/gear_queue_ip-10-184-29-92
Writing 21 entries to gear queue for node ip-10-184-29-92 at /tmp/oo-upgrade/gear_queue_ip-10-184-29-92
Writing 45 entries to gear queue for node ip-10-164-113-135 at /tmp/oo-upgrade/gear_queue_ip-10-164-113-135
Writing 20 entries to gear queue for node ip-10-164-113-135 at /tmp/oo-upgrade/gear_queue_ip-10-164-113-135
Tailing the upgrade log under /tmp/oo-upgrade shows that the inactive gears only start upgrading once the active gears have finished on all nodes.
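
For illustration, here is a minimal Ruby sketch of the corrected two-phase scheduling. This is not the actual oo-admin-upgrade code: the node names, the queues structure, the MAX_THREADS constant, and the upgrade_gear stub are all hypothetical stand-ins.

```ruby
# Minimal sketch of the corrected scheduling, NOT the actual
# oo-admin-upgrade implementation. All names below are hypothetical.

MAX_THREADS = 8 # maximum number of nodes upgraded concurrently

# Hypothetical per-node queues, split into active and inactive gears.
queues = {
  'node-1' => { active: %w[g1 g2], inactive: %w[g3] },
  'node-2' => { active: %w[g4],    inactive: %w[g5 g6] },
  # ... up to 12 nodes in the example from the description
}

# Stub standing in for the real per-gear upgrade work.
def upgrade_gear(node, gear)
  puts "upgrading #{gear} on #{node}"
end

# Two full passes: ALL active gears on ALL nodes are processed before
# ANY inactive gear, even though the node list is chunked into groups
# of MAX_THREADS. The buggy behavior finished both phases for the
# first chunk of 8 nodes before touching the remaining 4.
[:active, :inactive].each do |phase|
  queues.keys.each_slice(MAX_THREADS) do |chunk|
    threads = chunk.map do |node|
      Thread.new do
        queues[node][phase].each { |gear| upgrade_gear(node, gear) }
      end
    end
    threads.each(&:join)
  end
end
```

The point of the design is that chunking by MAX_THREADS happens inside each phase, so no inactive gear on any node can be picked up while active gears elsewhere are still waiting.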