Description of problem:
Gears are not distributed evenly amongst zones if we use the rhc cartridge-scale --min option to scale up the app. If we use the 'rhc app scale up'

Version-Release number of selected component (if applicable):
latest devenv

How reproducible:
always

Steps to Reproduce:
1. Set up a multi_node env with at least 2 nodes
2. Create 1 region and add 2 zones:
   oo-admin-ctl-region -c create -r region_1
   oo-admin-ctl-region -c add-zone -r region_1 -z zone_1
   oo-admin-ctl-region -c add-zone -r region_1 -z zone_2
3. Create a scalable app:
   rhc create-app perl1s perl-5.10 -s --no-git --no-dns
4. Set the min scale of the app to 4:
   rhc cartridge-scale -c perl-5.10 -a perl1s --min 4
5. Verify 2 gears are distributed to zone_1 and the other 2 are distributed to zone_2

Actual results:
Distribution is 3 to 1 or 7 to 1 depending on the argument of --min. So it looks like the zone distribution logic is not respected when using the --min option.

Expected results:
Gears are distributed evenly between the 2 zones.

Additional info:
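The pass condition in step 5 can be stated precisely: with two zones, "evenly" means the per-zone gear counts differ by at most one. The following is a minimal Python sketch of that check. The function name and the idea of feeding it one zone name per gear are hypothetical and are not part of rhc or oo-admin-ctl-region.

```python
from collections import Counter


def is_evenly_distributed(gear_zones, zones):
    """Return True if per-zone gear counts differ by at most one.

    gear_zones: one zone name per gear of the app (hypothetical input,
                e.g. collected by hand from the node each gear landed on).
    zones:      every zone in the region, so an empty zone counts as 0.
    """
    counts = Counter(gear_zones)
    per_zone = [counts[z] for z in zones]  # Counter returns 0 for missing keys
    return max(per_zone) - min(per_zone) <= 1


if __name__ == "__main__":
    zones = ["zone_1", "zone_2"]
    # Expected result for --min 4: 2 gears per zone.
    print(is_evenly_distributed(["zone_1", "zone_2", "zone_1", "zone_2"], zones))  # True
    # Actual result reported above: 3 to 1.
    print(is_evenly_distributed(["zone_1", "zone_1", "zone_1", "zone_2"], zones))  # False
```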
https://github.com/openshift/origin-server/pull/6160 should resolve this.
Tested on devenv_5573. Here are the results after creating a scalable app:
when scaled to 4: 3 gears in zone_1, 1 gear in zone_2
when scaled to 10: 7 gears in zone_1, 3 gears in zone_2
when scaled to 16: 7 gears in zone_1, 9 gears in zone_2
The gears are not distributed evenly.
It's not going to be exactly even. There is a degree of randomness to it. The spread is going to be similar to the spread you get when creating many applications. Here are some tests I did with the fix this morning:

Creating multiple single-gear apps
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
4 apps: zone_1: 3, zone_2: 1
10 apps: zone_1: 7, zone_2: 3
16 apps: zone_1: 11, zone_2: 5

Scaling a single app
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
4 gears: zone_1: 3, zone_2: 1
10 gears: zone_1: 3, zone_2: 7
16 gears: zone_1: 9, zone_2: 7

I don't think we expect or intend for the distribution to be exactly even.
I'm incorrect. We DO expect distribution across zones to be as even as possible. The aforementioned fix does not solve this issue.
The issue appears to be two-fold:
1) With 'rhc cartridge-scale --min xx' we create all of the op groups before running any of them. So when the distribution calculations are made, the consumed capacity returned from each node is the same for every new gear. This causes the last scale-up operation to distribute to the same node as the first. It also explains why, when scaling up from 1 to 4 gears, we always see one zone with 3 gears and the other with 1.
2) Distribution is still not even when scaling by running 'rhc cartridge-scale --min xx' multiple times, incrementing --min by 1 each time. This is because the consumed capacity fact gathered from the nodes does not update quickly enough. If you wait ~1 minute between scale operations, the gears are distributed perfectly evenly.
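Both effects can be reproduced with a toy model in which each placement decision only sees capacity facts as of the last fact regeneration. This is a hedged sketch, not the broker's placement code. It assumes the placement rule is simply "pick the zone with the lowest reported gear count" and models the fact as a per-zone gear count. `place_with_refresh` and its parameters are hypothetical. Planning every op up front (issue 1) corresponds to all decisions being made at the same instant; issue 2 corresponds to decisions a few seconds apart with a once-a-minute refresh.

```python
def place_with_refresh(start, placement_times, refresh_interval):
    """Toy model: choose a zone for each new gear using possibly stale facts.

    start:            zone -> gear count, accurate at t=0.
    placement_times:  sorted times (seconds) at which each gear's zone is chosen.
    refresh_interval: seconds between fact regenerations (assumed cron-like).
    Returns the zone chosen for each new gear, in order.
    """
    created = []  # (time, zone) for every gear placed so far
    placements = []
    for t in placement_times:
        # Facts were last regenerated at the most recent multiple of the interval.
        last_refresh = (t // refresh_interval) * refresh_interval
        # Reported counts include only gears created before that refresh.
        seen = dict(start)
        for created_at, zone in created:
            if created_at < last_refresh:
                seen[zone] += 1
        # Assumed placement rule: lowest reported count; ties go to the first zone.
        choice = min(seen, key=seen.get)
        placements.append(choice)
        created.append((t, choice))
    return placements


if __name__ == "__main__":
    start = {"zone_1": 1, "zone_2": 0}  # assume the app's first gear is in zone_1
    # Issue 1: all three ops planned at once, so every choice sees the same facts.
    print(place_with_refresh(start, [0, 0, 0], 60))     # ['zone_2', 'zone_2', 'zone_2']
    # Issue 2: ops a few seconds apart, facts refreshed once a minute.
    print(place_with_refresh(start, [0, 5, 10], 60))    # ['zone_2', 'zone_2', 'zone_2']
    # Waiting ~1 minute between ops lets the facts catch up.
    print(place_with_refresh(start, [0, 60, 120], 60))  # ['zone_2', 'zone_1', 'zone_2']
```

The first two runs end with 1 gear in one zone and 3 in the other, matching the reported 3-to-1 split; the last run ends 2 and 2.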
The best fix for this is likely going to be two-fold:
1) Make each gear creation op depend on the previous one so that they are executed sequentially. The first gear in the scaling operation should have its distribution calculated and be created before the second gear's distribution is calculated at all.
2) Increase the frequency at which the facts are regenerated on each node. This will involve moving fact generation from a once-a-minute cron job to something closer to every 5-10 seconds. Testing will be required to find the right fact update interval. This should leave enough time between when a gear is created and when the next gear's distribution is calculated for the 'active_capacity' fact to be updated on the node.
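The effect of fix 1 can be shown with a deliberately simplified Python model that assumes fix 2 keeps the facts fresh enough for each decision to see the gears created before it. The placement rule ("least-loaded zone, ties to the first zone") is an assumption standing in for whatever the broker actually does, and `scale_up` is a hypothetical helper.

```python
def scale_up(start, target_gears, sequential):
    """Return the final zone -> gear count after scaling an app to target_gears.

    start:      zone -> current gear count for the app.
    sequential: if False, every new gear is planned from the same snapshot
                (all op groups created before any of them run); if True, each
                gear is created and its zone's count updated before the next
                gear's zone is chosen (assumes facts are current by then).
    """
    counts = dict(start)
    snapshot = dict(start)
    for _ in range(target_gears - sum(start.values())):
        view = counts if sequential else snapshot
        zone = min(view, key=view.get)  # assumed rule: least-loaded zone, ties to first
        counts[zone] += 1
    return counts


if __name__ == "__main__":
    start = {"zone_1": 1, "zone_2": 0}
    print(scale_up(start, 4, sequential=False))  # {'zone_1': 1, 'zone_2': 3}
    print(scale_up(start, 4, sequential=True))   # {'zone_1': 2, 'zone_2': 2}
    print(scale_up(start, 8, sequential=False))  # {'zone_1': 1, 'zone_2': 7}
    print(scale_up(start, 8, sequential=True))   # {'zone_1': 4, 'zone_2': 4}
```

With batch planning, the model reproduces the 3-to-1 and 7-to-1 splits from the original report (4 and 8 gears). Computing each gear's zone only after the previous gear exists evens them out, but only if the counts it reads are current, which is why fix 2 is paired with fix 1.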
The node's active_capacity fact might not be updated between two successive gear creations during a fast scale-up. Even so, explicitly defining prerequisites for the pending ops, so that the gear creation ops of gear #2 run ONLY after the gear creation ops of gear #1 are complete, will ensure that when an application scales up by more gears (5-10), the new gears are spread better.
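One way to picture the prerequisite chaining: each gear's creation ops form a group, and the first op of gear N lists the last op of gear N-1 as its prerequisite, so any dependency-respecting executor runs them strictly gear by gear. The op names and the `chain_gear_ops` helper are hypothetical stand-ins for the broker's pending ops; Python's standard `graphlib.TopologicalSorter` (Python 3.9+) plays the role of the executor.

```python
from graphlib import TopologicalSorter


def chain_gear_ops(gear_count, ops_per_gear=("create_gear", "add_components", "start")):
    """Build op -> set(prerequisite ops) so gear N's ops wait for gear N-1's.

    Op names are hypothetical placeholders, e.g. "gear2:create_gear".
    """
    graph = {}
    previous_last_op = None
    for gear in range(1, gear_count + 1):
        names = [f"gear{gear}:{op}" for op in ops_per_gear]
        for i, name in enumerate(names):
            if i > 0:
                graph[name] = {names[i - 1]}      # ops within a gear run in order
            elif previous_last_op is not None:
                graph[name] = {previous_last_op}  # gear N starts after gear N-1 finishes
            else:
                graph[name] = set()               # first op of the first gear
        previous_last_op = names[-1]
    return graph


if __name__ == "__main__":
    # The graph is a single chain, so there is exactly one valid execution order.
    order = list(TopologicalSorter(chain_gear_ops(3)).static_order())
    print(order)
```

Without the cross-gear prerequisite, every gear's first op would have no prerequisites and could be scheduled (and have its zone calculated) before any gear exists, which is the batch behavior described in issue 1.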
Fixed with --> https://github.com/openshift/origin-server/pull/6229
Commit pushed to master at https://github.com/openshift/origin-server https://github.com/openshift/origin-server/commit/632c425aa02fb8ac423c61692e7b20f605aa3828 Bug 1234603: spreading gears for an app evenly across zones
The initial test that failed now passes. Still need to look into edge cases such as:
1. existing gears already in the regions
2. running the same scenario on pure nodes, without region/zone labels
Tested scenarios #1 and #2 further; gears are distributed evenly.