Bug 1234603 - gears are not distributed evenly amongst zones if we use the rhc cartridge-scale --min option
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Timothy Williams
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks: 1228373 1267728
 
Reported: 2015-06-22 19:09 UTC by Peter Ruan
Modified: 2019-08-15 04:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-09-08 20:14:41 UTC
Target Upstream Version:
Embargoed:



Description Peter Ruan 2015-06-22 19:09:16 UTC
Description of problem:
gears are not distributed evenly amongst zones if we use the rhc cartridge-scale --min option to scale up the app.  If we use the 'rhc app scale up' 

Version-Release number of selected component (if applicable):
latest devenv

How reproducible:
always.

Steps to Reproduce:
1. Set up a multi_node env with at least 2 nodes
2. Create 1 region, add 2 zones
oo-admin-ctl-region -c create -r region_1
oo-admin-ctl-region -c add-zone -r region_1 -z zone_1
oo-admin-ctl-region -c add-zone -r region_1 -z zone_2

3. Create a scalable app
rhc create-app perl1s perl-5.10 -s --no-git --no-dns
4. Set the min scale of the app to 4
rhc cartridge-scale -c perl-5.10 -a perl1s --min 4
5. Verify that 2 gears are distributed to zone_1 and the other 2 are distributed to zone_2
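
A minimal sketch of the evenness check in the verify step, assuming the zone of each gear has already been collected by hand into a list. The helper name and the "counts differ by at most one" rule are illustrative assumptions, not part of any OpenShift tool:

```python
from collections import Counter


def is_even_spread(gear_zones, zones):
    """Return True if per-zone gear counts differ by at most one.

    gear_zones: one zone name per gear (hypothetical, hand-collected input)
    zones:      every zone in the region, so empty zones count as 0
    """
    counts = Counter(gear_zones)
    per_zone = [counts.get(zone, 0) for zone in zones]
    return max(per_zone) - min(per_zone) <= 1


zones = ["zone_1", "zone_2"]
print(is_even_spread(["zone_1", "zone_1", "zone_2", "zone_2"], zones))  # True
print(is_even_spread(["zone_1", "zone_1", "zone_1", "zone_2"], zones))  # False
```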


Actual results:
The distribution is 3 to 1 or 7 to 1, depending on the argument to --min. So it looks like the zone distribution logic is not respected when using the --min option.

Expected results:
gears are distributed evenly between the 2 zones.

Additional info:

Comment 2 Timothy Williams 2015-07-01 14:28:36 UTC
https://github.com/openshift/origin-server/pull/6160 should resolve this.

Comment 3 Jianwei Hou 2015-07-02 09:56:09 UTC
Tested on devenv_5573, here is the result:
Create a scalable app, when scaled to 4: 3 gears in zone_1, 1 gear in zone_2
when scaled to 10: 7 gears in zone_1, 3 gears in zone_2
when scaled to 16: 7 gears in zone_1, 9 gears in zone_2

The gears are not distributed evenly.

Comment 4 Timothy Williams 2015-07-02 14:40:43 UTC
It's not going to be exactly even. There is a degree of randomness to it. The spread is going to be similar to what you would see when creating many applications. Here are some tests I did with the fix this morning:

Creating multiple single-gear apps
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
4 apps:
  zone_1: 3
  zone_2: 1

10 apps:
  zone_1: 7
  zone_2: 3

16 apps:
  zone_1: 11
  zone_2: 5

Scaling a single app
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
4 gears:
  zone_1: 3
  zone_2: 1

10 gears:
  zone_1: 3
  zone_2: 7

16 gears:
  zone_1: 9
  zone_2: 7

I don't think we expect or intend for the distribution to be exactly even.

Comment 5 Timothy Williams 2015-07-02 14:56:15 UTC
I'm incorrect. We DO expect distribution across zones to be as even as possible. The aforementioned fix does not solve this issue.

Comment 6 Timothy Williams 2015-07-02 16:33:21 UTC
The issue appears to be two-fold:

1) With 'rhc cartridge-scale --min xx' we create all of the op groups before actually running them. So when distribution calculations are made, the consumed capacity returned from each node remains the same. This causes the last scale-up operation to distribute to the same node as the first. When scaling up from 1 to 4 gears, it makes sense that we always see one zone with 3 and the other with 1 gear because of this issue.

2) Distribution is still not even when scaling with 'rhc cartridge-scale --min xx' by running the command multiple times, incrementing --min by 1 each time. This is because the consumed capacity fact gathered from the nodes does not update quickly enough. If you wait ~1 minute between each scale operation, the gears are distributed perfectly evenly.
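
To illustrate point 1, here is a toy model, not the broker's actual code. It assumes placement simply picks the zone with the lowest consumed capacity, and that every scale-up op is planned against the same snapshot of that capacity. The function name and the "least consumed" rule are assumptions for illustration only:

```python
def plan_all_then_run(consumed, new_gears):
    """Plan every new gear against one stale snapshot, then apply them all.

    consumed:  dict of zone name -> gears currently in that zone
    new_gears: number of gears added by the scale-up
    """
    snapshot = dict(consumed)  # capacity as seen when the ops were created
    result = dict(consumed)
    for _ in range(new_gears):
        # The snapshot never changes, so the same zone wins every time.
        target = min(sorted(snapshot), key=snapshot.get)
        result[target] += 1
    return result


# One existing gear in zone_2, scaling from 1 to 4 gears.
print(plan_all_then_run({"zone_1": 0, "zone_2": 1}, 3))
# {'zone_1': 3, 'zone_2': 1}
```

Under these assumptions the 3-to-1 split falls out directly: all three new gears land in whichever zone looked emptiest at planning time.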

Comment 7 Timothy Williams 2015-07-02 19:20:38 UTC
The best fix for this is likely going to be two-fold:

1) Make each gear creation op depend on the last so that they are executed sequentially. The first gear in the scaling operation should have its distribution calculated and be created before the second gear has its distribution calculated at all.

2) Increase the frequency at which the facts are regenerated on each node. This will involve moving the fact generation from a once-a-minute cron job to something closer to every 5-10 seconds. Testing will be required to find the right facts update interval.

This should give enough time between when the gear is created and the next gear creation is calculated for the 'active_capacity' fact to be updated on the node.
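
A rough simulation of how the two parts interact, under simplifying assumptions: gears are created one after another at a fixed interval, the capacity facts are re-read only when at least refresh_interval seconds have passed since the last refresh, and placement picks the zone with the lowest reported capacity. All names and the timing model are hypothetical:

```python
def scale_sequentially(consumed, new_gears, gear_interval, refresh_interval):
    """Create gears one at a time, placing each by the last refreshed facts."""
    actual = dict(consumed)  # real gear counts per zone
    facts = dict(consumed)   # counts as last reported by the nodes
    now = 0
    last_refresh = 0
    for _ in range(new_gears):
        if now - last_refresh >= refresh_interval:
            facts = dict(actual)  # facts catch up with reality
            last_refresh = now
        target = min(sorted(facts), key=facts.get)
        actual[target] += 1
        now += gear_interval
    return actual


start = {"zone_1": 0, "zone_2": 1}
# Facts refreshed faster than gears are created: spread evens out.
print(scale_sequentially(start, 3, gear_interval=10, refresh_interval=5))
# {'zone_1': 2, 'zone_2': 2}
# Facts refreshed once a minute: every placement sees stale data.
print(scale_sequentially(start, 3, gear_interval=10, refresh_interval=60))
# {'zone_1': 3, 'zone_2': 1}
```

In this model sequential execution alone is not enough; the refresh interval also has to be shorter than the time between gear creations.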

Comment 8 Abhishek Gupta 2015-07-02 19:31:21 UTC
While the node active_capacity (fact) might not have been updated for 2 successive gear creations during scale-up (assuming it all happens fast), explicitly defining the prereq for pending ops such that gear creation ops of gear #2 are executed ONLY after the gear creation ops of gear #1 are complete will ensure that if an application scales up by more gears (5-10), the new gears are spread better.
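
The prereq idea can be sketched as a chain of pending ops, where each gear creation op lists the previous one as its prerequisite. PendingOp, chain_gear_ops and execution_order are hypothetical names for illustration and do not reflect the broker's real classes:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PendingOp:
    """Hypothetical stand-in for a pending gear creation op."""
    op_id: str
    prereq: List[str] = field(default_factory=list)


def chain_gear_ops(gear_count: int) -> List[PendingOp]:
    """Build ops where gear N depends on gear N-1 being created."""
    ops: List[PendingOp] = []
    for i in range(1, gear_count + 1):
        prereq = [ops[-1].op_id] if ops else []
        ops.append(PendingOp(op_id=f"create_gear_{i}", prereq=prereq))
    return ops


def execution_order(ops: List[PendingOp]) -> List[str]:
    """Run one op at a time, only once all of its prereqs are done."""
    done: List[str] = []
    pending = list(ops)
    while pending:
        ready = [op for op in pending if all(p in done for p in op.prereq)]
        if not ready:
            raise ValueError("unsatisfiable prerequisites")
        op = ready[0]
        done.append(op.op_id)
        pending.remove(op)
    return done


# Even if the ops are queued out of order, the chain forces sequence.
print(execution_order(list(reversed(chain_gear_ops(3)))))
# ['create_gear_1', 'create_gear_2', 'create_gear_3']
```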

Comment 10 Abhishek Gupta 2015-09-01 21:50:10 UTC
Fixed with --> https://github.com/openshift/origin-server/pull/6229

Comment 11 openshift-github-bot 2015-09-01 22:33:31 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/632c425aa02fb8ac423c61692e7b20f605aa3828
Bug 1234603: spreading gears for an app evenly across zones

Comment 12 Peter Ruan 2015-09-02 22:58:29 UTC
The initial test that failed now passes. I still need to look into edge cases such as:

1. existing gears in regions already
2. run the same scenario without tagging region/zone labels but with pure nodes.

Comment 13 Peter Ruan 2015-09-03 05:14:58 UTC
Tested scenarios #1 and #2 further; gears are distributed evenly.

