Bug 1470350
Summary: A/B deployment seems to round-robin across all pods in multiple services, instead of proportional routing to services
Product: OpenShift Container Platform
Reporter: Dave Neary <dneary>
Component: Networking
Assignee: Phil Cameron <pcameron>
Networking sub component: router
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: aos-bugs, atragler, bbennett, dmcphers, eparis, rkhan, sukulkar, xtian, yadu
Version: 3.6.0
Keywords: Reopened
Target Milestone: ---
Target Release: 3.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
  Feature: See docs pr 4847
  Reason:
  Result:
Story Points: ---
Clone Of:
: 1473736 1477685
Environment:
Last Closed: 2017-11-28 22:00:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1473736, 1477685
Attachments:
Description
Dave Neary
2017-07-12 19:20:27 UTC
Wasn't there a requirement that you had to set an annotation on the route, haproxy.router.openshift.io/balance = leastconn or something like that? You can try that, and if it does not work I assume we are going to want to see the yaml/json for the route in question.

I could reproduce the issue with the latest OCP env:

openshift v3.6.143
kubernetes v1.6.1+5115d708d7

$ oc set route-backends route1
NAME            KIND      TO                   WEIGHT
routes/route1   Service   service-unsecure     50 (50%)
routes/route1   Service   service-unsecure-2   50 (50%)

$ oc scale rc test-rc-1 --replicas=4
replicationcontroller "test-rc-1" scaled

$ oc get pod -w
NAME              READY   STATUS    RESTARTS   AGE
test-rc-1-33rhr   1/1     Running   0          11s
test-rc-1-mfjw6   1/1     Running   0          12m
test-rc-1-tnn5g   1/1     Running   0          11s
test-rc-1-w5gh1   1/1     Running   0          11s
test-rc-2-mmf4r   1/1     Running   0          12m

$ for i in {1..50}; do curl route1-sess.0713-u9a.qe.rhcloud.com ; done
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-1 http-8080

The route balance is roundrobin by default with multiple services in a route, but even when I set haproxy.router.openshift.io/balance = leastconn, it doesn't work either. The route yaml file and the haproxy.config are attached.

Created attachment 1297428: route yaml
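As a quick check on the reproduction above, the curl output can be tallied and compared with what plain round-robin over all five pods would give. This is only a sketch: the response list is transcribed from the loop output, and the mapping of the four test-rc-1 pods to "Hello-OpenShift-1" and the single test-rc-2 pod to "Hello-OpenShift-2" is an assumption based on the names.

```python
from collections import Counter

# Responses transcribed from the 50-request curl loop above:
# one "-1" response, then a repeating pattern of one "-2" followed by four "-1".
cycle = ["Hello-OpenShift-2"] + ["Hello-OpenShift-1"] * 4
observed = ["Hello-OpenShift-1"] + cycle * 9 + cycle[:4]

# Assumed pod counts per responder, read off the `oc get pod` output
# (four test-rc-1 pods, one test-rc-2 pod).
pods = {"Hello-OpenShift-1": 4, "Hello-OpenShift-2": 1}


def share(counts):
    """Convert absolute counts into fractions of the total."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


observed_share = share(Counter(observed))  # what the route actually delivered
pod_share = share(pods)                    # round-robin over every pod

for name in pods:
    print(f"{name}: observed {observed_share[name]:.0%}, "
          f"per-pod round-robin {pod_share[name]:.0%}, configured 50%")
```

The observed 40/10 split matches the 4:1 pod ratio rather than the configured 50/50 weights, which is the behavior this bug reports.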
This is actually functioning as intended... but the behavior is not properly documented (and is rather confusing anyway).

The weights apply to each backend, so if you have a route with service A at weight 1 and service B at weight 2, if the number of back-ends for each is equal, then a pod backing A will get 33% of the traffic and a pod backing B will get 66%. BUT if there are 2 endpoints for A and 1 for B, the numbers change... an A pod will respond 50% of the time and a B pod will respond 50% of the time.

Since weighting requires round-robin, you would expect to see round-robin behavior if all weights are equal. If you set the balance type to leastconn, weighting has no effect.

We could take a feature request to change the behavior to set the weights proportionally based on the fraction of endpoints that service has, but that is not the way this was originally designed and approved.

We really need to update the docs to make this clearer. Trello card https://trello.com/c/MajuXbiV tracks the docs improvements, and I have asked our networking docs person and a networking developer to look at improving this ASAP.

Re-opened because the networking team decided that while we made a deliberate choice of this behavior... it was not a great one and the current behavior will surprise a lot of users.

Thanks for the update and the confirmation Ben, Yan - I know this surprised me (and the evangelist training us). Also, thanks for showing me up with the {1..50} shell built-in over running seq, yan ;-)

Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/0ccbe544c0db84a9436738fed7051ea1a959b3ae
Document a/b deployment

A route can front up to 4 services that handle the requests. The load balancing strategy governs which endpoint gets each request. When roundrobin is chosen, the portion of the requests that each service handles is governed by the weight assigned to the service. Each endpoint in the service gets a fraction of the service's requests

bug 1470350
https://bugzilla.redhat.com/show_bug.cgi?id=1470350

Code change is in origin PR 15309
https://github.com/openshift/origin/pull/15309

Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/b0db0b2e3bcf1db113ed61ed1083853c986b6ef9
Make A/B deployment proportional to service weight

Distribute requests among the route's services based on service weight. The portion of the total weight that each service has is distributed evenly among the service's endpoints.

bug 1470350
https://bugzilla.redhat.com/show_bug.cgi?id=1470350

https://github.com/openshift/origin/commit/b64c94eb239034b0a8df8abf290f5322d7600855
Merge pull request #15309 from pecameron/bz1470350

Automatic merge from submit-queue (batch tested with PRs 15533, 15414, 15584, 15529, 15309)

Make A/B deployment proportional to service weight

Distribute requests among the route's services based on service weight. The portion of the total weight that each service has is distributed evenly among the service's endpoints.

bug 1470350
https://bugzilla.redhat.com/show_bug.cgi?id=1470350

*** Bug 1479392 has been marked as a duplicate of this bug. ***

Tested on the latest OCP-3.7 env:

openshift v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd

The issue has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
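For readers comparing the behavior before and after the fix, the two weighting models discussed in this bug can be sketched as plain arithmetic. This is not the router's code: the function names are hypothetical, and the sketch assumes every service has at least one endpoint. The first function models the original behavior (every endpoint carries its service's weight); the second models the behavior described in the origin commit message (the service's share depends only on its weight and is then split evenly among its endpoints).

```python
from fractions import Fraction


def per_endpoint_shares(weights, endpoints):
    """Original behavior: each endpoint carries its service's weight,
    so a service's share grows with its endpoint count."""
    totals = {svc: weights[svc] * endpoints[svc] for svc in weights}
    grand_total = sum(totals.values())
    return {svc: Fraction(t, grand_total) for svc, t in totals.items()}


def proportional_shares(weights, endpoints):
    """Behavior after the fix: a service's share depends only on its weight;
    that share is then divided evenly among the service's endpoints.
    Returns {service: (service_share, share_per_endpoint)}."""
    total_weight = sum(weights.values())
    result = {}
    for svc, weight in weights.items():
        service_share = Fraction(weight, total_weight)
        result[svc] = (service_share, service_share / endpoints[svc])
    return result


# Example from the comment above: A at weight 1, B at weight 2.
weights = {"A": 1, "B": 2}

# One endpoint each: A gets 1/3 and B gets 2/3 under the original model.
equal = per_endpoint_shares(weights, {"A": 1, "B": 1})

# Two endpoints for A, one for B: the original model drifts to 1/2 and 1/2.
skewed = per_endpoint_shares(weights, {"A": 2, "B": 1})

# The proportional model keeps A at 1/3 (1/6 per endpoint) and B at 2/3.
fixed = proportional_shares(weights, {"A": 2, "B": 1})

print(equal, skewed, fixed, sep="\n")
```

Applied to the reproduction in this bug (weights 50/50, four endpoints versus one), the first model gives 4/5 and 1/5, while the second gives 1/2 and 1/2.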