Bug 1477685
Summary: | A/B deployment seems to round-robin across all pods in multiple services, instead of proportional routing to services | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Phil Cameron <pcameron> |
Component: | Networking | Assignee: | Phil Cameron <pcameron> |
Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | aos-bugs, atragler, bbennett, dneary, eparis, rkhan, sukulkar, xtian, yadu, zzhao |
Version: | 3.7.0 | Keywords: | Reopened |
Target Milestone: | --- | ||
Target Release: | 3.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: See Docs PR 5816
Consequence:
Fix:
Result:
|
Story Points: | --- |
Clone Of: | 1470350 | Environment: | |
Last Closed: | 2017-11-28 22:06:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1470350 | ||
Bug Blocks: | 1473736 |
Description
Phil Cameron
2017-08-02 15:55:57 UTC
See PR 15970 for details. We decided not to continue with these changes. Scaling the number of pods, as documented, gets past the problem of having too many pods with the minimum weight of 1. If there is a customer need, we will re-open this bug.

Thanks for the update - I read through https://github.com/openshift/origin/pull/15970 and that seems to be addressing a different issue (or at least, I don't see the relation to this bug report). No need to re-open.

Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/00d2d715129ee3e6e09d0713f68181acf6ac89f3
Router - A/B weights distribution improvement

Service weight is roughly distributed among the endpoints. The service can end up with more or less weight than desired. This change finds the desired endpoint weight for each service and then scales the maximum weight to 256 (the maximum). The scaling is done before writing the haproxy.config file to account for any endpoint changes.

bug: 1477685
https://bugzilla.redhat.com/show_bug.cgi?id=1477685

Trello: BhWCH3vu
https://trello.com/c/BhWCH3vu/543-3-fix-problem-in-a-b-weight-algorithm

https://github.com/openshift/origin/commit/296e98abb6a9938112db9e1f9d99b6cdbc5f7db7
Merge pull request #16090 from pecameron/bz1477685a

Automatic merge from submit-queue (batch tested with PRs 16896, 16908, 16935, 16898, 16090).

Router - A/B weights distribution improvement

Service weight is roughly distributed among the endpoints. The service can end up with more or less weight than desired. This change finds the desired endpoint weight for each service and then scales the maximum weight to 256 (the maximum). The scaling is done before writing the haproxy.config file to account for any endpoint changes.

bug: 1477685
https://bugzilla.redhat.com/show_bug.cgi?id=1477685

Trello: BhWCH3vu
https://trello.com/c/BhWCH3vu/543-3-fix-problem-in-a-b-weight-algorithm

As part of the new A/B load balancing algorithm change, PR 16090 also fixes a problem where an endpoint change is not reflected in the router configuration. This can lead to 503 responses even though there is an endpoint on the service.

Tested on OCP 3.7; the issue has been fixed.
openshift v3.7.0-0.174.0
kubernetes v1.7.6+a08f5eeb62

Tested on the latest OCP:
openshift v3.7.0-0.184.0
kubernetes v1.7.6+a08f5eeb62

Verification steps are in the bug description.

Actual result:
Each endpoint gets a weight/numberOfEndpoints portion of the requests.

# oc get pod
NAME              READY   STATUS    RESTARTS   AGE
test-rc-1-5dthx   1/1     Running   0          30s
test-rc-1-pdghh   1/1     Running   0          1m
test-rc-2-6dfzv   1/1     Running   0          25s
test-rc-2-qgmgg   1/1     Running   0          25s
test-rc-2-tx7ws   1/1     Running   0          1m
test-rc-2-z4bwm   1/1     Running   0          25s
test-rc-3-b65xn   1/1     Running   0          22s
test-rc-3-k8jfs   1/1     Running   0          1m
test-rc-3-msnzc   1/1     Running   0          22s
test-rc-4-hf2xz   1/1     Running   0          48s

# oc get route
NAME     HOST/PORT                                 PATH   SERVICES                                                                                        PORT   TERMINATION   WILDCARD
route1   route1-d1.apps.1030-0ou.qe.rhcloud.com           service-unsecure(20%),service-unsecure-2(10%),service-unsecure-3(30%),service-unsecure-4(40%)   http                 None

# for i in {1..20} ; do curl route1-d1.apps.1030-0ou.qe.rhcloud.com; done
Hello-OpenShift-4 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-2 http-8080

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/591a32b03fcba3ca63c4e6950ed5946965056c55
Router - changes to A/B weights discussion

Bug: 1477685
https://bugzilla.redhat.com/show_bug.cgi?id=1477685

Trello: BhWCH3vu
https://trello.com/c/BhWCH3vu/543-3-fix-problem-in-a-b-weight-algorithm
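
For readers who want to check the arithmetic behind the verification run, the following is a minimal sketch of the weighting scheme as the commit message describes it: divide each service weight by its endpoint count, then scale so that the largest per-endpoint weight becomes 256. It is not the router's actual code. The function name `endpoint_weights`, the truncating integer conversion, the floor of 1 for non-zero weights, and the mapping of `Hello-OpenShift-N` responses to the four services are all assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

MAX_WEIGHT = 256  # maximum server weight, as stated in the commit message


def endpoint_weights(services):
    """Hypothetical helper: services maps name -> (service_weight, endpoint_count).

    Returns the integer weight assigned to each endpoint of each service.
    """
    # Desired (fractional) weight of a single endpoint in each service.
    per_endpoint = {
        name: Fraction(weight, count)
        for name, (weight, count) in services.items()
        if count > 0  # a service without endpoints receives no traffic
    }
    largest = max(per_endpoint.values(), default=Fraction(0))
    if largest == 0:
        return {name: 0 for name in per_endpoint}

    result = {}
    for name, share in per_endpoint.items():
        # Scale so the largest per-endpoint weight lands exactly on MAX_WEIGHT.
        scaled = int(share * MAX_WEIGHT / largest)
        # Assumption: a non-zero service weight never rounds down to zero.
        result[name] = max(1, scaled) if share > 0 else 0
    return result


# Route weights and pod counts taken from the verification run above.
services = {
    "service-unsecure": (20, 2),
    "service-unsecure-2": (10, 4),
    "service-unsecure-3": (30, 3),
    "service-unsecure-4": (40, 1),
}
weights = endpoint_weights(services)

# Effective share of traffic per service after scaling.
total = sum(weights[name] * count for name, (_, count) in services.items())
shares = {
    name: Fraction(weights[name] * count, total)
    for name, (_, count) in services.items()
}

# Responses observed in the 20 curl requests (N from "Hello-OpenShift-N").
observed = Counter([4, 1, 4, 1, 3, 4, 3, 4, 3, 2] * 2)

print(weights)
print({name: float(share) for name, share in shares.items()})
print(observed)
```

Under these assumptions the per-endpoint weights come out as 64, 16, 64 and 256, which restores the 20/10/30/40 split across the services. The 20 observed responses (4, 2, 6 and 8 for services 1 through 4) match that split.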