Bug 1477685
| Summary: | A/B deployment seems to round-robin across all pods in multiple services, instead of proportional routing to services | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Phil Cameron <pcameron> |
| Component: | Networking | Assignee: | Phil Cameron <pcameron> |
| Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aos-bugs, atragler, bbennett, dneary, eparis, rkhan, sukulkar, xtian, yadu, zzhao |
| Version: | 3.7.0 | Keywords: | Reopened |
| Target Milestone: | --- | ||
| Target Release: | 3.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Cause: See Docs PR 5816 Consequence: Fix: Result: | Story Points: | --- |
| Clone Of: | 1470350 | Environment: | |
| Last Closed: | 2017-11-28 22:06:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1470350 | ||
| Bug Blocks: | 1473736 | ||
Description
Phil Cameron
2017-08-02 15:55:57 UTC
See PR 15970 for details. We decided not to continue with these changes. Scaling the number of pods, as documented, gets past the problem of having too many pods with a minimum weight of 1. If there is a customer need, we will re-open this bug.

Thanks for the update. I read through https://github.com/openshift/origin/pull/15970 and it seems to address a different issue (or at least, I don't see the relation to this bug report). No need to re-open.

Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/00d2d715129ee3e6e09d0713f68181acf6ac89f3
Router - A/B weights distribution improvement

Service weight is roughly distributed among the endpoints. The service can end up with more or less weight than desired. This change finds the desired endpoint weight for each service and then scales the maximum weight to 256 (the maximum). The scaling is done before writing the haproxy.config file to account for any endpoint changes.

bug: 1477685
https://bugzilla.redhat.com/show_bug.cgi?id=1477685
Trello: BhWCH3vu
https://trello.com/c/BhWCH3vu/543-3-fix-problem-in-a-b-weight-algorithm

https://github.com/openshift/origin/commit/296e98abb6a9938112db9e1f9d99b6cdbc5f7db7
Merge pull request #16090 from pecameron/bz1477685a

Automatic merge from submit-queue (batch tested with PRs 16896, 16908, 16935, 16898, 16090).

Router - A/B weights distribution improvement

Service weight is roughly distributed among the endpoints. The service can end up with more or less weight than desired. This change finds the desired endpoint weight for each service and then scales the maximum weight to 256 (the maximum). The scaling is done before writing the haproxy.config file to account for any endpoint changes.

bug: 1477685
https://bugzilla.redhat.com/show_bug.cgi?id=1477685
Trello: BhWCH3vu
https://trello.com/c/BhWCH3vu/543-3-fix-problem-in-a-b-weight-algorithm

As part of the new A/B load balancing algorithm change, PR 16090 also fixes a problem where an endpoint change is not reflected in the router configuration. This can lead to 503 responses even though there is an endpoint on the service.

Tested on OCP 3.7; the issue has been fixed.
openshift v3.7.0-0.174.0
kubernetes v1.7.6+a08f5eeb62

Tested on the latest OCP:
openshift v3.7.0-0.184.0
kubernetes v1.7.6+a08f5eeb62
Verification steps are in the bug description.
Actual result: Each endpoint gets weight/numberOfEndpoints portion of the requests
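The weight/numberOfEndpoints arithmetic can be illustrated with a minimal Python sketch. This is not the router's actual code: the function name `endpoint_weights`, the floor of 1 for a service with non-zero weight, and the pairing of each `test-rc-N` replication controller with the Nth service in the route are all assumptions made for illustration. The service weights and pod counts (2, 4, 3 and 1) are taken from the `oc get route` and `oc get pod` output in this comment.

```python
def endpoint_weights(services, max_weight=256):
    """Return a per-endpoint weight for each service.

    services maps a service name to (service_weight, endpoint_count).
    Hypothetical helper: it follows the idea in the commit message
    (desired endpoint weight per service, then scale so the largest
    endpoint weight equals max_weight), not the router source.
    """
    # Desired weight of one endpoint = service weight / number of endpoints.
    desired = {
        name: weight / count
        for name, (weight, count) in services.items()
        if weight > 0 and count > 0
    }
    if not desired:
        return {}
    # Scale so the heaviest endpoint gets max_weight.
    scale = max_weight / max(desired.values())
    # Assumption: an endpoint of a weighted service never drops below 1.
    return {name: max(1, round(w * scale)) for name, w in desired.items()}


# Route weights and pod counts from the test in this comment
# (the rc-to-service mapping is assumed).
services = {
    "service-unsecure": (20, 2),
    "service-unsecure-2": (10, 4),
    "service-unsecure-3": (30, 3),
    "service-unsecure-4": (40, 1),
}

weights = endpoint_weights(services)

# Recover each service's share of traffic from the endpoint weights.
totals = {name: weights[name] * services[name][1] for name in weights}
grand_total = sum(totals.values())
shares = {name: round(100 * t / grand_total) for name, t in totals.items()}

print(weights)
print(shares)
```

Under these assumptions the per-endpoint weights come out as 64, 16, 64 and 256, and the recovered service shares are 20%, 10%, 30% and 40%, matching the route definition.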
# oc get pod
NAME READY STATUS RESTARTS AGE
test-rc-1-5dthx 1/1 Running 0 30s
test-rc-1-pdghh 1/1 Running 0 1m
test-rc-2-6dfzv 1/1 Running 0 25s
test-rc-2-qgmgg 1/1 Running 0 25s
test-rc-2-tx7ws 1/1 Running 0 1m
test-rc-2-z4bwm 1/1 Running 0 25s
test-rc-3-b65xn 1/1 Running 0 22s
test-rc-3-k8jfs 1/1 Running 0 1m
test-rc-3-msnzc 1/1 Running 0 22s
test-rc-4-hf2xz 1/1 Running 0 48s
# oc get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route1 route1-d1.apps.1030-0ou.qe.rhcloud.com service-unsecure(20%),service-unsecure-2(10%),service-unsecure-3(30%),service-unsecure-4(40%) http None
# for i in {1..20} ; do curl route1-d1.apps.1030-0ou.qe.rhcloud.com; done
Hello-OpenShift-4 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-4 http-8080
Hello-OpenShift-3 http-8080
Hello-OpenShift-2 http-8080
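As a quick check of the 20 responses above, the following Python sketch tallies them and compares the result with the route weights. It assumes that a `Hello-OpenShift-N` response comes from the Nth service listed in the route (service-unsecure, service-unsecure-2, and so on); the response list is transcribed from the curl output, and the variable names are illustrative only.

```python
from collections import Counter

# Suffix N of each "Hello-OpenShift-N" response, in the order returned by curl.
responses = [4, 1, 4, 1, 3, 4, 3, 4, 3, 2, 4, 1, 4, 1, 3, 4, 3, 4, 3, 2]

# Route weights in percent, keyed by the assumed service index.
expected = {1: 20, 2: 10, 3: 30, 4: 40}

counts = Counter(responses)
observed = {n: 100 * counts[n] // len(responses) for n in expected}

for n in sorted(expected):
    print(f"service {n}: {counts[n]} responses, "
          f"{observed[n]}% observed, {expected[n]}% configured")
```

With this sample the observed split is 4, 2, 6 and 8 responses, which is 20%, 10%, 30% and 40%. A sample of 20 requests is small, so an exact match is not guaranteed in general.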
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/591a32b03fcba3ca63c4e6950ed5946965056c55
Router - changes to A/B weights discussion

Bug: 1477685
https://bugzilla.redhat.com/show_bug.cgi?id=1477685
Trello: BhWCH3vu
https://trello.com/c/BhWCH3vu/543-3-fix-problem-in-a-b-weight-algorithm