| Summary: | [dev-preview-stg] Routing to backends does not always match the weights set on the route | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yan Du <yadu> |
| Component: | Networking | Assignee: | Rajat Chopra <rchopra> |
| Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | low | ||
| Priority: | medium | CC: | aos-bugs, bbennett, weliang, yadu, zhaliu |
| Version: | 3.3.0 | Keywords: | Reopened |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-10-28 15:23:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Yan Du
2016-09-29 09:18:06 UTC
Those are awfully close to the weights both times... what's the problem?

Just tried the same test steps in my env and got perfect results, as expected:
[root@dhcp-41-178 ~]# for i in {1..100}; do curl --resolve unsecure-route-https.router.default.svc.cluster.local:80:10.18.41.181 http://unsecure-route-https.router.default.svc.cluster.local >> a.log; done
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-1' | wc -l
20
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-2' | wc -l
30
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-3' | wc -l
50
[root@dhcp-41-178 ~]# for i in {1..100}; do curl --resolve unsecure-route-https.router.default.svc.cluster.local:80:10.18.41.181 http://unsecure-route-https.router.default.svc.cluster.local >> a.log; done
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-1' | wc -l
40
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-2' | wc -l
60
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-3' | wc -l
100
[root@dhcp-41-178 ~]# for i in {1..100}; do curl --resolve unsecure-route-https.router.default.svc.cluster.local:80:10.18.41.181 http://unsecure-route-https.router.default.svc.cluster.local >> a.log; done
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-1' | wc -l
60
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-2' | wc -l
90
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-3' | wc -l
150
[root@dhcp-41-178 ~]#
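As an aside, the per-backend split can be tallied in a single pass instead of one grep per backend; a minimal sketch, reusing the hostname and router IP from the loop above:

```bash
# Send 100 requests and count each distinct response body.
# Each backend answers with a unique "Hello-OpenShift-N http-8080" line,
# so sort | uniq -c yields the observed split directly.
for i in {1..100}; do
  curl -s --resolve unsecure-route-https.router.default.svc.cluster.local:80:10.18.41.181 \
       http://unsecure-route-https.router.default.svc.cluster.local
done | sort | uniq -c | sort -rn
```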
Thanks Weibin. I don't see a bug here.

I still hit the issue in the latest dev-preview-stg env (3.3.1.3); it is easier to reproduce on the stg env than on the ose env:
[root@yanshost jsonfile]# oc set route-backends unsecure-route
NAME                    KIND      TO                   WEIGHT
routes/unsecure-route   Service   service-unsecure     20 (20%)
routes/unsecure-route   Service   service-unsecure-2   80 (80%)
[root@yanshost jsonfile]# for i in {1..10}; do curl http://unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com ; done
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
[root@yanshost jsonfile]# for i in {1..10}; do curl http://unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com ; done
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
[root@yanshost jsonfile]# for i in {1..10}; do curl http://unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com ; done
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
[root@yanshost jsonfile]# for i in {1..10}; do curl http://unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com ; done
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
For two backends, if I send 100 requests, sometimes I get a result like this:

[root@yanshost jsonfile]# oc set route-backends unsecure-route
NAME                    KIND      TO                   WEIGHT
routes/unsecure-route   Service   service-unsecure     20 (20%)
routes/unsecure-route   Service   service-unsecure-2   80 (80%)
[root@yanshost jsonfile]# cat a.log | grep 'Hello-OpenShift-1' | wc -l
18
[root@yanshost jsonfile]# cat a.log | grep 'Hello-OpenShift-2' | wc -l
82

For multiple backends:

[root@yanshost jsonfile]# oc set route-backends unsecure-route
NAME                    KIND      TO                   WEIGHT
routes/unsecure-route   Service   service-unsecure     2 (20%)
routes/unsecure-route   Service   service-unsecure-2   3 (30%)
routes/unsecure-route   Service   service-unsecure-3   5 (50%)
[root@yanshost jsonfile]# cat d.log | grep 'Hello-OpenShift-3' | wc -l
52
[root@yanshost jsonfile]# cat d.log | grep 'Hello-OpenShift-2' | wc -l
30
[root@yanshost jsonfile]# cat d.log | grep 'Hello-OpenShift-1' | wc -l
18

By the way, the split is close to the weights we set but not exact, and the bug does not reproduce 100% of the time. Feel free to contact me if you cannot reproduce it.

Thanks Yan. Can you pull the stats from the router to see what they show for the proportions? I also contend that the weights won't be perfect, but they should be within a few percent, as we see here. I'll look into that after we get the stats.

Doesn't the stage env have multiple routers? That may explain the slight discrepancies. Checking the stats would be the right way.

The logs from this particular router do suggest an exact 20%/80% split. It is possible that multiple routers cause a slight variation (the weighted round-robin resets its counters on a reload). I would consider this NOTABUG. To be sure, please include stats logs from all the routers.
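For reference, one way to collect those stats: the 3.x HAProxy router exposes its stats page on port 1936, with credentials kept as environment variables on the router deployment config. A minimal sketch, assuming the router runs as dc/router in the default namespace; the router host, pod name, and config path are placeholders to adjust per environment:

```bash
# Read the stats credentials/port off the router deployment config
# (assumes dc/router in the "default" namespace; adjust to your env).
eval "$(oc env dc/router -n default --list | grep '^STATS')"

# Pull the HAProxy stats CSV from one router host and show the counters
# for this route (backend name, server, stot = total sessions, weight).
# Repeat against every router instance in the env.
ROUTER_HOST=10.x.x.x   # placeholder: the host/IP of a router instance
curl -s -u "$STATS_USERNAME:$STATS_PASSWORD" \
     "http://$ROUTER_HOST:${STATS_PORT:-1936}/;csv" \
  | awk -F, 'NR==1 || /unsecure-route/ {print $1, $2, $8, $19}'

# The weights actually programmed into HAProxy can also be read from the
# generated config inside a router pod (3.x default path; pod name is a
# placeholder):
oc rsh router-1-xxxxx grep -B2 -A8 unsecure-route /var/lib/haproxy/conf/haproxy.config
```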