Bug 1380307 - [dev-preview-stg] Routing to backends is not accurate as the weight we set in route sometimes
Summary: [dev-preview-stg] Routing to backends is not accurate as the weight we set in...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Rajat Chopra
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-29 09:18 UTC by Yan Du
Modified: 2022-08-04 22:20 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-28 15:23:19 UTC
Target Upstream Version:



Description Yan Du 2016-09-29 09:18:06 UTC
Description of problem:
Create a route with multiple backends and set weights for the backends; the traffic split observed when accessing the route does not accurately match the configured weights.


Version-Release number of selected component (if applicable):
dev-preview-stg
atomic-openshift-3.3.0.33-1.git.0.8601ee7.el7.x86_64
docker-1.10.3-46.el7.14.x86_64
kernel-3.10.0-327.36.1.el7.x86_64

How reproducible:
Sometimes

Steps to Reproduce:

1. Create pods/services:
PodA & serviceA
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/abrouting/caddy-docker.json
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/abrouting/unseucre/service_unsecure.json
PodB & serviceB
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/abrouting/caddy-docker-2.json
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/abrouting/unseucre/service_unsecure-2.json
PodC & serviceC
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/abrouting/caddy-docker-3.json
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/abrouting/unseucre/service_unsecure-3.json

3. Create an unsecured route
# oc expose svc service-unsecure --name=unsecure-route

4. Set the route to roundrobin mode
# oc annotate route unsecure-route --overwrite  haproxy.router.openshift.io/balance=roundrobin

5. Set backends weight for route
# oc set route-backends unsecure-route service-unsecure=2 service-unsecure-2=3 service-unsecure-3=5
# oc set route-backends unsecure-route
NAME                   KIND     TO                  WEIGHT
routes/unsecure-route  Service  service-unsecure    2 (20%)
routes/unsecure-route  Service  service-unsecure-2  3 (30%)
routes/unsecure-route  Service  service-unsecure-3  5 (50%)
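As a side note, the percentages that `oc set route-backends` prints are simply each weight normalized by the sum of all weights; a minimal sketch of that calculation, using the weights from step 5:

```python
# Normalize route backend weights into the percentages that
# `oc set route-backends` displays (weight / total, as a rounded percent).
weights = {
    "service-unsecure": 2,
    "service-unsecure-2": 3,
    "service-unsecure-3": 5,
}

total = sum(weights.values())
percentages = {svc: round(100 * w / total) for svc, w in weights.items()}
print(percentages)
```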

6. Access the route 100 times
# for i in {1..100}; do curl unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com >> a.log; done
# cat a.log | grep 'Hello-OpenShift-3' | wc -l
52
# cat a.log | grep 'Hello-OpenShift-2' | wc -l
31
# cat a.log | grep 'Hello-OpenShift-1' | wc -l
17

7. Access the route again
# for i in {1..100}; do curl unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com >> b.log; done
# cat b.log | grep 'Hello-OpenShift-1' | wc -l
17
# cat b.log | grep 'Hello-OpenShift-2' | wc -l
32
# cat b.log | grep 'Hello-OpenShift-3' | wc -l
51
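Instead of one grep per backend, the per-backend counts in steps 6 and 7 can be tallied in a single pass; a sketch, assuming each saved response line starts with a backend marker like `Hello-OpenShift-N`:

```python
from collections import Counter

# Tally responses per backend from a saved curl log (one response per line).
# The sample lines below stand in for the contents of a.log / b.log.
log_lines = [
    "Hello-OpenShift-3 http-8080",
    "Hello-OpenShift-2 http-8080",
    "Hello-OpenShift-3 http-8080",
]

counts = Counter(line.split()[0] for line in log_lines if line.strip())
print(counts)
```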


Actual results:
Refer to steps 6 and 7: the observed distribution deviates slightly from the configured weights.


Expected results:
Traffic distribution across the backends should match the weights set on the route.

Additional info:
Sometimes the issue can be reproduced with two backends as well, but it reproduces more readily with three or more backends.

Comment 1 Ben Bennett 2016-10-14 19:48:00 UTC
Those are awfully close to the weights both times... what's the problem?

Comment 2 Weibin Liang 2016-10-17 15:38:01 UTC
I just tried the same test steps in my environment and got exactly the expected results:

[root@dhcp-41-178 ~]# for i in {1..100}; do curl --resolve unsecure-route-https.router.default.svc.cluster.local:80:10.18.41.181 http://unsecure-route-https.router.default.svc.cluster.local>> a.log; done
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-1' | wc -l
20
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-2' | wc -l
30
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-3' | wc -l
50

[root@dhcp-41-178 ~]# for i in {1..100}; do curl --resolve unsecure-route-https.router.default.svc.cluster.local:80:10.18.41.181 http://unsecure-route-https.router.default.svc.cluster.local>> a.log; done
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-1' | wc -l
40
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-2' | wc -l
60
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-3' | wc -l
100

[root@dhcp-41-178 ~]# for i in {1..100}; do curl --resolve unsecure-route-https.router.default.svc.cluster.local:80:10.18.41.181 http://unsecure-route-https.router.default.svc.cluster.local>> a.log; done

[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-1' | wc -l
60
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-2' | wc -l
90
[root@dhcp-41-178 ~]# cat a.log | grep 'Hello-OpenShift-3' | wc -l
150
[root@dhcp-41-178 ~]#

Comment 3 Ben Bennett 2016-10-17 18:25:12 UTC
Thanks Weibin.  I don't see a bug here.

Comment 4 Yan Du 2016-10-20 08:44:12 UTC
I still hit the issue in the latest dev-preview-stg environment (3.3.1.3); it is easier to reproduce on the stg environment than on an OSE environment.

[root@yanshost jsonfile]# oc set route-backends unsecure-route
NAME                   KIND     TO                  WEIGHT
routes/unsecure-route  Service  service-unsecure    20 (20%)
routes/unsecure-route  Service  service-unsecure-2  80 (80%)    
[root@yanshost jsonfile]# for i in {1..10}; do curl  http://unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com ; done
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
[root@yanshost jsonfile]# for i in {1..10}; do curl  http://unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com ; done
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
[root@yanshost jsonfile]# for i in {1..10}; do curl  http://unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com ; done
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
[root@yanshost jsonfile]# for i in {1..10}; do curl  http://unsecure-route-d1.b795.dev-preview-stg.openshiftapps.com ; done
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080
Hello-OpenShift-2 http-8080
Hello-OpenShift-1 http-8080

Comment 5 Yan Du 2016-10-20 09:23:09 UTC
For two backends, if I send 100 requests, sometimes I got below result

[root@yanshost jsonfile]# oc set route-backends unsecure-route
NAME                   KIND     TO                  WEIGHT
routes/unsecure-route  Service  service-unsecure    20 (20%)
routes/unsecure-route  Service  service-unsecure-2  80 (80%)

[root@yanshost jsonfile]# cat a.log | grep 'Hello-OpenShift-1' | wc -l
18
[root@yanshost jsonfile]# cat a.log | grep 'Hello-OpenShift-2' | wc -l
82

For multiple backends:

[root@yanshost jsonfile]# oc set route-backends unsecure-route
NAME                   KIND     TO                  WEIGHT
routes/unsecure-route  Service  service-unsecure    2 (20%)
routes/unsecure-route  Service  service-unsecure-2  3 (30%)
routes/unsecure-route  Service  service-unsecure-3  5 (50%)

[root@yanshost jsonfile]# cat d.log | grep 'Hello-OpenShift-3' | wc -l
52
[root@yanshost jsonfile]# cat d.log | grep 'Hello-OpenShift-2' | wc -l
30
[root@yanshost jsonfile]# cat d.log | grep 'Hello-OpenShift-1' | wc -l
18

By the way, the result is close to the weights we set but not exact, and the bug does not reproduce 100% of the time; feel free to contact me if you cannot reproduce it. Thanks.

Comment 6 Ben Bennett 2016-10-20 13:18:34 UTC
Yan: Can you pull the stats from the router to see what they show for the proportions?

I also contend that the weights won't be perfect, but they should be within a few percent, as we see.  But I'll look into that after we get the stats.
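A rough sanity check supports the "within a few percent" point: per-router round robin is deterministic, but if the aggregate across several independent routers is modeled as roughly binomial (an assumption for illustration), a 20% backend receiving 17-18 of 100 requests is well within one standard deviation of the mean:

```python
import math

# If each of n = 100 requests independently hit the 20% backend with
# p = 0.2, the count would have mean n*p = 20 and standard deviation
# sqrt(n * p * (1 - p)) = 4, so observed counts of 17-18 are unremarkable.
n, p = 100, 0.2
mean = n * p
stddev = math.sqrt(n * p * (1 - p))
print(mean, stddev)
```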

Comment 7 Rajat Chopra 2016-10-20 14:51:06 UTC
Doesn't the stage env have multiple routers? That may explain the slight discrepancies.
Checking the stats would be the right way.

Comment 9 Rajat Chopra 2016-10-26 13:21:53 UTC
The logs from this particular router do suggest 20%/80% exact split. It is possible that multiple routers do cause a slight variation (the weighted roundrobin resets its counters on a reload).

I would consider this bug NOT_A_BUG
To be sure, please include stats logs from all the routers.
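The counter-reset point above can be illustrated with a toy model (not HAProxy's actual smooth WRR scheduling; a naive expansion is used here for clarity): a reload that restarts the weighted round-robin cycle mid-stream replays the front of the cycle and skews the split.

```python
# Sketch of why a router reload can skew a weighted round-robin split.
# For weights A=2, B=8, one full cycle serves A twice per 10 requests;
# resetting the cycle position (as a config reload does) replays the
# start of the cycle, over-serving whichever backends lead the cycle.
def wrr_cycle(weights):
    # Naive expansion: each backend appears `weight` times per cycle.
    return [backend for backend, w in weights for _ in range(w)]

cycle = wrr_cycle([("A", 2), ("B", 8)])

# 10 uninterrupted requests: exactly 2 x A, 8 x B.
uninterrupted = cycle[:10]

# Reload after 5 requests: the cycle restarts from position 0,
# so A is served 4 times out of 10 instead of 2.
interrupted = cycle[:5] + cycle[:5]
print(uninterrupted.count("A"), interrupted.count("A"))
```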

