Bug 1416869

Summary: Split Traffic Routes Not Balancing as Expected
Product: OpenShift Container Platform
Reporter: Nick Schuetz <nschuetz>
Component: Networking
Assignee: Ben Bennett <bbennett>
Networking sub component: router
QA Contact: Yan Du <yadu>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: aos-bugs, bbennett, bmeng, ccoleman, jokerman, mmccomas, nschuetz, weliang
Version: 3.4.0
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Enhancement
Doc Text:
Feature: Make round-robin the default balancing algorithm for routes with multiple active services, to match what users expect. Reason: Previously, users had to set an annotation on the route in addition to the weights to make it behave correctly. This was surprising, and people rarely got it right. Result: The default now behaves as people expect.
Last Closed: 2017-04-12 19:10:49 UTC
Type: Bug
Attachments:
Split Traffic UI Screenshot (attachment 1244807; flags: none)

Description Nick Schuetz 2017-01-26 16:27:47 UTC
Description of problem:

I've set up a split traffic route that does not seem to be balancing traffic as expected. I have "App A" with a weight of 1 and "App B" with a weight of 3. In the UI, the pie chart shows that 3/4 of the traffic is supposed to go to the App B service and 1/4 to the App A service. When I hit the route, I consistently see a 50/50 split.
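For reference, a minimal sketch (Python, not HAProxy's actual code) of what a weighted round-robin balancer does with the weights described above. It uses the generic "smooth" weighted round-robin variant; the backend names app-a/app-b are chosen for illustration. With weights 1 and 3, every cycle of four requests sends one request to App A and three to App B, which is the 25/75 split the UI shows, so a steady 50/50 result suggests the weights were not being applied at all.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, n_requests):
    """Yield backend names in smooth weighted round-robin order.

    weights: dict mapping backend name -> positive integer weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    for _ in range(n_requests):
        # Every backend earns its weight; the one with the most credit serves.
        for name, w in weights.items():
            current[name] += w
        chosen = max(current, key=current.get)
        current[chosen] -= total  # pay back one full round of credit
        yield chosen


counts = Counter(smooth_weighted_round_robin({"app-a": 1, "app-b": 3}, 400))
for name in sorted(counts):
    print(name, counts[name], f"{counts[name] / 400:.0%}")
```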

Version-Release number of selected component (if applicable):

3.4.0.40

How reproducible:

Always

Steps to Reproduce:
1. Create a split route between two different services and hit said route with a browser or curl with cookies disabled.

Actual results:

Traffic is split 50/50

Expected results:

Traffic is split 25/75

Additional info:

When I scale App A (the 25% service) to more than one pod, eight for example, traffic goes to App A the majority of the time. It's definitely not 25% of the time, as would be expected.

Comment 1 Nick Schuetz 2017-01-26 16:31:56 UTC
Created attachment 1244807
Split Traffic UI Screenshot

Comment 2 Ben Bennett 2017-01-26 19:01:27 UTC
Did you set the balance algorithm annotation to round-robin on the route?  It sounds like you are getting the default least connections behavior.

Comment 4 Nick Schuetz 2017-01-26 19:14:23 UTC
The only annotation attached to that route is: openshift.io/host.generated=true

Comment 6 Ben Bennett 2017-01-27 19:48:51 UTC
The web console either needs to set, or allow to be set, the route balance algorithm.

The route annotation haproxy.router.openshift.io/balance needs to be set to roundrobin if you want the weights to have any effect.

https://docs.openshift.com/container-platform/3.4/architecture/core_concepts/routes.html#route-specific-annotations
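Until the router default was changed, the workaround was to set that annotation on each weighted route. A minimal sketch that builds the matching `oc patch` command as a JSON merge patch; the route name is hypothetical, and `oc annotate route <name> haproxy.router.openshift.io/balance=roundrobin` should have the same effect.

```python
import json
import shlex

BALANCE_ANNOTATION = "haproxy.router.openshift.io/balance"


def balance_patch_command(route_name, algorithm="roundrobin"):
    """Return an `oc patch` argv that sets the balance annotation on a route."""
    patch = {"metadata": {"annotations": {BALANCE_ANNOTATION: algorithm}}}
    return ["oc", "patch", "route", route_name, "--type=merge",
            "-p", json.dumps(patch)]


# "my-split-route" is a hypothetical route name.
cmd = balance_patch_command("my-split-route")
print(shlex.join(cmd))  # shell-quoted command line (Python 3.8+)
```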

Comment 7 Samuel Padgett 2017-01-30 13:33:17 UTC
Ben, can the router default to round robin if a route has alternate backends? If the user sets weights, it seems clear that they want them to work...

Is this also a problem with `oc set route-backends`?

Even if this is working as intended, I see it as a usability bug. It's really not obvious that you need to set the annotation. Even if the web console starts adding the annotation, that won't fix any existing routes created from the web console.

At the very least, we need to mention the annotation in the topic on A/B testing:

https://docs.openshift.org/latest/dev_guide/routes.html#routes-load-balancing-for-AB-testing

Comment 8 Samuel Padgett 2017-01-31 14:24:56 UTC
The better fix is to have the router default to round-robin for routes that have alternate backends and no `haproxy.router.openshift.io/balance` annotation set.

I've confirmed `oc set route-backends` also does not add this annotation, so the problem is not specific to the web console. Updating the web console will not fix any existing routes.

Comment 9 Ben Bennett 2017-01-31 14:35:40 UTC
Ok, that's a reasonable argument.  I'll see if we can change the template to accommodate it.

Comment 10 Nick Schuetz 2017-01-31 15:33:59 UTC
Also note that if service A (25%) has 4 pods and service B (75%) has 1, the traffic is no longer balanced at 25/75. The percentage that is asked for by the user is no longer valid. This could be problematic and confusing to the end user when using multiple pods under their services.
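A rough illustration of the arithmetic behind this comment. It assumes two things this bug does not state: that roundrobin is in effect, and that the router gives every pod endpoint the full weight of its service. Under those assumptions a service's effective weight is weight × pod count. The function name is hypothetical.

```python
from fractions import Fraction


def effective_shares(services):
    """Traffic share per service if every pod inherits the service's full
    route weight (an assumption, not confirmed in this bug).

    services: dict mapping service name -> (route_weight, pod_count).
    """
    totals = {name: w * pods for name, (w, pods) in services.items()}
    grand_total = sum(totals.values())
    return {name: Fraction(t, grand_total) for name, t in totals.items()}


# Numbers from the comment: A has weight 1 and 4 pods, B has weight 3 and 1 pod.
shares = effective_shares({"A": (1, 4), "B": (3, 1)})
print({name: f"{float(s):.0%}" for name, s in shares.items()})  # {'A': '57%', 'B': '43%'}
```

Under that assumption, service A ends up with 4/7 of the traffic instead of the 1/4 the user asked for.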

Comment 11 openshift-github-bot 2017-02-03 19:06:09 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/0ddebc1a464010ae5919120366d7be3500af00c5
Changed the router to default to roundrobin with multiple services

If the route is associated with multiple services then we will set the
default load balance policy to RoundRobin if no policy is set with an
annotation or as a global default with an environment variable.
Without this change the user would need to both set the services,
weights, and then set an annotation to change the default balancing
algorithm... which people almost always forgot to do.

For bug 1416869 (https://bugzilla.redhat.com/show_bug.cgi?id=1416869)

Comment 12 Troy Dawson 2017-02-06 17:25:00 UTC
This has been merged into ocp and is in OCP v3.5.0.17 or newer.

Comment 14 Yan Du 2017-02-07 10:36:32 UTC
I just tested on the latest OCP environment, v3.5.0.17:
ose-haproxy-router    v3.5.0.17           6a86c1d87ea7

It seems the image doesn't contain the code change. Please help check it.

Comment 15 Troy Dawson 2017-02-07 22:34:28 UTC
Sorry about that.  Still working through issues with our merge script.  Should be in the next build.

Comment 16 Troy Dawson 2017-02-08 22:34:44 UTC
This has been merged into ocp and is in OCP v3.5.0.18 or newer.  I made sure of it this time.

Comment 17 Yan Du 2017-02-09 03:19:55 UTC
Tested on the latest OCP v3.5.0.18:
ose-haproxy-router    v3.5.0.18           109538c1aad4
A route with multiple services now uses the RoundRobin load balance policy by default. Moving bug to VERIFIED.

Thanks, @Troy

Comment 18 openshift-github-bot 2017-02-09 15:08:17 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/a4815c6314f9df1d2ce8060216d0924181c48b6c
Changed the router default to roundrobin if non-zero weights are used

If the route has non-zero weights set for the services it is
associated with, then we will set the default load balance policy to
RoundRobin if no policy is set with an annotation or as a global
default with an environment variable. Without this change the user
would need to both set the weights, and then set an annotation to
change the default balancing algorithm... which people almost always
forgot to do.

For bug 1416869 (https://bugzilla.redhat.com/show_bug.cgi?id=1416869)

Comment 19 Ben Bennett 2017-02-09 15:15:03 UTC
Weibin: Can you verify that the test cases handle the following cases:
  1. One route with two services weighted 1 and 9: one service gets 10% of the traffic and the other gets 90%.  No annotations should be set on the route.

  2. One route with two services weighted 0 and 1: the service with weight 1 gets 100% of the traffic.  (No annotations should be set.)
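One way to check these two cases from the client side is to count which service answered each of N requests and compare the observed shares with the shares implied by the weights. A small sketch of that comparison; the function names, the 5% tolerance, and the counts in the example calls are illustrative only, not results or part of the QA suite.

```python
def expected_shares(weights):
    """Fraction of traffic each service should get from its route weight."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: w / total for name, w in weights.items()}


def shares_match(observed_counts, weights, tolerance=0.05):
    """True if every service's observed share is within `tolerance`
    (absolute, as a fraction of all requests) of its expected share."""
    n = sum(observed_counts.values())
    if n == 0:
        raise ValueError("no requests were observed")
    expected = expected_shares(weights)
    return all(abs(observed_counts.get(name, 0) / n - share) <= tolerance
               for name, share in expected.items())


# Hypothetical counts, only to show how the check is called.
print(shares_match({"svc-a": 98, "svc-b": 902}, {"svc-a": 1, "svc-b": 9}))
print(shares_match({"svc-b": 1000}, {"svc-a": 0, "svc-b": 1}))
print(shares_match({"app-a": 500, "app-b": 500}, {"app-a": 1, "app-b": 3}))
```

The last call mirrors the original report: a 50/50 split against weights of 1 and 3 fails the check.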

The other test I want is for roundrobin vs leastconn.  In the previous case with a service with weight 0, we should be using leastconn.  As soon as the weight becomes non-zero we should swap to roundrobin in the generated haproxy.conf in the router.

Comment 20 Weibin Liang 2017-02-09 18:46:22 UTC
(In reply to Ben Bennett from comment #19)
> Weibin: Can you verify that the test cases handle the following cases:
>   1 route with two services with weights set to 1 and 9, one service gets
> 10% the other gets 90%.  No annotations should be set on the route.

weliang: Test passed as expected.

>  1 route with two services with weights set to 0 and 1.  The 1 route gets
> 100% of the traffic.  (No annotations should be set)

weliang: Test passed as expected.

> The other test I want is for roundrobin vs leastconn.  In the previous case
> with a service with weight 0, we should be using leastconn.  As soon as the
> weight becomes non-zero we should swap to roundrobin in the generated
> haproxy.conf in the router.

weliang: The router still uses roundrobin, not leastconn.

[root@ip-172-18-9-212 ~]# oc get route
NAME               HOST/PORT                                          PATH      SERVICES                                        PORT               TERMINATION   WILDCARD
docker-registry    docker-registry-default.0209-l4a.qe.rhcloud.com              docker-registry                                 5000-tcp           passthrough   None
registry-console   registry-console-default.0209-l4a.qe.rhcloud.com             registry-console                                registry-console   passthrough   None
unsecure-route     unsecure-route-default.0209-l4a.qe.rhcloud.com               service-unsecure(100%),service-unsecure-2(0%)   http                             None
[root@ip-172-18-9-212 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
caddy-docker               1/1       Running   0          2m
caddy-docker-2             1/1       Running   0          2m
caddy-docker-3             1/1       Running   0          2m
docker-registry-2-857b4    1/1       Running   0          2h
docker-registry-2-n6b4h    1/1       Running   2          2h
registry-console-1-rfhch   1/1       Running   0          2h
router-1-3hvt4             1/1       Running   0          2h
router-1-n4b4p             1/1       Running   0          2h
[root@ip-172-18-9-212 ~]# oc rsh router-1-3hvt4
sh-4.2$ more haproxy.config 
-----------
# Plain http backend
backend be_http_default_unsecure-route
  mode http
  option redispatch
  option forwardfor
  balance roundrobin
-----------------------
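The checks in these comments come down to reading the `balance` line of one backend in the router's haproxy.config (for example from the output of `oc rsh <router-pod> cat haproxy.config`). A minimal parser sketch for that, handling only the simple layout shown in this bug; the function name is hypothetical, and the sample text is the backend excerpts from this comment and from comment 26, pasted together for illustration.

```python
# Keywords that start a new section in an HAProxy configuration file.
SECTION_KEYWORDS = ("global", "defaults", "frontend", "backend", "listen",
                    "userlist", "peers", "resolvers")


def backend_balance(config_text, backend_name):
    """Return the `balance` algorithm declared for `backend_name`, or None."""
    in_backend = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        keyword, args = parts[0], parts[1:]
        if keyword in SECTION_KEYWORDS:
            # A new section begins; remember whether it is the backend we want.
            in_backend = keyword == "backend" and args[:1] == [backend_name]
        elif in_backend and keyword == "balance":
            return args[0] if args else None
    return None


excerpt = """
# Plain http backend
backend be_http_default_unsecure-route
  mode http
  option redispatch
  option forwardfor
  balance roundrobin

# Plain http backend
backend be_http_d1_route1
  mode http
  option redispatch
  option forwardfor
  balance leastconn
"""
print(backend_balance(excerpt, "be_http_default_unsecure-route"))  # roundrobin
print(backend_balance(excerpt, "be_http_d1_route1"))               # leastconn
```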

Comment 21 Ben Bennett 2017-02-09 18:50:36 UTC
Weibin: Thanks.  You need the follow-on PR that landed this morning in order to get the proper zero-weight behavior.

Comment 22 Weibin Liang 2017-02-09 20:30:12 UTC
(In reply to Ben Bennett from comment #21)
> Weibin: Thanks.  You need the follow-on PR that landed this morning in order
> to get the proper zero-weight behavior.

Test results are:

Case 1: With one service at weight 0, roundrobin is shown in haproxy.conf.

Case 2: Without setting a weight for the service, leastconn is shown in haproxy.conf.

Case 3: Following case 2, after setting a non-zero weight on the service, roundrobin is shown in haproxy.conf.

Tested in oc v3.5.0.18+9a5d1aa

Comment 23 Ben Bennett 2017-02-09 20:44:55 UTC
Super.  Thanks!

Comment 24 Troy Dawson 2017-02-10 22:51:33 UTC
This has been merged into ocp and is in OCP v3.5.0.19 or newer.

Comment 25 Yan Du 2017-02-13 05:34:09 UTC
The scenarios in the comments above work well with the latest OCP v3.5.0.19+199197c.

Comment 26 Yan Du 2017-03-15 03:05:18 UTC
Hi, Ben

I just tested on an OCP v3.5.0.52 environment and found that the zero-weight behaviour is a little different from before (comments 22 and 25):

1) With one service at weight 0, leastconn is shown in haproxy.conf:
$ oc set route-backends route1
NAME           KIND     TO                  WEIGHT
routes/route1  Service  service-unsecure    0 (0%)
routes/route1  Service  service-unsecure-2  1 (100%)

# Plain http backend
backend be_http_d1_route1   
  mode http
  option redispatch
  option forwardfor      
  balance leastconn

Could you please confirm which behaviour is the correct one? Thanks

Comment 28 errata-xmlrpc 2017-04-12 19:10:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884

Comment 29 Ben Bennett 2018-01-08 19:08:26 UTC
@yandu: https://github.com/openshift/origin/pull/12828 added more fixes, so the correct behavior is that we use LeastConn until more than one service on the route has a non-zero weight, at which point we use RoundRobin.
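A sketch of that final defaulting rule, combined with the precedence described in the commit messages above: an explicit route annotation first, then a router-wide default set through an environment variable, then the computed default. This illustrates the described behaviour and is not the router's template code. The router-wide default is passed in as a plain parameter because the environment variable's name is not given in this bug.

```python
from typing import Mapping, Optional

BALANCE_ANNOTATION = "haproxy.router.openshift.io/balance"


def effective_balance(annotations: Mapping[str, str],
                      service_weights: Mapping[str, int],
                      global_default: Optional[str] = None) -> str:
    """Pick the HAProxy balance algorithm for a route (illustrative sketch).

    1. An explicit route annotation wins.
    2. Otherwise a router-wide default (from an environment variable) applies.
    3. Otherwise roundrobin if more than one service has a non-zero weight,
       else leastconn (comment 29).
    """
    explicit = annotations.get(BALANCE_ANNOTATION)
    if explicit:
        return explicit
    if global_default:
        return global_default
    active = sum(1 for w in service_weights.values() if w > 0)
    return "roundrobin" if active > 1 else "leastconn"


print(effective_balance({}, {"service-unsecure": 1, "service-unsecure-2": 9}))
print(effective_balance({}, {"service-unsecure": 0, "service-unsecure-2": 1}))
```

With weights 0 and 1 only one service is active, so the sketch yields leastconn, matching what comment 26 observed on v3.5.0.52.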