Bug 1584701 - HAProxy kills existing connections for A/B testing instead of redirecting only new traffic to the new instances
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-31 13:13 UTC by François Cami
Modified: 2018-10-11 07:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 07:20:02 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Origin (Github) 19893 None None None 2018-05-31 14:24:33 UTC
Red Hat Product Errata RHBA-2018:2652 None None None 2018-10-11 07:20:30 UTC

Description François Cami 2018-05-31 13:13:16 UTC
Description of problem:

When an OpenShift route has several services (for A/B
testing) and "oc set route-backends" sets a zero weight on a
service, that service's endpoints are removed from the haproxy backend.
This drops all existing connections to those endpoints and loses the
associated end-user sessions.

This can be traced to the following code in origin/images/router/haproxy/conf/haproxy-config.template:

  {{- range $serviceUnitName, $weight := $cfg.ServiceUnitNames }}
    {{- if ne $weight 0 }}
      {{- with $serviceUnit := index $.ServiceUnits $serviceUnitName }}

Keeping zero-weight endpoints in the backend, so that existing connections survive, can be implemented by changing:
      {{ if ne $weight 0 }}
to
      {{ if ge $weight 0 }}
in the haproxy template.

Older services are then removed with the "oc set route-backends" command.

Version-Release number of selected component (if applicable):
The code above is present in https://github.com/openshift/origin/ as of 88e726d2fb56c8943cd2a4d08e66e9c0497477f1.

Comment 3 François Cami 2018-05-31 13:56:56 UTC
PR: https://github.com/openshift/origin/pull/19893

Comment 4 Ben Bennett 2018-05-31 14:10:36 UTC
This is reasonable for the ones where we can see the HTTP headers and dispatch the existing connections to the endpoints with weight 0 using cookies.

We need to think more about the pass-through case.  It probably should drop the connections with weight 0 since those will only be able to be dispatched by round-robin (where the weight will be skipped) or src-IP (where the weight will be ignored).  So if r-r there is no point to add them, and if src-ip it does the wrong thing.
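
The cookie-based case described above can be sketched roughly as follows. This is only an illustration of the dispatch idea, not HAProxy's implementation: the endpoint type, the pick function, and the counter-based weighted choice are assumptions made for the example.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified view of one backend server.
type endpoint struct {
	Name   string
	Weight int
}

// pick chooses an endpoint for one HTTP request.
// cookie is the sticky-session value ("" if the client sent none);
// seq is a non-negative request counter used for a deterministic
// weighted choice among the endpoints that accept new sessions.
func pick(backend []endpoint, cookie string, seq int) (string, bool) {
	// Sticky path: an endpoint that is still listed in the backend
	// keeps its existing sessions, even when its weight is 0.
	if cookie != "" {
		for _, e := range backend {
			if e.Name == cookie {
				return e.Name, true
			}
		}
	}
	// New-session path: only endpoints with a positive weight take part.
	total := 0
	for _, e := range backend {
		if e.Weight > 0 {
			total += e.Weight
		}
	}
	if total == 0 {
		return "", false
	}
	slot := seq % total
	for _, e := range backend {
		if e.Weight <= 0 {
			continue
		}
		if slot < e.Weight {
			return e.Name, true
		}
		slot -= e.Weight
	}
	return "", false
}

func main() {
	kept := []endpoint{{"old", 0}, {"new", 100}} // weight-0 endpoint stays listed
	dropped := []endpoint{{"new", 100}}          // weight-0 endpoint removed

	fmt.Println(pick(kept, "old", 0))    // existing session stays on "old"
	fmt.Println(pick(kept, "", 7))       // new session goes to "new"
	fmt.Println(pick(dropped, "old", 0)) // session is moved to "new"
}
```

In the pass-through case there is no cookie to inspect, so only the second half of the function would apply.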

Comment 5 Ben Bennett 2018-05-31 14:23:05 UTC
Here's more info on the weights for the servers:
 https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-balance
 https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-weight
 https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#hash-type

  source      The source IP address is hashed and divided by the total
              weight of the running servers to designate which server will
              receive the request. This ensures that the same client IP
              address will always reach the same server as long as no
              server goes down or up. If the hash result changes due to the
              number of running servers changing, many clients will be
              directed to a different server. This algorithm is generally
              used in TCP mode where no cookie may be inserted. It may also
              be used on the Internet to provide a best-effort stickiness
              to clients which refuse session cookies. This algorithm is
              static by default, which means that changing a server's
              weight on the fly will have no effect, but this can be
              changed using "hash-type".

We use the consistent hash type for tcp connections... and that supports weights.

BUT I don't see what you would gain because new tcp connections should not use the ones with weight 0, and ongoing connections are left until they close.

Comment 6 François Cami 2018-07-26 07:16:13 UTC
PR https://github.com/openshift/origin/pull/19893 is merged.

Comment 7 François Cami 2018-07-26 07:16:48 UTC
@zhaozhanqi do you need any information from me on how to test this?

Comment 8 zhaozhanqi 2018-07-26 08:45:27 UTC
@François Cami

Thank you for asking me this.

I plan to use the following steps:

1. Use 'ab' to send concurrent requests.
2. While it runs, set a zero weight on a service (the action that used to remove its endpoints from the haproxy backend).
3. Check that the remaining requests still succeed.

please correct me if it's NOT enough or if you have a better way.
thank you in advance.

Comment 9 François Cami 2018-07-26 13:51:38 UTC
@zhaozhanqi this looks good - make sure that at step 3 existing connections still go to endpoints with 0 weight.

Comment 10 zhaozhanqi 2018-07-27 05:54:17 UTC
Tested this bug on v3.11.0-0.9.0 with haproxy image(e9a8a335a0970)

This issue has been fixed. Please change the status to 'ON_QA' and I will verify this bug.

Comment 11 Franck Grosjean 2018-07-27 06:03:35 UTC
@zhaozhanqi done on behalf of @fcami

Comment 13 zhaozhanqi 2018-07-27 08:44:47 UTC
Verified this bug according to comment 10 with steps in comment 8.

Comment 14 Michael Burke 2018-09-13 14:46:18 UTC
Ben, Francois --

It seems like this change should go into the 3.11+ docs. I did a quick search and did not see a docs BZ; perhaps I missed it.

"Setting `oc set route-backend` to 0 means the server will not participate in load-balancing but will still accept persistent connections."

Michael

Comment 15 François Cami 2018-09-13 14:51:42 UTC
Michael - my fault, I should have done it as well.
I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1628615

Comment 16 Michael Burke 2018-09-13 15:01:14 UTC
Francois Thank you!

Comment 18 errata-xmlrpc 2018-10-11 07:20:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

