
Bug 1584701

Summary: HAProxy kills existing connections for A/B testing instead of redirecting only new traffic to the new instances
Product: OpenShift Container Platform Reporter: François Cami <fcami>
Component: Networking Assignee: Ben Bennett <bbennett>
Networking sub component: router QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aos-bugs, bbennett, bperkins, dmace, fcami, fgrosjea, mburke, pasik, zzhao
Version: 3.9.0   
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-11 07:20:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description François Cami 2018-05-31 13:13:16 UTC
Description of problem:

When an OpenShift route has several services (for A/B testing),
using "oc set route-backends" to set a service's weight to zero
removes that service's endpoints from the haproxy backend.
This drops all existing connections to those endpoints and loses
the associated end-user sessions.

This can be traced to the following code in origin/images/router/haproxy/conf/haproxy-config.template:

  {{- range $serviceUnitName, $weight := $cfg.ServiceUnitNames }}
    {{- if ne $weight 0 }}
      {{- with $serviceUnit := index $.ServiceUnits $serviceUnitName }}

Keeping zero-weight endpoints in the backend, so that existing connections survive, can be implemented by changing:
      {{ if ne $weight 0 }}
to
      {{ if ge $weight 0 }}
in the haproxy template.

Removing older services is then done with the "oc set route-backends" command.
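To illustrate the effect of the proposed change, here is a minimal Go sketch that renders a heavily simplified stand-in for the server loop with each comparison. The template text, the renderServers helper, the service names and the output format are assumptions made for illustration; this is not the real haproxy-config.template and the output is not valid haproxy configuration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// backendTemplate is a simplified, hypothetical stand-in for the server loop
// in haproxy-config.template. %s is replaced by the comparison ("ne" or "ge").
const backendTemplate = `{{range $name, $weight := .}}{{if %s $weight 0}}{{$name}} weight={{$weight}}
{{end}}{{end}}`

// renderServers renders one line per service whose weight passes the
// comparison op against 0. text/template visits map keys in sorted order.
func renderServers(op string, weights map[string]int32) (string, error) {
	tmpl, err := template.New("backend").Parse(fmt.Sprintf(backendTemplate, op))
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, weights); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Hypothetical A/B route: svc-b has just been set to weight 0.
	weights := map[string]int32{"svc-a": 100, "svc-b": 0}
	for _, op := range []string{"ne", "ge"} {
		out, err := renderServers(op, weights)
		if err != nil {
			panic(err)
		}
		fmt.Printf("--- if %s $weight 0 ---\n%s", op, out)
	}
}
```

With "ne", only svc-a is rendered, so svc-b's endpoints vanish from the backend. With "ge", both are rendered and svc-b stays in the backend with weight 0.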

Version-Release number of selected component (if applicable):
The code above is present in https://github.com/openshift/origin/ as of 88e726d2fb56c8943cd2a4d08e66e9c0497477f1.

Comment 3 François Cami 2018-05-31 13:56:56 UTC
PR: https://github.com/openshift/origin/pull/19893

Comment 4 Ben Bennett 2018-05-31 14:10:36 UTC
This is reasonable for the ones where we can see the HTTP headers and dispatch the existing connections to the endpoints with weight 0 using cookies.

We need to think more about the pass-through case.  It probably should drop the connections with weight 0 since those will only be able to be dispatched by round-robin (where the weight will be skipped) or src-IP (where the weight will be ignored).  So if r-r there is no point to add them, and if src-ip it does the wrong thing.

Comment 5 Ben Bennett 2018-05-31 14:23:05 UTC
Here's more info on the weights for the servers:
 https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-balance
 https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-weight
 https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#hash-type

  source      The source IP address is hashed and divided by the total
              weight of the running servers to designate which server will
              receive the request. This ensures that the same client IP
              address will always reach the same server as long as no
              server goes down or up. If the hash result changes due to the
              number of running servers changing, many clients will be
              directed to a different server. This algorithm is generally
              used in TCP mode where no cookie may be inserted. It may also
              be used on the Internet to provide a best-effort stickiness
              to clients which refuse session cookies. This algorithm is
              static by default, which means that changing a server's
              weight on the fly will have no effect, but this can be
              changed using "hash-type".

We use the consistent hash type for tcp connections... and that supports weights.

BUT I don't see what you would gain because new tcp connections should not use the ones with weight 0, and ongoing connections are left until they close.
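As a rough illustration of why new connections never land on a weight-0 server, the following Go sketch implements the simple scheme described in the quoted documentation: the hash is reduced modulo the total weight and mapped onto cumulative weights. This is an assumption-laden toy, not HAProxy's implementation and not its consistent hash type; the server type, the pick function and the use of FNV-1a are all hypothetical choices.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// server is a hypothetical backend entry with a load-balancing weight.
type server struct {
	name   string
	weight uint32
}

// pick hashes the client address, reduces it modulo the total weight and
// walks the cumulative weights. A server with weight 0 owns no slot, so it
// can never be chosen. It returns "" when no server has a positive weight.
func pick(clientIP string, servers []server) string {
	var total uint32
	for _, s := range servers {
		total += s.weight
	}
	if total == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	slot := h.Sum32() % total
	for _, s := range servers {
		if slot < s.weight {
			return s.name
		}
		slot -= s.weight
	}
	return ""
}

func main() {
	servers := []server{{"old", 0}, {"new", 100}}
	for _, ip := range []string{"10.0.0.1", "10.0.0.2", "192.168.1.7"} {
		fmt.Printf("%s -> %s\n", ip, pick(ip, servers))
	}
}
```

In this model every new client is mapped to "new". Connections already established to "old" are not affected by the selection step at all, which matches the observation that ongoing connections are left until they close.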

Comment 6 François Cami 2018-07-26 07:16:13 UTC
PR https://github.com/openshift/origin/pull/19893 is merged.

Comment 7 François Cami 2018-07-26 07:16:48 UTC
@zhaozhanqi do you need any information from me on how to test this?

Comment 8 zhaozhanqi 2018-07-26 08:45:27 UTC
@François Cami

Thank you for asking me this.

I plan to use the following steps:

1. Use 'ab' to send concurrent requests.
2. While they are running, set a zero weight on a service (the change that used to remove its endpoints from the haproxy backend).
3. Check that the remaining requests still complete successfully.

please correct me if it's NOT enough or if you have a better way.
thank you in advance.

Comment 9 François Cami 2018-07-26 13:51:38 UTC
@zhaozhanqi this looks good - make sure that at step 3 existing connections still go to endpoints with 0 weight.

Comment 10 zhaozhanqi 2018-07-27 05:54:17 UTC
Tested this bug on v3.11.0-0.9.0 with haproxy image(e9a8a335a0970)

This issue has been fixed. Please change the status to 'ON_QA' and I will verify this bug.

Comment 11 Franck Grosjean 2018-07-27 06:03:35 UTC
@zhaozhanqi done on behalf of @fcami

Comment 13 zhaozhanqi 2018-07-27 08:44:47 UTC
Verified this bug according to comment 10 with steps in comment 8.

Comment 14 Michael Burke 2018-09-13 14:46:18 UTC
Ben, Francois --

Should this change go into the 3.11+ docs? I did a quick search and did not see a docs BZ, but perhaps I missed it.

"Setting `oc set route-backend` to 0 means the server will not participate in load-balancing but will still accept persistent connections."

Michael

Comment 15 François Cami 2018-09-13 14:51:42 UTC
Michael - my fault, I should have done it as well.
I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1628615

Comment 16 Michael Burke 2018-09-13 15:01:14 UTC
Francois, thank you!

Comment 18 errata-xmlrpc 2018-10-11 07:20:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652