Description of problem:
Using an OpenShift route with several services (for A/B testing) and "oc set route-backends", setting a zero weight on a service removes its endpoints from the haproxy backend. This results in a loss of all existing connections, and a loss of the associated end-user sessions.

This can be traced to the following code in origin/images/router/haproxy/conf/haproxy-config.template (lines 379-381):

    {{- range $serviceUnitName, $weight := $cfg.ServiceUnitNames }}
      {{- if ne $weight 0 }}
        {{- with $serviceUnit := index $.ServiceUnits $serviceUnitName }}

Keeping the endpoints of zero-weight services in the backend can be implemented by changing {{ if ne $weight 0 }} to {{ if ge $weight 0 }} in the haproxy template. Removing older services is then done with the "oc set route-backends" command.

Version-Release number of selected component (if applicable):
The code above is present in https://github.com/openshift/origin/ as of 88e726d2fb56c8943cd2a4d08e66e9c0497477f1.
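To illustrate the effect of the proposed comparison change, here is a minimal Go sketch that runs both conditions through text/template. It is not the router's template: the function name renderServers, the service names, and the "name=weight;" output format are all made up for the example. Only the "ne $weight 0" versus "ge $weight 0" conditions come from the report.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderServers lists the service units that pass "<cmp> $weight 0",
// where cmp is a template comparison function such as "ne" or "ge".
// Hypothetical helper for illustration only; not part of the router.
func renderServers(cmp string, weights map[string]int32) (string, error) {
	src := `{{range $name, $weight := .}}{{if ` + cmp + ` $weight 0}}{{$name}}={{$weight}};{{end}}{{end}}`
	tmpl, err := template.New("backend").Parse(src)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, weights); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Assumed example weights: svc-b has been set to weight 0.
	weights := map[string]int32{"svc-a": 100, "svc-b": 0}
	for _, cmp := range []string{"ne", "ge"} {
		out, err := renderServers(cmp, weights)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", cmp, out)
	}
}
```

With "ne", the zero-weight service is skipped entirely; with "ge", it is still rendered, carrying weight 0.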
PR: https://github.com/openshift/origin/pull/19893
This is reasonable for the routes where we can see the HTTP headers and use cookies to dispatch the existing connections to the endpoints with weight 0. We need to think more about the passthrough case. It should probably drop the endpoints with weight 0, since connections there can only be dispatched by round-robin (where the weight-0 endpoints will be skipped) or by source IP (where the weight will be ignored). So with round-robin there is no point in adding them, and with source IP it does the wrong thing.
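As a rough model of the cookie case above, the following Go sketch shows a dispatcher that honors a persistence cookie even for a weight-0 endpoint, while new sessions only go to endpoints with a positive weight. This is a simplified illustration, not haproxy's implementation: the endpoint type, the dispatch function, and the naive weighted round-robin are assumptions made for the example.

```go
package main

import "fmt"

// endpoint is a hypothetical stand-in for a backend server entry.
type endpoint struct {
	name   string
	weight int
}

// dispatch picks an endpoint for a request. cookie is the endpoint name
// carried by a persistence cookie, or "" for a new session. next is a
// non-negative counter driving a naive weighted round-robin.
func dispatch(cookie string, endpoints []endpoint, next int) (string, bool) {
	// Existing session: honor the cookie even if the weight is 0.
	if cookie != "" {
		for _, e := range endpoints {
			if e.name == cookie {
				return e.name, true
			}
		}
	}
	// New session: only endpoints with weight > 0 take part.
	var ring []string
	for _, e := range endpoints {
		for i := 0; i < e.weight; i++ {
			ring = append(ring, e.name)
		}
	}
	if len(ring) == 0 {
		return "", false
	}
	return ring[next%len(ring)], true
}

func main() {
	endpoints := []endpoint{{"new-svc", 1}, {"old-svc", 0}}
	name, _ := dispatch("old-svc", endpoints, 0)
	fmt.Println("existing session ->", name)
	name, _ = dispatch("", endpoints, 0)
	fmt.Println("new session      ->", name)
}
```

Without a cookie (as with passthrough routes, where the headers are not visible), the weight-0 endpoint is never selected in this model, which matches the point that there is little to gain from keeping it for round-robin.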
Here's more info on the weights for the servers:
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-balance
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-weight
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#hash-type

From the "balance" documentation:

  source
    The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request. This ensures that the same client IP address will always reach the same server as long as no server goes down or up. If the hash result changes due to the number of running servers changing, many clients will be directed to a different server. This algorithm is generally used in TCP mode where no cookie may be inserted. It may also be used on the Internet to provide a best-effort stickiness to clients which refuse session cookies. This algorithm is static by default, which means that changing a server's weight on the fly will have no effect, but this can be changed using "hash-type".

We use the consistent hash type for TCP connections, and that supports weights. But I don't see what you would gain, because new TCP connections should not use the ones with weight 0, and ongoing connections are left until they close.
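The "hashed and divided by the total weight" description quoted above can be sketched in Go as follows. This is only a toy model of the static, map-based behavior: haproxy's real hash function and server map differ, and the router's consistent hash type is not modeled here. The server type, pickServer, and the use of FNV-1a are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// server is a hypothetical backend entry with a static weight.
type server struct {
	name   string
	weight uint32
}

// pickServer hashes the client IP, reduces it modulo the total weight,
// and walks the servers to find the one owning that slot. A server with
// weight 0 owns no slots, so it is never picked for a new connection.
func pickServer(clientIP string, servers []server) (string, bool) {
	var total uint32
	for _, s := range servers {
		total += s.weight
	}
	if total == 0 {
		return "", false
	}
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	slot := h.Sum32() % total
	for _, s := range servers {
		if slot < s.weight {
			return s.name, true
		}
		slot -= s.weight
	}
	return "", false // not reached: slot is always below total
}

func main() {
	servers := []server{{"svc-a", 3}, {"svc-b", 0}, {"svc-c", 1}}
	for _, ip := range []string{"10.0.0.1", "10.0.0.2", "192.168.1.7"} {
		name, _ := pickServer(ip, servers)
		fmt.Printf("%s -> %s\n", ip, name)
	}
}
```

In this model the same client IP always maps to the same server while the weights stay unchanged, and the weight-0 server receives no new connections, in line with the remark that nothing is gained by keeping it for new TCP connections.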
PR https://github.com/openshift/origin/pull/19893 is merged.
@zhaozhanqi do you need any information from me on how to test this?
@François Cami Thank you for asking. I plan to use the following steps:
1. Use 'ab' to send concurrent requests.
2. During that, set a zero weight on a service, which, per the description, removes the endpoints from the haproxy backend.
3. Check that the remaining requests still work well.
Please correct me if this is not enough or if you have a better way. Thank you in advance.
@zhaozhanqi this looks good - make sure that at step 3 existing connections still go to endpoints with 0 weight.
Tested this bug on v3.11.0-0.9.0 with haproxy image (e9a8a335a0970); this issue has been fixed. Please help change the status to 'ON_QA', and I will verify this bug.
@zhaozhanqi done on behalf of @fcami
Verified this bug according to comment 10 with steps in comment 8.
Ben, Francois -- It seems like this change should go into the 3.11+ docs? I did a quick search and did not see a docs BZ; perhaps I missed it. "Setting `oc set route-backend` to 0 means the server will not participate in load-balancing but will still accept persistent connections." Michael
Michael - my fault, I should have done it as well. I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1628615
Francois Thank you!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652