1. Proposed title of this feature request

OpenShift router drainstop before terminating pods

3. What is the nature and description of the request?

How can we get drainstop-style functionality when we reduce the number of pods? We want to tell HAProxy that we are reducing the number of pods deployed, and have HAProxy hold on to the existing connections to the pod being shut down until they are all terminated or timed out. We found that when we scaled pods up or down, HAProxy would terminate the connections (which caused 503 errors on the client) and shift to new pods. We would like to know if there is a way to "drain" the traffic prior to killing the pods, to keep this interruption from happening.

4. Why does the customer need this? (List the business requirements here)

This would allow for a controlled handoff on the client side and avoid the disconnects/503 errors that appear to be happening during this process.

5. How would the customer like to achieve this? (List the functional requirements here)

1) Indicate to OpenShift that we want to scale down.
2) HAProxy stops scheduling new requests to the pods that will be scaled down.
3) The pod/application has the ability to close out any open, in-flight connections.
4) The pod signals an "all clear".
5) OpenShift kills off the pods.

Please see this link for what we are looking for (at a high level): http://cbonte.github.io/haproxy-dconv/1.6/management.html#9.2-set%20server

HAProxy appears to support a "drain" state that would allow OpenShift to signal that a pod will be scaled down. The pod would be removed from HAProxy load balancing, and then the app could be signaled that it is terminating. It could finish any existing connections and close the HTTP connection, triggering the remote client to hit the HAProxy load balancer again and get a new connection to a pod that is not terminating. This is similar to the A/B feature, but getting the exact behavior with A/B would require scripting a slow progression from one version to the other.
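For reference, here is a minimal sketch of what the requested behavior looks like at the HAProxy level, assuming the router's HAProxy exposes an admin-level stats socket; the socket path and the backend/server names below are placeholders for illustration, not values taken from the actual router config:

// drainsketch.go - sketch of issuing "set server <backend>/<server> state drain"
// over HAProxy's management socket, so existing connections can finish while no
// new ones are scheduled to that server.
package main

import (
	"bufio"
	"fmt"
	"net"
	"time"
)

// drainServer asks HAProxy to stop sending new connections to one backend
// server while letting its existing connections run to completion.
func drainServer(socketPath, backend, server string) error {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return err
	}
	defer conn.Close()

	// One command per connection; HAProxy answers with text when something is wrong.
	if _, err := fmt.Fprintf(conn, "set server %s/%s state drain\n", backend, server); err != nil {
		return err
	}
	reply, _ := bufio.NewReader(conn).ReadString('\n')
	if reply != "" && reply != "\n" {
		return fmt.Errorf("haproxy replied: %q", reply)
	}
	return nil
}

func main() {
	// Hypothetical socket path and names, for illustration only.
	err := drainServer("/var/lib/haproxy/run/haproxy.sock",
		"be_http:myproject:myroute", "pod-being-removed")
	if err != nil {
		fmt.Println("drain failed:", err)
	}
}

The missing piece is the orchestration: OpenShift would need to issue something like this before the pod is actually terminated, and then wait for the pod to report that its connections have ended.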
This is pretty much what we do today. When a route change happens, we write the current state of the pods to a config file and tell haproxy to gracefully reload. That starts a new process, which reads the config and gets ready to serve. When it's ready, it binds to the port and tells the old process to stop listening for new connections. At that point any new connection will hit the new process. However, any existing connection will continue to be served by the old haproxy process. The old process will stay around until all of the connections it is handling are closed (by either end, but not by the proxy), or until one of the haproxy timeouts is hit (in which case the proxy will terminate that connection). When all connections are done, the old process exits.

You can tune the timeouts for the router under "Configuration Parameters": https://docs.openshift.com/container-platform/3.3/architecture/core_concepts/routes.html#haproxy-template-router

Based on all of this, if you are scaling up you should never see a connection be dropped. If you are, we need a reproducer for that. When you scale down, connections to the pod that has gone away will obviously fail, but the others should remain. You can set the grace period if you need the connections to stay around after the pod has been taken out of the working set.
The customer is mostly concerned about scaling pods down. They are hoping to be able to scale pods down without any loss of service: if you have 10 pods and scale down to 9, the pod being removed should not drop its connections. They are aware of using grace periods and timeouts, but were hoping for a feature that simply waits for the connections to close "naturally". Not sure how feasible it is to implement this in haproxy.
Ah, I see. The pod does get signalled to terminate, so you want it to pay attention to that signal and exit once all of its children stop handling connections, so that it can go away sooner than the grace period if all connections end.
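As a sketch of the application side of that (assuming a Go HTTP service; none of this is router behavior), the pod can trap SIGTERM, stop accepting new connections, and exit as soon as its in-flight requests finish rather than waiting out the full grace period:

// graceful.go - sketch: exit only after in-flight HTTP requests complete once
// the platform sends SIGTERM at scale-down.
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello\n"))
	})
	srv := &http.Server{Addr: ":8080", Handler: mux}

	// Serve in the background so the main goroutine can watch for signals.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// SIGTERM is what the pod receives when it is being scaled down.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)
	<-stop

	// Stop accepting new connections and wait for in-flight requests to drain.
	// Keep the deadline below the pod's termination grace period (30s by
	// default) so the kubelet does not SIGKILL the process mid-request.
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
	log.Println("connections drained, exiting")
}

If all connections end early, the process exits immediately, which is the "sooner than the grace period" behavior described above.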
This bug has been identified as a dated bug (created more than 3 months ago). It has been triaged (has a Trello card linked to it) or reviewed by Engineering/PM and placed in the product backlog, but it has not been slated for a currently planned release (3.9, 3.10 or 3.11), which cover our releases for the rest of the calendar year. As a result of this bug's age, its state on the current roadmap, and its PM Score (below 70), this bug is being Closed - Deferred, as it is currently not part of the product's immediate priorities. Please see: https://docs.google.com/document/d/1zdqF4rB3ea8GmVIZ7qWCVYUaQ7-EexUrQEF0MTwdDkw/edit for more details.