| Summary: | [RFE] OpenShift router drainstop before terminating pods | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Steven Walter <stwalter> |
| Component: | RFE | Assignee: | Ben Bennett <bbennett> |
| Status: | CLOSED DEFERRED | QA Contact: | Meng Bo <bmeng> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.2.0 | CC: | aos-bugs, bbennett, bmeng, eparis, jokerman, mbarrett, mmccomas, mrobson, rbost, stwalter |
| Target Milestone: | --- | Keywords: | RFE |
| Target Release: | --- | Flags: | erich: needinfo? (stwalter) |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-12 13:54:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Steven Walter
2016-10-19 14:10:40 UTC
This is pretty much what we do today. When a route change happens, we write the current state of the pods to a config file and tell haproxy to gracefully reload. That starts a new process, which reads the config and gets ready to serve. Once it is ready, it binds to the port and tells the old process to stop listening for new connections. From that point any new connection hits the new router, but existing connections continue to be served by the old haproxy process. The old process stays around until all of the connections it is handling are closed (by either end, but not by the proxy), or until one of the haproxy timeouts is exceeded (in which case the proxy terminates that connection). When all of its connections are done, the old process exits.

You can tune the timeouts for the router in "Configuration Parameters": https://docs.openshift.com/container-platform/3.3/architecture/core_concepts/routes.html#haproxy-template-router

Based on all of this, if you are scaling up you should never see a connection dropped. If you are, we need a reproducer for that. When you scale down, connections to the pod that has gone will obviously fail, but the others should remain. You can set the grace period if you need connections to stay around after the pod has been taken out of the working set.

The customer is mostly concerned about scaling pods down. They are hoping to scale down without any loss of service: if you have 10 pods and scale down to 9, the last pod should not drop connections. They are aware of grace periods and timeouts, but were hoping for a feature that simply waits for the connections to close "naturally". Not sure how feasible it is to implement this in haproxy.

Ah, I see. The pod does get signalled to terminate, so you want it to pay attention to that signal and exit once all children stop handling connections, so it can terminate sooner than the grace period if all connections end. (A sketch of that pattern, as seen from inside the pod, is included after this thread.)

This bug has been identified as a dated bug (created more than 3 months ago). It has been triaged (has a Trello card linked to it), or reviewed by Engineering/PM, and has been put into the product backlog; however, it has not been slated for a currently planned release (3.9, 3.10 or 3.11), which cover our releases for the rest of the calendar year. As a result of this bug's age, its state on the current roadmap, and its PM Score (being below 70), this bug is being Closed - Deferred, as it is currently not part of the product's immediate priorities. Please see: https://docs.google.com/document/d/1zdqF4rB3ea8GmVIZ7qWCVYUaQ7-EexUrQEF0MTwdDkw/edit for more details.
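The report itself contains no code; the following Go sketch is only an illustration of the pattern described in the last engineering comment, assuming the application inside the pod is an HTTP server built on Go's standard library. It shows a pod honoring the termination signal Kubernetes sends at scale-down and draining in-flight connections, so the process can exit as soon as the connections close "naturally" rather than waiting out the full grace period. The port and drain deadline are placeholders, not values from this bug.

```go
// graceful.go - minimal sketch (not part of the OpenShift router) of an
// application pod that honors SIGTERM and drains in-flight connections
// before exiting. Uses only the Go standard library.
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello\n"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}

	// Serve in the background so main can block waiting for the signal.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()

	// Kubernetes sends SIGTERM when the pod is scaled down or deleted.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Stop accepting new connections and wait for active requests to finish.
	// The deadline is only a safety net: if every connection closes sooner,
	// Shutdown returns early and the pod exits before the grace period expires.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
```

The outer bound on this drain is still the pod's terminationGracePeriodSeconds in the pod spec; the sketch simply lets the process exit earlier than that once its connections have ended, which is the behavior the customer was asking for.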