Bug 1320233
| Summary: | haproxy is reloaded every 10 minutes N-times for N endpoints | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Marek Schmidt <maschmid> |
| Component: | Networking | Assignee: | Maru Newby <mnewby> |
| Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | urgent | CC: | adellape, ahoness, aos-bugs, bbennett, bmeng, bperkins, bvincell, eparis, hklein, jeder, jpechane, mifiedle, misalunk, pep, sdodson, stwalter, vlaad |
| Version: | 3.1.0 | ||
| Target Milestone: | --- | ||
| Target Release: | 3.2.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | When the default HAProxy router reloaded its configuration too often during a resync (default interval: 10 minutes), it was possible to experience dropped connections to routes. This bug fix updates the ose-haproxy-router image to limit reloads to at most one per sync event to minimize the potential for dropped connections. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-06-27 15:05:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1267746, 1339502 | ||
|
Description
Marek Schmidt
2016-03-22 15:32:30 UTC
This is the default resync interval (10 minutes), at which the router's in-memory etcd cache is flushed and re-read from the store. As the router gets new endpoints, it reapplies those changes; the current mechanism forces a write and reload for every change it detects. This is the same as the GitHub issue: https://github.com/openshift/origin/issues/7409

Once we write the config in one shot when a new resource version is available, we would do this once every 10 minutes.

There are a couple of workarounds here:

1. You can increase the resync interval by passing an option to the infra router: --resync-interval=10m # 10m == 10 minutes.

    $ oc edit dc/router -o json
    $ # and in the editor add the command line args/entrypoint for the router container:
    "command": ["/usr/bin/openshift-router", "--resync-interval=10m" ],

   Example:

    "spec": {
      "containers": [
        {
          "name": "router",
          "image": "openshift/origin-haproxy-router:latest",
          "command": ["/usr/bin/openshift-router", "--resync-interval=10m" ],
          ...

2. You can additionally control how often the router reloads occur by specifying an environment variable:

    oc env dc router RELOAD_INTERVAL=10s

   That coalesces multiple reloads within 10 seconds of each other.

You can use one or a combination of the options to alleviate this issue. The second option for reload interval is only available in the latest releases.

Thanks for the diagnosis Ram. I'm reassigning to Maru to align it with the issue.

You can have the router install an iptables rule to somewhat mitigate this; the instructions are at: https://github.com/openshift/openshift-docs/pull/1987

*** Bug 1333522 has been marked as a duplicate of this bug. ***

Checked with latest haproxy router image v3.2.1.1. Issue has been fixed. After adding the timestamp to the reload-haproxy script, the haproxy router no longer reloads periodically, and the connection to the route is not interrupted during the haproxy-router reload caused by scaling it up.
Here are some ab results:

    Concurrency Level:      10
    Time taken for tests:   1711.198 seconds
    Complete requests:      109071
    Failed requests:        0
    Total transferred:      39156489 bytes
    HTML transferred:       3053988 bytes
    Requests per second:    63.74 [#/sec] (mean)
    Time per request:       156.888 [ms] (mean)
    Time per request:       15.689 [ms] (mean, across all concurrent requests)
    Transfer rate:          22.35 [Kbytes/sec] received

Move the bug to verified.

*** Bug 1329399 has been marked as a duplicate of this bug. ***

*** Bug 1336009 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1343
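
For readers who want to see why coalescing helps, the RELOAD_INTERVAL workaround mentioned earlier in this thread can be illustrated with a minimal sketch. This is not the router's actual code: the class `ReloadCoalescer`, the `simulate` helper, and the fake clock are hypothetical names introduced only for illustration, and the real router may schedule reloads differently. The sketch assumes that each detected endpoint change requests a reload, and that at most one reload is allowed per interval.

```python
import time
from typing import Callable, List, Optional


class ReloadCoalescer:
    """Hypothetical helper: collapse bursts of reload requests into
    at most one reload per `interval` seconds."""

    def __init__(self, interval: float, reload_fn: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.reload_fn = reload_fn
        self.clock = clock
        self.last_reload: Optional[float] = None
        self.pending = False

    def request_reload(self) -> None:
        """Record that the config changed, then reload if allowed."""
        self.pending = True
        self.flush()

    def flush(self) -> None:
        """Run the pending reload unless the last one was too recent.
        Meant to be called again later (for example from a timer)."""
        if not self.pending:
            return
        now = self.clock()
        if self.last_reload is not None and now - self.last_reload < self.interval:
            return  # too soon: leave the request pending
        self.reload_fn()
        self.last_reload = now
        self.pending = False


def simulate(n_changes: int, interval: float) -> int:
    """Feed a burst of changes through the coalescer using a fake clock
    and return how many reloads actually ran."""
    now = [0.0]
    reloads: List[float] = []
    coalescer = ReloadCoalescer(interval,
                                lambda: reloads.append(now[0]),
                                clock=lambda: now[0])
    for _ in range(n_changes):
        coalescer.request_reload()
        now[0] += 0.01  # changes arrive 10 ms apart
    now[0] += interval  # let the interval elapse, then flush once
    coalescer.flush()
    return len(reloads)


if __name__ == "__main__":
    print(simulate(50, 10.0))
```

In this simulation a burst of 50 changes with a 10-second interval results in two reloads (one immediately, one after the interval elapses) instead of 50, which is the effect the workaround aims for.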