Bug 1839989
| Summary: | Optimize the logic of calling router reload | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | sfu <sfu> |
| Component: | Networking | Assignee: | aos-network-edge-staff <aos-network-edge-staff> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aos-bugs, bbennett, cholman, jinjli, mmasters, mnunes |
| Version: | 3.11.0 | Keywords: | Reopened |
| Target Milestone: | --- | ||
| Target Release: | 3.11.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-28 16:13:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
sfu@redhat.com
2020-05-26 07:40:01 UTC
(In reply to sfu from comment #0)
> Description of problem:
>
> Because a large number of websocket applications are deployed in the
> cluster, long-lived connections are maintained. After the router reloads
> several times, there are too many haproxy processes in the router pod,
> which exhausts the memory of the infra node.
>
> We hope engineering can optimize the router reload mechanism and minimize
> the number of haproxy reloads. For example, before reloading, compare
> whether the haproxy configuration file has changed. If the configuration
> file has not changed and the last reload was successful, do not reload
> unnecessarily. In our tests, events such as creating new services or
> scaling up applications in the cluster trigger a haproxy reload on all
> router shards in the cluster, whether or not those services are associated
> with a route or selected by router sharding. In fact, the haproxy
> configuration files of many shards have not changed at that point.
>
> How reproducible:
> always
>
> Actual results:
> Unnecessary and frequent reloads leave a large number of long-lived
> connections open, which consume a large amount of memory.
>
> Expected results:
> Optimize the logic of calling router reload to reduce unnecessary reloads.

I did a very brief experiment with a service:

- I added a new annotation and didn't see a reload for that change.
- I changed the targetPort of an existing service and noticed a change (expected).
- I deleted my new annotation and didn't see a reload for that change.
- I reverted the targetPort change and saw a reload (expected).

It seems there is already some checking (at least for services) to see if the router really should reload.

I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.
Hi Andrew, many other behaviors will cause a router reload. For example, creating a service that is not related to any route should not trigger a router reload, because it won't affect the haproxy config file. Comparing whether the haproxy configuration file has changed before reloading the router could reduce the number of unnecessary reloads.

I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

(In reply to Andrew McDermott from comment #6)
> I’m adding UpcomingSprint, because I was occupied by fixing bugs with
> higher priority/severity, developing new features with higher
> priority, or developing new features to improve stability at a macro
> level. I will revisit this bug next sprint.

As this really is an enhancement to the current design, it is now captured in the following RFE: https://issues.redhat.com/browse/NE-391

*** Bug 1891810 has been marked as a duplicate of this bug. ***

(In reply to sfu from comment #5)
> Many other behaviors will cause a router reload.
> For example, creating a service that is not related to any route should not
> trigger a router reload, because it won't affect the haproxy config file.

Can you provide a reproducer for this problem? Comment 5 doesn't specify whether sharding was enabled. If this is only an issue when using namespace-based route sharding, then the issue may have been fixed as bug 1905748. Otherwise, we need more information to be able to work on this issue. If you can provide a reproducer, please re-open with details.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.
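
For illustration only, the check requested in the description (compare the rendered haproxy configuration before reloading, and skip the reload when nothing changed and the last reload succeeded) could be sketched as below. This is not the router's actual code: `ReloadGate`, `commit`, and `reload_fn` are hypothetical names, and the sketch assumes the rendered configuration is available as bytes before it is written out.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReloadGate:
    """Hypothetical helper that decides whether a reload is really needed."""

    # Assumed callback that performs the actual reload and returns True on success.
    reload_fn: Callable[[], bool]
    last_digest: Optional[str] = None
    last_reload_ok: bool = False
    reload_count: int = 0

    def commit(self, rendered_config: bytes) -> bool:
        """Reload only if the config changed or the previous reload failed.

        Returns True if a reload was attempted, False if it was skipped.
        """
        digest = hashlib.sha256(rendered_config).hexdigest()
        if digest == self.last_digest and self.last_reload_ok:
            # Same configuration and the last reload succeeded: nothing to do.
            return False
        self.reload_count += 1
        self.last_reload_ok = self.reload_fn()
        self.last_digest = digest
        return True
```

With a gate like this, an event that leaves the rendered configuration of a shard byte-for-byte identical (for example, a new service that no route on that shard references) would not start another haproxy process, while a failed reload would still be retried on the next commit.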