Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1839989

Summary: Optimize the logic of calling router reload
Product: OpenShift Container Platform
Reporter: sfu <sfu>
Component: Networking
Assignee: aos-network-edge-staff <aos-network-edge-staff>
Networking sub component: router
QA Contact: Hongan Li <hongli>
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: medium
CC: aos-bugs, bbennett, cholman, jinjli, mmasters, mnunes
Version: 3.11.0
Keywords: Reopened
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-09-28 16:13:22 UTC
Type: Bug

Description sfu@redhat.com 2020-05-26 07:40:01 UTC
Description of problem:

Because a large number of WebSocket applications are deployed in the cluster, many long-lived connections are maintained. After the router reloads several times, too many haproxy processes accumulate in the router pod, which exhausts the memory of the infra node.

We would like engineering to optimize the router reload mechanism and minimize the number of haproxy reloads. For example, before reloading, check whether the haproxy configuration file has changed. If the configuration file has not changed and the last reload succeeded, skip the reload. In our tests, events such as creating a new service or scaling up an application cause an haproxy reload on all router shards in the cluster, regardless of whether the service is associated with a route or selected by a router shard. In fact, the haproxy configuration files of many shards have not changed at that point.
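
The check requested above could look roughly like the following. This is a minimal sketch in Python for illustration only, not the router's actual code: `ReloadGate`, `should_reload`, and `record_reload` are hypothetical names, and the rendered haproxy configuration is assumed to be available as bytes before the reload is issued.

```python
import hashlib
from typing import Optional


class ReloadGate:
    """Hypothetical helper: skip a reload when the rendered config is unchanged.

    Not part of the OpenShift router; names and structure are assumptions.
    """

    def __init__(self) -> None:
        # Digest of the config used by the last successful reload, if any.
        self._last_digest: Optional[str] = None

    @staticmethod
    def _digest(rendered_config: bytes) -> str:
        return hashlib.sha256(rendered_config).hexdigest()

    def should_reload(self, rendered_config: bytes) -> bool:
        # Reload if the config differs from the last successfully loaded one,
        # or if no successful reload has been recorded.
        return self._digest(rendered_config) != self._last_digest

    def record_reload(self, rendered_config: bytes, succeeded: bool) -> None:
        # After a failed reload, forget the digest so the next attempt retries
        # even if the config is byte-for-byte identical.
        self._last_digest = self._digest(rendered_config) if succeeded else None
```

A caller would render the configuration, call `should_reload`, and only then invoke the real reload, reporting the outcome through `record_reload`. Comparing digests of the rendered output covers both conditions in the request: the file is unchanged and the last reload succeeded.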


How reproducible:
always

Actual results:
Unnecessary and frequent reloads leave a large number of long-lived connections open, which consume a large amount of memory.

Expected results:
Optimize the logic for calling router reload to reduce unnecessary reloads.

Comment 3 Andrew McDermott 2020-06-17 10:51:40 UTC
(In reply to sfu from comment #0)

I did a very brief experiment with a service:

- I added a new annotation and didn't see a reload for that change.
- I changed the targetPort of an existing service and saw a reload (expected).
- I deleted my new annotation and didn't see a reload for that change.
- I reverted the targetPort change and saw a reload (expected).

It seems there is already some checking (at least for services) to see
if the router really should reload.

Comment 4 Andrew McDermott 2020-06-17 10:52:09 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 5 sfu@redhat.com 2020-06-18 07:24:48 UTC
Hi Andrew,

Many other actions also cause a router reload.
For example, creating a svc that is not related to any route should not trigger a router reload, because it does not affect the haproxy config file.
Comparing the haproxy configuration file with the previous version before each router reload could reduce the number of unnecessary reloads.
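
To illustrate the first point, an event filter along these lines would ignore services that no route references. This is a simplified sketch in Python, not how the router is implemented: the `Route` type and the `service_affects_config` function are hypothetical, and a route is assumed to reference exactly one service by name in its own namespace.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified view of a route for this illustration."""
    namespace: str
    name: str
    service: str  # name of the service the route points to


def service_affects_config(namespace: str, service: str,
                           routes: Iterable[Route]) -> bool:
    """Return True only if some route in the same namespace references the service."""
    return any(r.namespace == namespace and r.service == service for r in routes)
```

With this filter, a service event would be passed on to the reload logic only when `service_affects_config` returns True for the routes the shard has admitted.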

Comment 6 Andrew McDermott 2020-07-09 12:09:26 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 7 Andrew McDermott 2020-07-30 10:04:27 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 8 Andrew McDermott 2020-09-02 16:33:37 UTC
(In reply to Andrew McDermott from comment #6)
> I’m adding UpcomingSprint, because I was occupied by fixing bugs with
> higher priority/severity, developing new features with higher
> priority, or developing new features to improve stability at a macro
> level. I will revisit this bug next sprint.

As this really is an enhancement to the current design, it is now captured in the following RFE:

  https://issues.redhat.com/browse/NE-391

Comment 9 Andrew McDermott 2020-10-29 17:25:30 UTC
*** Bug 1891810 has been marked as a duplicate of this bug. ***

Comment 11 Miciah Dashiel Butler Masters 2021-09-21 16:28:42 UTC
(In reply to sfu from comment #5)
> many other behavior will cause router reload.
> For example, creating a svc that not related any route should not trigger a
> router reload, because it won't impact haproxy config file.

Can you provide a reproducer for this problem?

Comment 12 Miciah Dashiel Butler Masters 2021-09-28 16:13:22 UTC
Comment 5 doesn't specify whether sharding was enabled.  If this is only an issue when using namespace-based route sharding, then the issue may have been fixed as bug 1905748.  Otherwise, we need more information to be able to work on this issue.  If you can provide a reproducer, please re-open with details.

Comment 13 Red Hat Bugzilla 2023-09-15 01:29:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days