1595513 – Router keeps restarting due to large number of routes.

Summary: Router keeps restarting due to large number of routes.

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.11.0
Assignee:	Ivan Chavero
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-06-27 03:47 UTC by sfu@redhat.com
Modified:	2023-09-15 01:27 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-08-01 17:43:26 UTC
Target Upstream Version:
Embargoed:

Description sfu@redhat.com 2018-06-27 03:47:32 UTC

Description of problem:
When the router (HAProxy) reloads its configuration while serving a large number of routes (about 10,000), it fails the health check due to low performance, so the router pod is killed and restarted continuously.


Version-Release number of selected component (if applicable):
ocp 3.9.14
router 3.9.14

How reproducible:
always

Steps to Reproduce:
1. Create a router.
2. Scale up to 3 router pods.
3. Create 10,000+ routes.

Actual results:
The router pods keep restarting.

Expected results:
The router pods keep running without restarts.

Additional info:
Increasing the HAProxy backend check interval to 300s avoids the problem.
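To illustrate why a longer backend check interval might relieve the router, here is a rough back-of-the-envelope sketch. It is not taken from the router code. It assumes one health-checked server per route and a 5s default check interval; both are assumptions, and only the route count and the 300s value come from this report.

```python
def checks_per_second(routes: int, servers_per_route: int, interval_s: float) -> float:
    """Average number of backend health checks issued per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return routes * servers_per_route / interval_s


ROUTES = 10_000                # route count from the report
SERVERS_PER_ROUTE = 1          # assumption: one health-checked server per route
DEFAULT_INTERVAL_S = 5.0       # assumption: default backend check interval
WORKAROUND_INTERVAL_S = 300.0  # interval used in the workaround

before = checks_per_second(ROUTES, SERVERS_PER_ROUTE, DEFAULT_INTERVAL_S)
after = checks_per_second(ROUTES, SERVERS_PER_ROUTE, WORKAROUND_INTERVAL_S)

print(f"{before:.0f} checks/s at a {DEFAULT_INTERVAL_S:.0f}s interval")
print(f"{after:.1f} checks/s at a {WORKAROUND_INTERVAL_S:.0f}s interval")
```

Under these assumptions the check load drops by a factor of 60. Whether that load is what actually starves the router's own health check is not confirmed in this report.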

Comment 1 Ben Bennett 2018-06-27 13:29:59 UTC

Can we get details on the VM/machine they are running the router on?

However, it sounds like they have identified a work-around for the time being.  Changes that are going into 3.11 may help with this situation.

Comment 2 sfu@redhat.com 2018-06-28 01:43:48 UTC

The infra node running the router has 8 cores and 32 GB of memory.

Please feel free to let me know what additional info you need. Thanks.

Comment 3 sfu@redhat.com 2018-07-05 08:31:08 UTC

(In reply to Ben Bennett from comment #1)
> Can we get details on the vm/machine they are running the router on.
> 
> However, it sounds like they have identified a work-around for the time
> being.  Changes that are going into 3.11 may help with this situation.

Thanks Bennett for your reply.

The root cause is probably that HAProxy is very slow to reload a large number of routes, so a health check cannot complete within the default readiness and liveness probe cycle. A proper fix requires an improvement in the router. For now, increasing the number of health check attempts and the probe interval only works around the problem.
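The reasoning above can be sketched as simple arithmetic: a container is restarted once its liveness probe has failed `failureThreshold` times in a row, with `periodSeconds` between attempts. The sketch below is only illustrative. The 45s reload time and the relaxed probe values are hypothetical, since no measurements were posted here, and the defaults shown are the generic Kubernetes ones, which may differ from what the router deployment actually sets.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    period_s: int           # periodSeconds
    failure_threshold: int  # failureThreshold

    def tolerated_outage_s(self) -> int:
        """Approximate time the probe may keep failing before a restart."""
        return self.period_s * self.failure_threshold


def survives_reload(probe: Probe, reload_s: float) -> bool:
    """True if a reload that blocks health checks for reload_s fits the window."""
    return reload_s < probe.tolerated_outage_s()


# Generic Kubernetes defaults (assumption: the router does not override them).
default_probe = Probe(period_s=10, failure_threshold=3)
# Hypothetical relaxed settings in the spirit of the workaround.
relaxed_probe = Probe(period_s=30, failure_threshold=5)

RELOAD_S = 45.0  # hypothetical reload duration with ~10,000 routes

for name, probe in (("default", default_probe), ("relaxed", relaxed_probe)):
    print(name, probe.tolerated_outage_s(), survives_reload(probe, RELOAD_S))
```

With these numbers the default window is 30s, so a 45s reload would trigger a restart, while the relaxed window of 150s would not. This only hides the slow reload; it does not make it faster.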

Comment 18 Ivan Chavero 2018-08-01 17:43:26 UTC

I'm closing this bug, feel free to reopen it if the problem persists.

Comment 19 Red Hat Bugzilla 2023-09-15 01:27:25 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days
