Bug 1595513

Summary:	Router keeps restarting due to a large number of routes.
Product:	OpenShift Container Platform	Reporter:	sfu <sfu>
Component:	Networking	Assignee:	Ivan Chavero <ichavero>
Networking sub component:	router	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED CURRENTRELEASE	Docs Contact:
Severity:	high
Priority:	unspecified	CC:	aos-bugs, bbennett, erich, rpuccini, sfu
Version:	3.9.0
Target Milestone:	---
Target Release:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-08-01 17:43:26 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description sfu@redhat.com 2018-06-27 03:47:32 UTC

Description of problem:
When the router (haproxy) performs reload operations while it holds a large number of routes (about 10000), it fails the health check due to low performance, so the router is killed continuously.


Version-Release number of selected component (if applicable):
ocp 3.9.14
router 3.9.14

How reproducible:
always

Steps to Reproduce:
1. Create a router
2. Scale up to 3 router pods
3. Create 10000+ routes

Actual results:
Router pods keep restarting

Expected results:
Router pods keep running without restarts

Additional info:
When the haproxy backend check interval is increased to 300s, the problem can be avoided.
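
A rough sketch of why the longer interval may help, not a measurement from this cluster. It assumes the interval in question is the one the router exposes as ROUTER_BACKEND_CHECK_INTERVAL, that its default is 5s, and that each route has exactly one health-checked backend server. The helper name and all three assumptions are illustrative only.

```python
def checks_per_second(num_servers: int, interval_seconds: float) -> float:
    """Average number of backend health checks issued per second
    when every server is checked once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return num_servers / interval_seconds


# Hypothetical workload: 10000 routes, one health-checked server per route.
servers = 10000
default_rate = checks_per_second(servers, 5)       # assumed 5s default interval
workaround_rate = checks_per_second(servers, 300)  # interval from the work-around

print(f"{default_rate:.0f} checks/s at 5s, {workaround_rate:.1f} checks/s at 300s")
```

Under these assumptions the check load drops by a factor of 60, which would leave haproxy more headroom during a reload.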

Comment 1 Ben Bennett 2018-06-27 13:29:59 UTC

Can we get details on the vm/machine they are running the router on?

However, it sounds like they have identified a work-around for the time being.  Changes that are going into 3.11 may help with this situation.

Comment 2 sfu@redhat.com 2018-06-28 01:43:48 UTC

The hardware of the infra node running the router is: 8 cores, 32 GB

Please feel free to let me know what additional info you need. Thanks.

Comment 3 sfu@redhat.com 2018-07-05 08:31:08 UTC

(In reply to Ben Bennett from comment #1)
> Can we get details on the vm/machine they are running the router on.
> 
> However, it sounds like they have identified a work-around for the time
> being.  Changes that are going into 3.11 may help with this situation.

Thanks Bennett for your reply.

The root cause of this issue is probably that haproxy performs very poorly when reloading a large number of routes, so a health check cannot be completed within the default readiness and liveness detection cycle. This requires an improvement to fix. At present, temporarily increasing the number of health checks and the detection interval only works around the problem.
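
To illustrate the reasoning, here is a minimal sketch of how the probe settings bound the time the router may stay unresponsive. It uses the Kubernetes probe defaults (periodSeconds=10, failureThreshold=3); the router deployment may set different values. The 45s reload time and the raised values are hypothetical, and the function names are made up for this example.

```python
def restart_window_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Roughly how long a container can keep failing its liveness probe
    before it is restarted: failure_threshold consecutive failures,
    with one probe every period_seconds."""
    return period_seconds * failure_threshold


def survives_reload(reload_seconds: int, period_seconds: int,
                    failure_threshold: int) -> bool:
    """True if the reload finishes before the probe gives up."""
    return reload_seconds < restart_window_seconds(period_seconds,
                                                   failure_threshold)


reload_seconds = 45  # hypothetical time haproxy is unresponsive during reload

# Kubernetes defaults: 10s * 3 = 30s window, shorter than the reload.
print(survives_reload(reload_seconds, period_seconds=10, failure_threshold=3))

# Raised values (hypothetical): 30s * 5 = 150s window, longer than the reload.
print(survives_reload(reload_seconds, period_seconds=30, failure_threshold=5))
```

This only widens the window. It does not make the reload itself any faster, which is why it evades the problem rather than fixing it.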

Comment 18 Ivan Chavero 2018-08-01 17:43:26 UTC

I'm closing this bug, feel free to reopen it if the problem persists.

Comment 19 Red Hat Bugzilla 2023-09-15 01:27:25 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days