Bug 1595513 - Router keeps restarting due to large number of routes. [NEEDINFO]
Status: CLOSED CURRENTRELEASE
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assigned To: Ivan Chavero
QA Contact: zhaozhanqi
Reported: 2018-06-26 23:47 EDT by sfu@redhat.com
Modified: 2018-08-01 13:43 EDT
CC List: 5 users

Last Closed: 2018-08-01 13:43:26 EDT
Type: Bug
ichavero: needinfo? (sfu)


Attachments: None
Description sfu@redhat.com 2018-06-26 23:47:32 EDT
Description of problem:
When the router (HAProxy) reloads while it holds a large number of routes (about 10,000), it fails the health check due to poor reload performance, so the router pod is killed and restarted continuously.


Version-Release number of selected component (if applicable):
ocp 3.9.14
router 3.9.14

How reproducible:
always

Steps to Reproduce:
1. Create a router.
2. Scale up to 3 router pods.
3. Create 10,000+ routes.
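
Step 3 is easier to repeat with a generated manifest. The following is a minimal sketch, not the reporter's actual procedure: it builds a `List` of Route objects that all point at one Service. The Service name `hello`, the domain `apps.example.com`, and the `route-NNNNN` naming scheme are placeholders that do not come from this report.

```python
import json


def build_routes(count, service="hello", domain="apps.example.com"):
    """Return a v1 List holding `count` Route objects for one Service.

    `service` and `domain` are placeholder values, not taken from the report.
    """
    items = []
    for i in range(count):
        name = f"route-{i:05d}"  # hypothetical naming scheme
        items.append({
            "apiVersion": "route.openshift.io/v1",
            "kind": "Route",
            "metadata": {"name": name},
            "spec": {
                "host": f"{name}.{domain}",
                "to": {"kind": "Service", "name": service},
            },
        })
    return {"apiVersion": "v1", "kind": "List", "items": items}


# Serialize a small sample; raise the count to 10000 for the real reproduction.
sample = json.dumps(build_routes(3))
```

Writing `json.dumps(build_routes(10000))` to a file and passing that file to `oc create -f` would be one way to load the routes, assuming the target Service already exists in the current project.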

Actual results:
Router pods keep restarting.

Expected results:
Router pods keep running without restarts.

Additional info:
When the HAProxy backend check interval is increased to 300s, the problem can be avoided.
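
The report does not say how the interval was changed. One possible way, sketched here under assumptions, is to set the router's `ROUTER_BACKEND_CHECK_INTERVAL` environment variable on the router DeploymentConfig. Both that variable being the setting the reporter means and the container name `router` are assumptions, not facts from this bug.

```python
import json


def backend_check_interval_patch(interval="300s", container="router"):
    """Build a strategic-merge patch that sets the backend check interval.

    The env var name and the container name are assumptions, see above.
    """
    return {
        "spec": {"template": {"spec": {"containers": [{
            "name": container,  # merge key for the containers list
            "env": [{
                "name": "ROUTER_BACKEND_CHECK_INTERVAL",
                "value": interval,
            }],
        }]}}}
    }


patch = json.dumps(backend_check_interval_patch())
```

The resulting JSON string could be passed to `oc patch dc/router -p`, which would trigger a new router deployment with the changed interval.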
Comment 1 Ben Bennett 2018-06-27 09:29:59 EDT
Can we get details on the VM/machine they are running the router on?

However, it sounds like they have identified a work-around for the time being.  Changes that are going into 3.11 may help with this situation.
Comment 2 sfu@redhat.com 2018-06-27 21:43:48 EDT
The infra node that runs the router has 8 cores and 32 GB.

Please feel free to let me know what additional info you need. Thanks.
Comment 3 sfu@redhat.com 2018-07-05 04:31:08 EDT
(In reply to Ben Bennett from comment #1)
> Can we get details on the vm/machine they are running the router on.
> 
> However, it sounds like they have identified a work-around for the time
> being.  Changes that are going into 3.11 may help with this situation.

Thanks Bennett for your reply.

The root cause is probably that HAProxy is very slow at reloading a large number of routes, so a health check cannot complete within the default readiness and liveness probe cycle. Fixing this requires an improvement in the router itself. For now, increasing the number of allowed health check failures and the probe interval only works around the problem.
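
The reasoning above can be made concrete with a rough estimate of how long the router may stay unresponsive before the liveness probe restarts it. This is a simplified sketch: it assumes probes fire at a fixed period, that a probe fails exactly when it lands inside the unresponsive spell, and it ignores probe timeouts. The default arguments are the Kubernetes probe defaults (`periodSeconds=10`, `failureThreshold=3`); whether the router in this report used them is an assumption.

```python
def restart_bounds(period_seconds=10, failure_threshold=3):
    """Return (survives_up_to, restarts_from) in seconds.

    An unresponsive spell of at most `survives_up_to` seconds cannot hit
    `failure_threshold` consecutive probes; one of `restarts_from` seconds
    or longer always does. Spells in between depend on probe timing.
    """
    survives_up_to = (failure_threshold - 1) * period_seconds
    restarts_from = failure_threshold * period_seconds
    return survives_up_to, restarts_from


default_bounds = restart_bounds()        # Kubernetes default probe values
relaxed_bounds = restart_bounds(30, 5)   # hypothetical relaxed settings
```

Under this model, a reload that blocks health checks for 30 seconds or more always triggers a restart with the defaults, while raising the period and failure threshold widens that window.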
Comment 18 Ivan Chavero 2018-08-01 13:43:26 EDT
I'm closing this bug, feel free to reopen it if the problem persists.