Bug 1383663 - [3.4] Router doesn't immediately load existing routes on pod redeployment
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assigned To: Maru Newby
QA Contact: zhaozhanqi
Keywords: Performance
Duplicates: 1381584 1387714 1402488
Depends On: 1382388
Blocks: 1415276
Reported: 2016-10-11 07:46 EDT by Jaspreet Kaur
Modified: 2017-03-28 01:37 EDT
CC List: 14 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The router wouldn't reload HAProxy after the initial sync if the last item of the initial list of any of the watched resources didn't reach the router to trigger the commit. This could be caused by a route being rejected for any reason (e.g. specifying a host claimed by another namespace).
Consequence: The router could be left in its initial state (without any routes configured) until another commit-triggering event occurred (e.g. a watch event).
Fix: The router always reloads after initial sync.
Result: Routes are available after the initial sync.
Story Points: ---
Clone Of:
Clones: 1415276
Environment:
Last Closed: 2017-01-31 15:18:47 EST
Type: Bug
External Trackers
Tracker                           | ID      | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 2851251 | None     | None   | None    | 2017-01-19 19:11 EST
Origin (Github)                   | 11768   | None     | None   | None    | 2016-11-04 01:37 EDT
Origin (Github)                   | 12178   | None     | None   | None    | 2017-01-04 11:43 EST
Origin (Github)                   | 12199   | None     | None   | None    | 2016-12-08 22:27 EST

Description Jaspreet Kaur 2016-10-11 07:46:50 EDT
Description of problem: When the HAProxy pods are first created they don't properly load the routes configured to start on them. This happens when router pods are redeployed/deleted.

As a result, restarted proxy pods serve 503 errors until they reload.

[router1.paas.qa.int.phx1.redhat.com] [11:37:00 AM]
[root@router1 ~]# docker logs -f $(docker ps | grep haproxy-router | cut -c1-12)
I1010 07:35:16.929616       1 router.go:153] Router is only using routes in namespaces matching region=phx1,servicephase=qa,zone=internal
I1010 07:35:17.131139       1 router.go:321] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
W1010 07:37:00.928813       1 router.go:617] a edge terminated route with host master-webarchi.int.paas.qa.redhat.com does not have the required certificates.  The route will still be created but no certificates will be written
W1010 07:37:00.928857       1 router.go:617] a edge terminated route with host hello-world-syseng-validation.int.paas.qa.redhat.com does not have the required certificates.  The route will still be created but no certificates will be written
I1010 07:37:00.987037       1 router.go:321] Router reloaded:
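
For reference, the interval between the two "Router reloaded" entries above can be measured with a small script. This is a minimal sketch and not part of the router: the regular expression assumes the glog-style prefix seen in this log, the helper name reload_times is made up for illustration, and both entries are assumed to fall on the same day.

```python
import re
from datetime import datetime

# glog-style prefix as it appears in this log (an assumption about the format):
# severity letter, MMDD, time with microseconds, thread id, file:line]
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def reload_times(log_text):
    """Return the timestamps of every 'Router reloaded' entry, in log order."""
    times = []
    for line in log_text.splitlines():
        m = GLOG_LINE.match(line)
        if m and m.group("msg").startswith("Router reloaded"):
            times.append(datetime.strptime(m.group("time"), "%H:%M:%S.%f"))
    return times


# Lines copied from the log in this report.
SAMPLE = """\
I1010 07:35:16.929616       1 router.go:153] Router is only using routes in namespaces matching region=phx1,servicephase=qa,zone=internal
I1010 07:35:17.131139       1 router.go:321] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
W1010 07:37:00.928813       1 router.go:617] a edge terminated route with host master-webarchi.int.paas.qa.redhat.com does not have the required certificates.  The route will still be created but no certificates will be written
I1010 07:37:00.987037       1 router.go:321] Router reloaded:
"""

if __name__ == "__main__":
    first, second = reload_times(SAMPLE)
    gap = (second - first).total_seconds()
    print(f"seconds between reloads: {gap:.1f}")
```

For the excerpt above, the two reloads are about 103.9 seconds apart. If the first reload happened with no routes configured, that would be the window in which 503 errors were served.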


Actual results: the routes are not there after redeployment.


Expected results: the routes should exist in the new routers that are created.


Additional info:
Comment 2 Ben Bennett 2016-10-28 12:57:29 EDT
I think this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1387714
Comment 3 Ben Bennett 2016-10-28 13:35:54 EDT
Maru: Can you see if we can do something about not loading the router until we have processed the routes?  My only concern about stalling is that not putting up something in a timely manner may interfere with health checks and then we may get killed and then loop.
Comment 4 Maru Newby 2016-10-28 18:19:47 EDT
@bbennett: The liveness/readiness check for a template router targets the haproxy stats port.  This precludes starting the router pod without starting haproxy, and it's not possible to change the liveness/readiness probes for a running pod.  However, it should be possible to configure haproxy to avoid binding ports for http/tls traffic when it initially starts.  Binding could be delayed until the route state had been read.
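
The ordering proposed in this comment can be illustrated with a toy model. It is only a sketch of the idea and says nothing about how HAProxy itself would be configured: the class ToyProxy and its method names are hypothetical, and it binds ephemeral localhost ports rather than the real stats port (1936) or the http/tls ports.

```python
import socket


class ToyProxy:
    """Toy model: health/stats listener up at once, traffic listener deferred."""

    def __init__(self):
        # Bind the stats listener immediately so liveness/readiness
        # probes have something to connect to.
        self.stats = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.stats.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        self.stats.listen()
        # Traffic listener stays unbound until the route state has been read.
        self.traffic = None

    def on_initial_sync_done(self):
        """Bind the traffic listener once the initial route list is loaded."""
        if self.traffic is None:
            self.traffic = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.traffic.bind(("127.0.0.1", 0))
            self.traffic.listen()

    def close(self):
        self.stats.close()
        if self.traffic is not None:
            self.traffic.close()


if __name__ == "__main__":
    proxy = ToyProxy()
    print("traffic bound before sync:", proxy.traffic is not None)
    proxy.on_initial_sync_done()
    print("traffic bound after sync:", proxy.traffic is not None)
    proxy.close()
```

In this model a health check against the stats listener can succeed while clients cannot yet connect to the traffic listener, so no request is answered before the routes are known.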
Comment 5 Ben Bennett 2016-11-07 10:26:56 EST
*** Bug 1387714 has been marked as a duplicate of this bug. ***
Comment 6 Vladislav Walek 2016-11-29 10:20:35 EST
Hi,
I have a similar case, but the customer reports that it happens after the second reload.

Thanks
Comment 8 Maru Newby 2016-12-08 22:27:40 EST
I think I found the cause of this issue, separate from the port binding issue.  The github PR has a fix.
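
The cause and fix recorded in the Doc Text field of this bug can be modelled in a few lines. This is a simplified sketch, not the router's code (the router is written in Go): ToyRouter, admit, and the two sync functions are hypothetical names, and the host a.example.com is made up for the example.

```python
class ToyRouter:
    """Stand-in for the router: collects routes and counts reloads."""

    def __init__(self):
        self.routes = []
        self.reloads = 0

    def handle_route(self, host):
        self.routes.append(host)

    def commit(self):
        # In the real router a commit triggers an HAProxy reload.
        self.reloads += 1


def admit(host, claimed_hosts):
    """Hypothetical admission step: reject hosts claimed by another namespace."""
    return host not in claimed_hosts


def initial_sync_buggy(router, hosts, claimed_hosts):
    # The commit is tied to the last list item reaching the router.
    for i, host in enumerate(hosts):
        if not admit(host, claimed_hosts):
            continue  # rejected: the router never sees this item
        router.handle_route(host)
        if i == len(hosts) - 1:
            router.commit()


def initial_sync_fixed(router, hosts, claimed_hosts):
    for host in hosts:
        if admit(host, claimed_hosts):
            router.handle_route(host)
    # Always reload once the initial list has been processed.
    router.commit()


if __name__ == "__main__":
    hosts = ["a.example.com", "test-edge.example.com"]
    claimed = {"test-edge.example.com"}  # last item will be rejected

    buggy, fixed = ToyRouter(), ToyRouter()
    initial_sync_buggy(buggy, hosts, claimed)
    initial_sync_fixed(fixed, hosts, claimed)
    print("buggy reloads:", buggy.reloads)
    print("fixed reloads:", fixed.reloads)
```

With the last route rejected, the buggy variant holds one admitted route but never reloads, while the fixed variant reloads once after the initial sync.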
Comment 23 hongli 2017-01-22 00:26:04 EST
Verified in 3.4.1.0; the issue has been fixed.

# oc logs router-2-badgl
I0122 05:18:00.655645       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0122 05:18:00.655951       1 router.go:221] Router is only using routes in namespaces matching team=red
E0122 05:18:00.697077       1 controller.go:169] a route in another namespace holds test-edge.example.com and is older than secured-edge-route
I0122 05:18:00.747442       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
Comment 24 Ben Bennett 2017-01-27 11:24:18 EST
*** Bug 1381584 has been marked as a duplicate of this bug. ***
Comment 25 Ben Bennett 2017-01-27 11:25:03 EST
*** Bug 1402488 has been marked as a duplicate of this bug. ***
Comment 27 errata-xmlrpc 2017-01-31 15:18:47 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0218
