Bug 1383663 - [3.4] Router doesn't immediately load existing routes on pod redeployment
Summary: [3.4] Router doesn't immediately load existing routes on pod redeployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assignee: Maru Newby
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1381584 1387714 1402488 (view as bug list)
Depends On: 1382388
Blocks: 1415276
 
Reported: 2016-10-11 11:46 UTC by Jaspreet Kaur
Modified: 2023-10-06 17:34 UTC (History)
CC List: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The router wouldn't reload HAProxy after the initial sync if the last item of the initial list of any of the watched resources didn't reach the router to trigger the commit. This could be caused by a route being rejected for any reason (e.g. specifying a host claimed by another namespace).
Consequence: The router could be left in its initial state (without any routes configured) until another commit-triggering event occurred (e.g. a watch event).
Fix: The router always reloads after the initial sync.
Result: Routes are available after the initial sync.
Clone Of:
Clones: 1415276 (view as bug list)
Environment:
Last Closed: 2017-01-31 20:18:47 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Origin (Github) 11768 0 None None None 2016-11-04 05:37:03 UTC
Origin (Github) 12178 0 None None None 2017-01-04 16:43:12 UTC
Origin (Github) 12199 0 None None None 2016-12-09 03:27:39 UTC
Red Hat Knowledge Base (Solution) 2851251 0 None None None 2017-01-20 00:11:12 UTC
Red Hat Product Errata RHBA-2017:0218 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4.1.2 bug fix update 2017-02-01 01:18:20 UTC

Description Jaspreet Kaur 2016-10-11 11:46:50 UTC
Description of problem: When the HAProxy router pods are first created, they do not immediately load the routes that are configured for them. This happens when the router pods are redeployed or deleted.

As a result, the restarted router pods serve 503 errors until they reload.

[router1.paas.qa.int.phx1.redhat.com] [11:37:00 AM]
[root@router1 ~]# docker logs -f $(docker ps | grep haproxy-router | cut -c1-12)
I1010 07:35:16.929616       1 router.go:153] Router is only using routes in namespaces matching region=phx1,servicephase=qa,zone=internal
I1010 07:35:17.131139       1 router.go:321] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
W1010 07:37:00.928813       1 router.go:617] a edge terminated route with host master-webarchi.int.paas.qa.redhat.com does not have the required certificates.  The route will still be created but no certificates will be written
W1010 07:37:00.928857       1 router.go:617] a edge terminated route with host hello-world-syseng-validation.int.paas.qa.redhat.com does not have the required certificates.  The route will still be created but no certificates will be written
I1010 07:37:00.987037       1 router.go:321] Router reloaded:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results: The routes are not present in the router after redeployment.


Expected results: The routes should be loaded in the newly created router pods.


Additional info:

Comment 2 Ben Bennett 2016-10-28 16:57:29 UTC
I think this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1387714

Comment 3 Ben Bennett 2016-10-28 17:35:54 UTC
Maru: Can you see if we can do something about not loading the router until we have processed the routes?  My only concern about stalling is that not coming up in a timely manner may interfere with the health checks, and then the pod may get killed and go into a restart loop.

Comment 4 Maru Newby 2016-10-28 22:19:47 UTC
@bbennett: The liveness/readiness check for a template router targets the haproxy stats port.  This precludes starting the router pod without starting haproxy, and it's not possible to change the liveness/readiness probes for a running pod.  However, it should be possible to configure haproxy to avoid binding ports for http/tls traffic when it initially starts.  Binding could be delayed until the route state had been read.
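
For reference, a rough sketch (an assumption for illustration, not the actual kubelet or router code) of what that liveness/readiness check amounts to: an HTTP GET against /healthz on the HAProxy stats port 1936, which only succeeds once haproxy itself is up and has bound the port.

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Hypothetical stand-in for the probe: hit /healthz on the stats port.
	// If haproxy hasn't started (or hasn't bound the port yet), this fails,
	// and repeated failures would cause the kubelet to restart the pod.
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://127.0.0.1:1936/healthz")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("probe status:", resp.StatusCode)
}
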

Comment 5 Ben Bennett 2016-11-07 15:26:56 UTC
*** Bug 1387714 has been marked as a duplicate of this bug. ***

Comment 6 Vladislav Walek 2016-11-29 15:20:35 UTC
Hi,
I have a similar case, but the customer is saying the routes only come up after the second reload.

Thanks

Comment 8 Maru Newby 2016-12-09 03:27:40 UTC
I think I found the cause of this issue, separate from the port binding issue.  The GitHub PR has a fix.
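
A minimal sketch of the fix described in the Doc Text above (the type and method names here are illustrative, not the actual origin router code): once the initial list of every watched resource has been processed, commit unconditionally, so HAProxy is reloaded even if no individual item triggered a commit.

package main

import "fmt"

// router stands in for the template router plugin; Commit would normally
// regenerate the HAProxy configuration and trigger a reload.
type router struct {
	syncedAtLeastOnce bool
}

func (r *router) Commit() {
	fmt.Println("writing haproxy config and reloading")
}

// handleInitialSyncDone is a hypothetical hook invoked after the initial list
// of routes and endpoints has been processed. The fix is to commit here
// unconditionally, so the router never stays in its empty initial state
// waiting for a later watch event to trigger the first reload.
func (r *router) handleInitialSyncDone() {
	r.syncedAtLeastOnce = true
	r.Commit() // always reload after the initial sync
}

func main() {
	r := &router{}
	// ... initial list of watched resources is processed here ...
	r.handleInitialSyncDone()
}
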

Comment 23 Hongan Li 2017-01-22 05:26:04 UTC
Verified in 3.4.1.0; the issue has been fixed.

# oc logs router-2-badgl
I0122 05:18:00.655645       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0122 05:18:00.655951       1 router.go:221] Router is only using routes in namespaces matching team=red
E0122 05:18:00.697077       1 controller.go:169] a route in another namespace holds test-edge.example.com and is older than secured-edge-route
I0122 05:18:00.747442       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).

Comment 24 Ben Bennett 2017-01-27 16:24:18 UTC
*** Bug 1381584 has been marked as a duplicate of this bug. ***

Comment 25 Ben Bennett 2017-01-27 16:25:03 UTC
*** Bug 1402488 has been marked as a duplicate of this bug. ***

Comment 27 errata-xmlrpc 2017-01-31 20:18:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0218

